<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Vector-Store on lo0 — Blog Técnico</title><link>https://blog.lo0.es/tags/vector-store/</link><description>Recent content in Vector-Store on lo0 — Blog Técnico</description><generator>Hugo -- gohugo.io</generator><language>es</language><lastBuildDate>Thu, 04 Jun 2026 07:00:00 +0200</lastBuildDate><atom:link href="https://blog.lo0.es/tags/vector-store/index.xml" rel="self" type="application/rss+xml"/><item><title>PostgreSQL + Qdrant en la ingestión RAG: el cartero que sincroniza dos mundos</title><link>https://blog.lo0.es/posts/postgresql-qdrant-ingestion-microservicios/</link><pubDate>Thu, 04 Jun 2026 07:00:00 +0200</pubDate><guid>https://blog.lo0.es/posts/postgresql-qdrant-ingestion-microservicios/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>TL;DR&lt;/strong> — En un sistema RAG de producción, PostgreSQL guarda la verdad oficial de los documentos y Qdrant guarda los vectores para búsqueda. Mantenerlos sincronizados no es trivial: si borras un documento de Postgres y no invalidas sus chunks en Qdrant, el sistema devuelve respuestas de documentos fantasma. Hay dos patrones para evitarlo: el &lt;strong>outbox pattern&lt;/strong> (transacción atómica + worker asíncrono, at-least-once) y &lt;strong>CDC con Debezium&lt;/strong> (lectura directa del WAL de Postgres, baja latencia, mayor complejidad). Este artículo explica cuándo usar cada uno, cómo orquestarlos como microservicios y qué números esperar con &lt;code>bge-m3&lt;/code> en hardware on-premise.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;h2 id="la-analogía-del-cartero-y-el-registro-civil">La analogía del cartero y el registro civil&lt;/h2>
&lt;p>Imagina una ciudad con dos oficinas complementarias.&lt;/p>
&lt;p>La primera es el &lt;strong>Registro Civil&lt;/strong>: guarda el censo oficial. Cada vez que nace alguien, muere o cambia de domicilio, el Registro es el primero en saberlo. Es lento, estructurado, transaccional. Si el Registro dice que alguien existe, existe. Si dice que murió, está muerto. &lt;strong>PostgreSQL es el Registro Civil de tus documentos.&lt;/strong>&lt;/p>
&lt;p>La segunda es la &lt;strong>libreta del cartero&lt;/strong>: una copia optimizada para encontrar a cualquier vecino en segundos, organizada por zonas, nombres fonéticos y rutas habituales. El cartero no puede actualizar el Registro, pero sí buscar a velocidades que el Registro jamás alcanzaría. &lt;strong>Qdrant es la libreta del cartero.&lt;/strong>&lt;/p>
&lt;p>El problema es la sincronización. Si el Registro anota un fallecimiento pero nadie avisa al cartero, este seguirá intentando entregar cartas a una dirección que ya no existe. En RAG, eso se traduce en chunks indexados de documentos que ya fueron eliminados, editados o reemplazados — documentos fantasma que contaminan los resultados.&lt;/p>
&lt;p>¿Cómo avisa el Registro al cartero?&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Outbox pattern&lt;/strong>: cada vez que el Registro actualiza su libro mayor, apunta el cambio en una &lt;em>hoja de salida&lt;/em> (outbox). Un empleado mensajero lee esa hoja periódicamente y actualiza la libreta del cartero. Garantizado, asíncrono, tolerante a fallos.&lt;/li>
&lt;li>&lt;strong>CDC con Debezium&lt;/strong>: el cartero tiene un teléfono directo conectado al Registro. Cada vez que el escribano apunta algo nuevo, el teléfono suena y el cartero actualiza su libreta en tiempo casi real.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="el-problema-del-consistency-gap">El problema del consistency gap&lt;/h2>
&lt;p>En una arquitectura RAG naive, el flujo es:&lt;/p>
&lt;ol>
&lt;li>El usuario sube un documento → se inserta en Postgres con metadatos.&lt;/li>
&lt;li>Un worker lo trocea en chunks, genera embeddings y hace upsert en Qdrant.&lt;/li>
&lt;li>El retrieval usa Qdrant para encontrar chunks relevantes y Postgres para hidratar metadatos.&lt;/li>
&lt;/ol>
&lt;p>Hasta aquí todo bien. El problema aparece en las &lt;strong>actualizaciones y borrados&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>El usuario &lt;strong>edita&lt;/strong> un documento → Postgres actualiza el registro, pero los vectores de los chunks viejos siguen en Qdrant. El retrieval devuelve contexto obsoleto.&lt;/li>
&lt;li>El usuario &lt;strong>borra&lt;/strong> un documento → Postgres elimina la fila, pero los chunks permanecen en Qdrant. El retrieval devuelve chunks de un documento que ya no debería existir.&lt;/li>
&lt;li>El sistema de permisos &lt;strong>revoca el acceso&lt;/strong> de un tenant → Qdrant no tiene forma de saberlo si no hay sincronización explícita.&lt;/li>
&lt;/ul>
&lt;p>Esto no es un problema teórico. En corpus vivos (wikis corporativas, bases de conocimiento actualizadas diariamente), el &lt;em>consistency gap&lt;/em> acumula ruido progresivamente. Un estudio interno en pipelines de producción muestra que sin reconciliación activa, el 3-8% de los chunks indexados corresponde a documentos que ya no existen en la fuente de verdad tras 30 días de operación.&lt;/p>
&lt;p>La solución no es &amp;ldquo;reindexar todo cada noche&amp;rdquo;. Con 10M chunks y un modelo de embedding no trivial, eso cuesta horas de cómputo y provoca ventanas de indisponibilidad. La solución es &lt;strong>propagación de cambios con garantías&lt;/strong>.&lt;/p>
&lt;hr>
&lt;h2 id="outbox-pattern-la-hoja-de-salida">Outbox pattern: la hoja de salida&lt;/h2>
&lt;h3 id="mecanismo">Mecanismo&lt;/h3>
&lt;p>El outbox pattern resuelve el problema de &amp;ldquo;escribir a dos sistemas en la misma operación&amp;rdquo; sin necesidad de transacciones distribuidas (que son caras y frágiles).&lt;/p>
&lt;p>La idea es simple: &lt;strong>PostgreSQL es el coordinador único&lt;/strong>. Cuando el microservicio de ingestión procesa un documento, realiza dos escrituras en la &lt;em>misma transacción local&lt;/em>:&lt;/p>
&lt;ol>
&lt;li>Inserta o actualiza el documento en la tabla &lt;code>documents&lt;/code>.&lt;/li>
&lt;li>Inserta un evento en la tabla &lt;code>outbox_events&lt;/code>.&lt;/li>
&lt;/ol>
&lt;p>Si la transacción falla, ambas escrituras se deshacen. Si tiene éxito, ambas están comprometidas atomicamente. No hay estado intermedio inconsistente.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">-- Tablas relevantes
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">CREATE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TABLE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">documents&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">UUID&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">PRIMARY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">KEY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">DEFAULT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">gen_random_uuid&lt;/span>&lt;span class="p">(),&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">tenant_id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">TEXT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NOT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NULL&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">title&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">TEXT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NOT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NULL&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">content&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">TEXT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NOT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NULL&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">checksum&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">TEXT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NOT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NULL&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">updated_at&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">TIMESTAMPTZ&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">DEFAULT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">CREATE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TABLE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">outbox_events&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">BIGSERIAL&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">PRIMARY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">KEY&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">aggregate_id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">UUID&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NOT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NULL&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c1">-- document id
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">event_type&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">TEXT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NOT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NULL&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c1">-- &amp;#39;document.created&amp;#39; | &amp;#39;document.updated&amp;#39; | &amp;#39;document.deleted&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">payload&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">JSONB&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NOT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NULL&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">created_at&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">TIMESTAMPTZ&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">DEFAULT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">(),&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">processed_at&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">TIMESTAMPTZ&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c1">-- NULL = pendiente
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c1">-- Ejemplo de inserción atómica
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">BEGIN&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">INSERT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">INTO&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">documents&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">tenant_id&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">title&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">content&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">checksum&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">VALUES&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="err">$&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="err">$&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="err">$&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="err">$&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="n">RETURNING&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">INTO&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">_doc_id&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">INSERT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">INTO&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">outbox_events&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">aggregate_id&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">event_type&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">payload&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">VALUES&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">_doc_id&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;document.created&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">jsonb_build_object&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;tenant_id&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="err">$&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;title&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="err">$&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;checksum&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="err">$&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">));&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">COMMIT&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="el-worker-de-outbox">El worker de outbox&lt;/h3>
&lt;p>Un proceso separado (el &lt;em>outbox worker&lt;/em>) hace polling de &lt;code>outbox_events&lt;/code> donde &lt;code>processed_at IS NULL&lt;/code>, procesa cada evento (chunking, embedding, upsert en Qdrant) y marca la fila como procesada:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="cl">&lt;span class="k">UPDATE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">outbox_events&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">SET&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">processed_at&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="err">$&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Garantía&lt;/strong>: at-least-once. Si el worker falla entre el upsert en Qdrant y el &lt;code>UPDATE&lt;/code>, el evento se reprocesará. Qdrant tolera upserts idempotentes (misma &lt;code>id&lt;/code> de punto = sobreescritura), así que el reprocesado no genera duplicados.&lt;/p>
&lt;p>&lt;strong>Latencia&lt;/strong>: depende del intervalo de polling. Con polling cada 500ms, la latencia p50 es ~250ms; p99, ~500ms. Aceptable para la mayoría de casos RAG donde el usuario no espera ver indexado un documento en menos de un segundo.&lt;/p>
&lt;hr>
&lt;h2 id="cdc-con-debezium-el-teléfono-directo">CDC con Debezium: el teléfono directo&lt;/h2>
&lt;h3 id="mecanismo-1">Mecanismo&lt;/h3>
&lt;p>El &lt;em>Change Data Capture&lt;/em> (CDC) lee el &lt;strong>Write-Ahead Log (WAL)&lt;/strong> de PostgreSQL directamente. Postgres escribe cada cambio en el WAL antes de aplicarlo a las tablas — es el mecanismo que usa para replicación y recuperación. Debezium se suscribe a un &lt;em>slot de replicación lógica&lt;/em> y convierte esos eventos en mensajes estructurados.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">-- Habilitar replicación lógica en postgresql.conf
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">-- wal_level = logical
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c1">-- Crear slot de replicación para Debezium
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_create_logical_replication_slot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;debezium_slot&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;pgoutput&amp;#39;&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>El flujo completo:&lt;/p>
&lt;pre tabindex="0">&lt;code>Postgres WAL → Debezium connector → Kafka/NATS → Indexer consumer → Qdrant
&lt;/code>&lt;/pre>&lt;p>Debezium emite eventos con la estructura &lt;em>before/after&lt;/em> del registro:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;op&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;d&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;before&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;550e8400-e29b-41d4-a716-446655440000&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;tenant_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;acme&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;checksum&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;sha256:abc123&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;after&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="kc">null&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Con &lt;code>&amp;quot;op&amp;quot;: &amp;quot;d&amp;quot;&lt;/code> (delete), el consumer sabe que debe borrar todos los puntos en Qdrant cuyo payload contenga ese &lt;code>document_id&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Consumer: borrado por filtro de payload&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">qdrant_client&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">delete&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;corpus&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">points_selector&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">FilterSelector&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">filter&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">must&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FieldCondition&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;document_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">match&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">MatchValue&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">value&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">event&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;before&amp;#34;&lt;/span>&lt;span class="p">][&lt;/span>&lt;span class="s2">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="ventajas-e-inconvenientes">Ventajas e inconvenientes&lt;/h3>
&lt;p>CDC elimina el polling y reduce la latencia a &lt;strong>decenas de milisegundos&lt;/strong> (el tiempo de propagación del WAL más el procesamiento del consumer). Pero añade complejidad operacional: necesitas gestionar el slot de replicación (los slots no consumidos retienen WAL indefinidamente, lo que puede llenar el disco), el broker de mensajes y el estado del consumer offset.&lt;/p>
&lt;hr>
&lt;h2 id="comparativa-outbox-vs-cdc">Comparativa: outbox vs CDC&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Criterio&lt;/th>
&lt;th>Outbox pattern&lt;/th>
&lt;th>CDC con Debezium&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Latencia típica&lt;/strong>&lt;/td>
&lt;td>250ms – 2s&lt;/td>
&lt;td>20ms – 200ms&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Garantía de entrega&lt;/strong>&lt;/td>
&lt;td>At-least-once&lt;/td>
&lt;td>At-least-once&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Complejidad operacional&lt;/strong>&lt;/td>
&lt;td>Baja (solo Postgres)&lt;/td>
&lt;td>Alta (Debezium + broker)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Riesgo de retención WAL&lt;/strong>&lt;/td>
&lt;td>Ninguno&lt;/td>
&lt;td>Alto si el slot se atasca&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Idempotencia requerida&lt;/strong>&lt;/td>
&lt;td>Sí (en indexer)&lt;/td>
&lt;td>Sí (en consumer)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Soporte multi-tabla&lt;/strong>&lt;/td>
&lt;td>Manual&lt;/td>
&lt;td>Automático (cualquier tabla)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Backpressure&lt;/strong>&lt;/td>
&lt;td>Natural (polling)&lt;/td>
&lt;td>Requiere diseño explícito&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Cuándo elegir&lt;/strong>&lt;/td>
&lt;td>Corpus &amp;lt; 100k docs/día, equipo pequeño&lt;/td>
&lt;td>Corpus &amp;gt; 1M docs/día, baja latencia crítica&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>Regla práctica&lt;/strong>: empieza con outbox. Migra a CDC cuando el volumen de cambios supere los ~50k eventos/hora o cuando la latencia de segundos sea inaceptable para el caso de uso (e.g., indexación de noticias en tiempo real).&lt;/p>
&lt;hr>
&lt;h2 id="arquitectura-de-microservicios">Arquitectura de microservicios&lt;/h2>
&lt;p>El pipeline de ingestión se compone de tres microservicios con responsabilidades bien separadas:&lt;/p>
&lt;svg viewBox="0 0 780 420" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="svg-title svg-desc">
&lt;title id="svg-title">Arquitectura de microservicios de ingestión RAG&lt;/title>
&lt;desc id="svg-desc">Diagrama que muestra el flujo desde Postgres a través de Ingestor, Indexer y Reconciler hasta Qdrant&lt;/desc>
&lt;!-- Fondo -->
&lt;rect width="780" height="420" fill="#0f1117" rx="8"/>
&lt;!-- Postgres -->
&lt;rect x="20" y="160" width="130" height="100" rx="6" fill="#1e2433" stroke="#4a90d9" stroke-width="1.5"/>
&lt;text x="85" y="200" text-anchor="middle" fill="#4a90d9" font-family="monospace" font-size="13" font-weight="bold">PostgreSQL&lt;/text>
&lt;text x="85" y="218" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">documents&lt;/text>
&lt;text x="85" y="234" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">outbox_events&lt;/text>
&lt;!-- Ingestor -->
&lt;rect x="210" y="60" width="140" height="90" rx="6" fill="#1e2433" stroke="#7c4dff" stroke-width="1.5"/>
&lt;text x="280" y="96" text-anchor="middle" fill="#7c4dff" font-family="monospace" font-size="13" font-weight="bold">Ingestor&lt;/text>
&lt;text x="280" y="114" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">chunking&lt;/text>
&lt;text x="280" y="130" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">embedding&lt;/text>
&lt;!-- Indexer -->
&lt;rect x="210" y="190" width="140" height="90" rx="6" fill="#1e2433" stroke="#00c853" stroke-width="1.5"/>
&lt;text x="280" y="226" text-anchor="middle" fill="#00c853" font-family="monospace" font-size="13" font-weight="bold">Indexer&lt;/text>
&lt;text x="280" y="244" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">outbox worker&lt;/text>
&lt;text x="280" y="260" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">Qdrant upsert&lt;/text>
&lt;!-- Reconciler -->
&lt;rect x="210" y="320" width="140" height="70" rx="6" fill="#1e2433" stroke="#ff6d00" stroke-width="1.5"/>
&lt;text x="280" y="354" text-anchor="middle" fill="#ff6d00" font-family="monospace" font-size="13" font-weight="bold">Reconciler&lt;/text>
&lt;text x="280" y="372" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">diff periódico&lt;/text>
&lt;!-- Qdrant -->
&lt;rect x="430" y="160" width="130" height="100" rx="6" fill="#1e2433" stroke="#e91e8c" stroke-width="1.5"/>
&lt;text x="495" y="200" text-anchor="middle" fill="#e91e8c" font-family="monospace" font-size="13" font-weight="bold">Qdrant&lt;/text>
&lt;text x="495" y="218" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">collection&lt;/text>
&lt;text x="495" y="234" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">corpus&lt;/text>
&lt;!-- GPU Embedder -->
&lt;rect x="430" y="50" width="130" height="80" rx="6" fill="#1e2433" stroke="#ffd600" stroke-width="1.5"/>
&lt;text x="495" y="83" text-anchor="middle" fill="#ffd600" font-family="monospace" font-size="12" font-weight="bold">4×H100 SXM&lt;/text>
&lt;text x="495" y="101" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">bge-m3&lt;/text>
&lt;text x="495" y="117" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">~2000 chunks/s&lt;/text>
&lt;!-- Kafka/NATS opcional -->
&lt;rect x="430" y="320" width="130" height="70" rx="6" fill="#1e2433" stroke="#26c6da" stroke-width="1.5"/>
&lt;text x="495" y="352" text-anchor="middle" fill="#26c6da" font-family="monospace" font-size="12" font-weight="bold">Kafka/NATS&lt;/text>
&lt;text x="495" y="370" text-anchor="middle" fill="#8899aa" font-family="monospace" font-size="11">(CDC path)&lt;/text>
&lt;!-- Flecha Postgres → Ingestor -->
&lt;line x1="150" y1="190" x2="210" y2="130" stroke="#7c4dff" stroke-width="1.5" stroke-dasharray="5,3" marker-end="url(#arr)"/>
&lt;!-- Flecha Postgres → Indexer -->
&lt;line x1="150" y1="210" x2="210" y2="230" stroke="#00c853" stroke-width="1.5" marker-end="url(#arr)"/>
&lt;!-- Flecha Postgres → Reconciler -->
&lt;line x1="150" y1="240" x2="210" y2="340" stroke="#ff6d00" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arr)"/>
&lt;!-- Flecha Ingestor → GPU -->
&lt;line x1="350" y1="100" x2="430" y2="90" stroke="#ffd600" stroke-width="1.5" marker-end="url(#arr)"/>
&lt;!-- Flecha GPU → Indexer -->
&lt;line x1="495" y1="130" x2="380" y2="210" stroke="#ffd600" stroke-width="1.2" stroke-dasharray="4,3" marker-end="url(#arr)"/>
&lt;!-- Flecha Indexer → Qdrant -->
&lt;line x1="350" y1="230" x2="430" y2="210" stroke="#00c853" stroke-width="1.5" marker-end="url(#arr)"/>
&lt;!-- Flecha Reconciler → Kafka -->
&lt;line x1="350" y1="355" x2="430" y2="355" stroke="#26c6da" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arr)"/>
&lt;!-- Flecha Kafka → Qdrant -->
&lt;line x1="495" y1="320" x2="495" y2="260" stroke="#26c6da" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arr)"/>
&lt;!-- Reconciler → Qdrant directo -->
&lt;line x1="350" y1="345" x2="430" y2="230" stroke="#ff6d00" stroke-width="1.2" stroke-dasharray="3,4" marker-end="url(#arr)"/>
&lt;!-- Definición de marcadores -->
&lt;defs>
&lt;marker id="arr" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
&lt;path d="M0,0 L0,6 L8,3 z" fill="#667788"/>
&lt;/marker>
&lt;/defs>
&lt;!-- Leyenda -->
&lt;text x="590" y="80" fill="#ccddee" font-family="monospace" font-size="11" font-weight="bold">Flujo outbox:&lt;/text>
&lt;line x1="590" y1="92" x2="630" y2="92" stroke="#00c853" stroke-width="1.5"/>
&lt;text x="635" y="96" fill="#8899aa" font-family="monospace" font-size="10">síncrono&lt;/text>
&lt;text x="590" y="115" fill="#ccddee" font-family="monospace" font-size="11" font-weight="bold">Flujo CDC:&lt;/text>
&lt;line x1="590" y1="127" x2="630" y2="127" stroke="#26c6da" stroke-width="1.5" stroke-dasharray="4,3"/>
&lt;text x="635" y="131" fill="#8899aa" font-family="monospace" font-size="10">async&lt;/text>
&lt;text x="590" y="150" fill="#ccddee" font-family="monospace" font-size="11" font-weight="bold">Reconciler:&lt;/text>
&lt;line x1="590" y1="162" x2="630" y2="162" stroke="#ff6d00" stroke-width="1.5" stroke-dasharray="3,4"/>
&lt;text x="635" y="166" fill="#8899aa" font-family="monospace" font-size="10">periódico&lt;/text>
&lt;/svg>
&lt;h3 id="microservicio-1-ingestor">Microservicio 1: Ingestor&lt;/h3>
&lt;p>Responsabilidades: recibir documentos, trocearlos en chunks y solicitar embeddings. No escribe en Qdrant directamente.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># ingestor/main.py (simplificado)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">langchain_text_splitters&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">RecursiveCharacterTextSplitter&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">splitter&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">RecursiveCharacterTextSplitter&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">chunk_size&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">512&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="c1"># tokens, no caracteres&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">chunk_overlap&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">64&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">length_function&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">token_count&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">ingest_document&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">doc&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">Document&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">db&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">Session&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">chunks&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">splitter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">split_text&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">doc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">content&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">db&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">begin&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">db&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">execute&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;UPDATE documents SET checksum=$1 WHERE id=$2&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">[&lt;/span>&lt;span class="n">doc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">checksum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">doc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">db&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">execute&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;INSERT INTO outbox_events (aggregate_id, event_type, payload)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> VALUES ($1, &amp;#39;document.updated&amp;#39;, $2)&amp;#34;&amp;#34;&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">[&lt;/span>&lt;span class="n">doc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">{&lt;/span>&lt;span class="s2">&amp;#34;chunks&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">chunks&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;tenant_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">doc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tenant_id&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;model&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;bge-m3&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;model_version&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;1.0.0&amp;#34;&lt;/span>&lt;span class="p">}]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="microservicio-2-indexer">Microservicio 2: Indexer&lt;/h3>
&lt;p>Lee la outbox, genera embeddings llamando al servidor de inferencia (vLLM o TEI) y hace upsert en Qdrant.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># indexer/worker.py&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">asyncio&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">QdrantClient&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">qdrant_client.models&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">PointStruct&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">VectorParams&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">Distance&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">qdrant&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">QdrantClient&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">host&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;qdrant-service&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">port&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">6333&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">async&lt;/span> &lt;span class="k">def&lt;/span> &lt;span class="nf">process_event&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">event&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">dict&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">payload&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">event&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;payload&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">chunks&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">payload&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;chunks&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">doc_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">event&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;aggregate_id&amp;#34;&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Embedding batch en TEI (Text Embeddings Inference)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">embeddings&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">await&lt;/span> &lt;span class="n">embed_batch&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">chunks&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;bge-m3&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">points&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PointStruct&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">doc_id&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">_&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">vector&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">emb&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">payload&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;document_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">doc_id&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;tenant_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">payload&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;tenant_id&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;chunk_index&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;text&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">chunks&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;model_version&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">payload&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;model_version&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">emb&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">enumerate&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">embeddings&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Si es actualización, borrar chunks anteriores primero&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">event&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;event_type&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;document.updated&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;document.deleted&amp;#34;&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">qdrant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">delete&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;corpus&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">points_selector&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">filter_by_doc_id&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">doc_id&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">event&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;event_type&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="s2">&amp;#34;document.deleted&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">qdrant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">upsert&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;corpus&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">points&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">points&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="microservicio-3-reconciler">Microservicio 3: Reconciler&lt;/h3>
&lt;p>El reconciler es la red de seguridad. Periódicamente (por ejemplo, cada hora) compara el conjunto de &lt;code>document_id&lt;/code> en Postgres con el conjunto de &lt;code>document_id&lt;/code> en Qdrant. Los IDs presentes en Qdrant pero ausentes en Postgres son &lt;em>fantasmas&lt;/em>: se borran.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># reconciler/diff.py&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">async&lt;/span> &lt;span class="k">def&lt;/span> &lt;span class="nf">reconcile&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">tenant_id&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="nb">int&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pg_ids&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">await&lt;/span> &lt;span class="n">fetch_all_doc_ids&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">tenant_id&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">qdrant_ids&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">await&lt;/span> &lt;span class="n">scroll_all_doc_ids&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">tenant_id&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="c1"># scroll paginado&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">orphans&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">qdrant_ids&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">pg_ids&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">orphans&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">logger&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">warning&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Orphan chunks for &lt;/span>&lt;span class="si">%d&lt;/span>&lt;span class="s2"> documents&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">orphans&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">doc_id&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">orphans&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">qdrant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">delete&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;corpus&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">filter_by_doc_id&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">doc_id&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">orphans&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="matemáticas-de-throughput-e-ingestión">Matemáticas de throughput e ingestión&lt;/h2>
&lt;h3 id="throughput-de-embedding">Throughput de embedding&lt;/h3>
&lt;p>El modelo &lt;code>bge-m3&lt;/code> (1024 dimensiones, soporte denso + sparse + colbert) en un nodo con &lt;strong>4×H100 SXM (320 GB NVLink)&lt;/strong> ejecutado vía vLLM o HuggingFace TEI alcanza aproximadamente &lt;strong>2.000 chunks/segundo&lt;/strong> con batch size = 256 y secuencias de 512 tokens.&lt;/p>
&lt;p>$$\text{throughput} = 4 \times 500 \text{ chunks/s/GPU} = 2{,}000 \text{ chunks/s}$$&lt;/p>
&lt;blockquote>
&lt;p>La cifra de 500 chunks/s por GPU proviene de benchmarks públicos de TEI con bge-m3 en H100 SXM5, batch=256, seq_len=512 &lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>.&lt;/p>
&lt;/blockquote>
&lt;h3 id="tiempo-de-re-ingestión-total">Tiempo de re-ingestión total&lt;/h3>
&lt;p>Para un corpus de &lt;strong>10M chunks&lt;/strong>:&lt;/p>
&lt;p>$$t = \frac{10{,}000{,}000 \text{ chunks}}{2{,}000 \text{ chunks/s}} = 5{,}000 \text{ s} \approx 83 \text{ minutos}$$&lt;/p>
&lt;p>Esto es el tiempo puro de embedding. Añadiendo latencia de escritura en Qdrant (~0,5ms por upsert batch de 100 puntos):&lt;/p>
&lt;p>$$t_{\text{qdrant}} = \frac{10{,}000{,}000}{100} \times 0.5\text{ ms} = 50{,}000 \text{ ms} = 50 \text{ s}$$&lt;/p>
&lt;p>Total estimado para una re-ingestión completa: &lt;strong>~85-90 minutos&lt;/strong> en un nodo 4×H100.&lt;/p>
&lt;h3 id="coste-de-almacenamiento-en-qdrant">Coste de almacenamiento en Qdrant&lt;/h3>
&lt;p>Cada vector de &lt;code>bge-m3&lt;/code> tiene &lt;strong>1024 dimensiones&lt;/strong> en &lt;code>float32&lt;/code> (4 bytes):&lt;/p>
&lt;p>$$\text{tamaño por vector} = 1{,}024 \times 4 \text{ B} = 4{,}096 \text{ B} = 4 \text{ KB}$$&lt;/p>
&lt;p>Para 10M chunks (solo vectores densos):&lt;/p>
&lt;p>$$\text{total vectores} = 10^7 \times 4{,}096 \text{ B} = 40.96 \text{ GB}$$&lt;/p>
&lt;p>Añadiendo payload JSON (estimado ~500 bytes/chunk):&lt;/p>
&lt;p>$$\text{payload} = 10^7 \times 500 \text{ B} = 5 \text{ GB}$$&lt;/p>
&lt;p>Índice HNSW (aproximadamente 1.2× el tamaño del vector para $m=16$):&lt;/p>
&lt;p>$$\text{HNSW} \approx 40.96 \text{ GB} \times 1.2 = 49.15 \text{ GB}$$&lt;/p>
&lt;p>&lt;strong>Total estimado en disco: ~95 GB&lt;/strong> para 10M chunks con bge-m3 denso.&lt;/p>
&lt;p>Con &lt;code>scalar&lt;/code> quantization (int8), el tamaño del vector se reduce 4×:&lt;/p>
&lt;p>$$\text{con quantización int8} \approx \frac{40.96}{4} + 5 + \frac{49.15}{4} \approx 27.5 \text{ GB}$$&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Configuración&lt;/th>
&lt;th>Vectores&lt;/th>
&lt;th>HNSW&lt;/th>
&lt;th>Payload&lt;/th>
&lt;th>Total&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>float32, sin quantización&lt;/td>
&lt;td>40.96 GB&lt;/td>
&lt;td>49.15 GB&lt;/td>
&lt;td>5 GB&lt;/td>
&lt;td>~95 GB&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>int8 scalar quantization&lt;/td>
&lt;td>10.24 GB&lt;/td>
&lt;td>12.29 GB&lt;/td>
&lt;td>5 GB&lt;/td>
&lt;td>~28 GB&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>binary quantization&lt;/td>
&lt;td>1.28 GB&lt;/td>
&lt;td>1.54 GB&lt;/td>
&lt;td>5 GB&lt;/td>
&lt;td>~8 GB&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>La binary quantization pierde precisión de recall (~2-5% en NDCG@10), pero permite alojar corpus mucho mayores en RAM. Para producción con recall crítico, int8 es el punto de equilibrio habitual.&lt;/p>
&lt;hr>
&lt;h2 id="hardware-on-premise-recomendado">Hardware on-premise recomendado&lt;/h2>
&lt;p>Para un pipeline de ingestión continua en producción:&lt;/p>
&lt;p>&lt;strong>Nodo de embedding&lt;/strong>: 4×H100 SXM (320 GB, NVLink), 2× CPU 64-core (EPYC 9654), 1 TB RAM DDR5, 100 GbE. Ejecuta vLLM o TEI sirviendo &lt;code>bge-m3&lt;/code>. Throughput sostenido: ~2.000 chunks/s con pipeline batch asíncrono.&lt;/p>
&lt;p>&lt;strong>Nodo Qdrant&lt;/strong>: CPU 32-core, 256 GB RAM (para mantener el índice HNSW en memoria con 10M chunks sin quantización), NVMe 2 TB (escritura de snapshots y WAL de Qdrant). Qdrant recomienda que el índice HNSW quepa en RAM para latencia p99 &amp;lt; 5ms.&lt;/p>
&lt;p>&lt;strong>Nodo PostgreSQL&lt;/strong>: CPU 16-core, 128 GB RAM, NVMe 4 TB para WAL (especialmente relevante si usas CDC con slot de replicación lógica; el slot retiene WAL hasta que Debezium lo consume).&lt;/p>
&lt;p>&lt;strong>Broker (si CDC)&lt;/strong>: Kafka 3-broker con 500 GB NVMe por nodo, o NATS JetStream con 3 nodos para cargas más modestas.&lt;/p>
&lt;hr>
&lt;h2 id="manifests-de-kubernetes">Manifests de Kubernetes&lt;/h2>
&lt;h3 id="deployment-del-indexer">Deployment del Indexer&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">apps/v1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Deployment&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">rag-indexer&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespace&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">rag-pipeline&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">replicas&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">3&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">selector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">matchLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">app&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">rag-indexer&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">template&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">labels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">app&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">rag-indexer&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">containers&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">indexer&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">registry.example.com/rag-indexer:1.0.0&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">env&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">POSTGRES_DSN&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">valueFrom&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">secretKeyRef&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">pg-credentials&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">key&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">dsn&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">QDRANT_HOST&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">value&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;qdrant-service.qdrant.svc.cluster.local&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">EMBEDDING_ENDPOINT&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">value&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;http://tei-service.embeddings.svc.cluster.local:8080&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">POLL_INTERVAL_MS&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">value&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;500&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">requests&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">memory&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;512Mi&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cpu&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;500m&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">limits&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">memory&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;2Gi&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cpu&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;2&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="cronjob-del-reconciler">CronJob del Reconciler&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">batch/v1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">CronJob&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">rag-reconciler&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespace&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">rag-pipeline&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">schedule&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;0 * * * *&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># cada hora&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">concurrencyPolicy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Forbid&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">jobTemplate&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">template&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">restartPolicy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">OnFailure&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">containers&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">reconciler&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">image&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">registry.example.com/rag-reconciler:1.0.0&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">env&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">POSTGRES_DSN&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">valueFrom&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">secretKeyRef&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">pg-credentials&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">key&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">dsn&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">QDRANT_HOST&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">value&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;qdrant-service.qdrant.svc.cluster.local&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">SCROLL_PAGE_SIZE&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">value&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;1000&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">requests&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">memory&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;256Mi&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cpu&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;250m&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">limits&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">memory&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;1Gi&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cpu&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="gotchas-de-producción">Gotchas de producción&lt;/h2>
&lt;h3 id="reindexing-al-cambiar-de-modelo-de-embedding">Reindexing al cambiar de modelo de embedding&lt;/h3>
&lt;p>Este es el problema más doloroso. Si pasas de &lt;code>bge-m3&lt;/code> a &lt;code>nomic-embed-text-v2&lt;/code>, los vectores son &lt;strong>incompatibles&lt;/strong>: están en espacios de embedding distintos y las distancias coseno entre ellos no tienen significado.&lt;/p>
&lt;p>La solución es &lt;strong>dual-index aliasing&lt;/strong>:&lt;/p>
&lt;ol>
&lt;li>Crea una colección nueva en Qdrant: &lt;code>corpus_v2&lt;/code>.&lt;/li>
&lt;li>Re-embeds el corpus completo con el modelo nuevo y carga &lt;code>corpus_v2&lt;/code>.&lt;/li>
&lt;li>Cuando la colección nueva está completa y validada (test de recall), cambia el alias de &lt;code>corpus_prod&lt;/code> de &lt;code>corpus_v1&lt;/code> a &lt;code>corpus_v2&lt;/code>.&lt;/li>
&lt;li>Borra &lt;code>corpus_v1&lt;/code> cuando el tráfico haya migrado.&lt;/li>
&lt;/ol>
&lt;p>Durante la migración, los dos índices coexisten. El retriever usa el alias, no el nombre directo de la colección.&lt;/p>
&lt;h3 id="versioning-del-índice">Versioning del índice&lt;/h3>
&lt;p>Guarda &lt;code>model_version&lt;/code> en el payload de cada punto en Qdrant. Esto permite:&lt;/p>
&lt;ul>
&lt;li>Filtrar por versión durante el retrieval (útil en A/B testing de modelos).&lt;/li>
&lt;li>El reconciler puede detectar puntos con versión antigua y reprocesarlos selectivamente.&lt;/li>
&lt;li>Auditoría: saber con qué modelo se generó cada embedding.&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Filtro por versión de modelo en retrieval&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">qdrant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">search&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;corpus&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">query_vector&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">query_embedding&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">query_filter&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">must&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FieldCondition&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;model_version&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="k">match&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">MatchValue&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">value&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;1.0.0&amp;#34;&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FieldCondition&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;tenant_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="k">match&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">MatchValue&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">value&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">tenant_id&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">limit&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">10&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="namespace-por-tenant-multi-tenancy">Namespace por tenant (multi-tenancy)&lt;/h3>
&lt;p>Hay dos estrategias en Qdrant:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Estrategia&lt;/th>
&lt;th>Pros&lt;/th>
&lt;th>Contras&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Colección por tenant&lt;/strong>&lt;/td>
&lt;td>Aislamiento total, sin filtro extra&lt;/td>
&lt;td>N colecciones = N índices HNSW en RAM&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Payload filter por tenant&lt;/strong>&lt;/td>
&lt;td>Una sola colección, menos RAM&lt;/td>
&lt;td>Filtro añade ~10-15% de latencia en búsqueda&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Para menos de 100 tenants con corpus grandes (&amp;gt; 1M chunks/tenant), usa colección por tenant. Para cientos o miles de tenants con corpus pequeños, usa payload filter con &lt;code>tenant_id&lt;/code> indexado:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">qdrant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">create_payload_index&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">collection_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;corpus&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">field_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;tenant_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">field_schema&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">PayloadSchemaType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">KEYWORD&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="lo-que-no-hemos-cubierto">Lo que no hemos cubierto&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Streaming corpus updates con CDC en near-real-time&lt;/strong>: invalidación selectiva de chunks cuando solo una sección de un documento cambia (chunking incremental basado en diff de contenido, no re-chunking completo).&lt;/li>
&lt;li>&lt;strong>Multi-tenant corpus isolation con ACLs por chunk&lt;/strong>: ir más allá del filtro por &lt;code>tenant_id&lt;/code> para permisos a nivel de grupo, rol o incluso documento individual, aplicados en tiempo de retrieval.&lt;/li>
&lt;li>&lt;strong>Federated corpus&lt;/strong>: corpora distribuidos en silos cross-border donde las regulaciones (GDPR, CCPA) impiden centralizar los embeddings; patrones de federated search sin mover datos.&lt;/li>
&lt;li>&lt;strong>Reindexing incremental con zero-downtime usando dual-index aliasing&lt;/strong>: el protocolo completo de migración de modelo con rollback, test de regresión de recall y traffic splitting progresivo.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="ver-también">Ver también&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://blog.lo0.es/posts/rag-corpus-curation-fundamentos/">El corpus curado que esta arquitectura debe indexar&lt;/a> — estrategias de curación y filtrado antes de la ingestión.&lt;/li>
&lt;li>&lt;a href="https://blog.lo0.es/posts/rag-reranker-hybrid-retrieval-fundamentos/">El retrieval que consume este vector store&lt;/a> — cómo el reranker y el retrieval híbrido usan lo que aquí construimos.&lt;/li>
&lt;li>&lt;a href="https://blog.lo0.es/posts/embeddings-modelos-2026-dense-sparse-multivector/">El embedder que genera los vectores&lt;/a> — comparativa de bge-m3, nomic-embed-text-v2 y modelos multivector.&lt;/li>
&lt;li>&lt;a href="https://blog.lo0.es/posts/pipeline-llmops-seis-etapas/">La etapa Data del mapa maestro LLMOps&lt;/a> — contexto de esta pipeline dentro del ciclo completo.&lt;/li>
&lt;li>&lt;a href="https://blog.lo0.es/posts/data-versioning-dvc-lakefs/">Versioning del corpus raw antes de la ingestión&lt;/a> — cómo DVC y LakeFS gestionan el linaje antes de llegar a PostgreSQL.&lt;/li>
&lt;li>&lt;a href="https://blog.lo0.es/posts/debezium-cdc-fundamentos/">Debezium y CDC: el notario que escucha los cambios antes de que nadie los pida&lt;/a> — el deep dive en CDC que este artículo introduce: WAL de Postgres, slots de replicación, pgoutput y la comparativa completa con el outbox pattern.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="referencias">Referencias&lt;/h2>
&lt;div class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1">
&lt;p>HuggingFace Text Embeddings Inference — benchmarks oficiales con modelos de la familia bge en hardware A100/H100. &lt;a href="https://github.com/huggingface/text-embeddings-inference">https://github.com/huggingface/text-embeddings-inference&lt;/a>&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/div></description></item></channel></rss>