<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Multitenancy on lo0 — Blog Técnico</title><link>https://blog.lo0.es/tags/multitenancy/</link><description>Recent content in Multitenancy on lo0 — Blog Técnico</description><generator>Hugo -- gohugo.io</generator><language>es</language><lastBuildDate>Mon, 15 Jun 2026 04:00:00 +0200</lastBuildDate><atom:link href="https://blog.lo0.es/tags/multitenancy/index.xml" rel="self" type="application/rss+xml"/><item><title>Chargeback y showback de GPU en multi-tenancy: cómo repartir el coste del cluster entre equipos</title><link>https://blog.lo0.es/posts/chargeback-showback-multitenancy-gpu/</link><pubDate>Mon, 15 Jun 2026 04:00:00 +0200</pubDate><guid>https://blog.lo0.es/posts/chargeback-showback-multitenancy-gpu/</guid><description>&lt;blockquote>
&lt;p>Notation: amounts are in &lt;strong>euros (N €)&lt;/strong>, with a decimal comma. The dollar sign is not used
(on this site it is the formula delimiter).&lt;/p>
&lt;/blockquote>
&lt;h2 id="tldr">TL;DR&lt;/h2>
&lt;p>A 4×H100 SXM cluster at ~1,40 €/GPU-hour (amortized capex + energy), shared by three
teams with different occupancy, yields a monthly chargeback report with three rows plus one
extra row of idle capacity that nobody claimed. Without tooling, that idle cost disappears into the
total. With OpenCost + LiteLLM + Kueue, attribution works on three orthogonal planes: the
&lt;strong>hardware&lt;/strong> (OpenCost, €/GPU-hour per namespace/label), &lt;strong>token consumption&lt;/strong> (LiteLLM,
€/token per key/team/model) and the &lt;strong>scheduler quota&lt;/strong> (Kueue, GPUs reserved per
ClusterQueue). Crossing the three produces the figure that goes to finance: team B consumed
X million tokens at Y €/1M tok, its GPU time cost Z €, and it still holds 2 GPUs borrowed from the
cohort that will cost W € if it keeps them next month.&lt;/p>
&lt;hr>
&lt;h2 id="showback-vs-chargeback-definición-finops-foundation">Showback vs chargeback: the FinOps Foundation definition&lt;/h2>
&lt;p>The distinction is not one of technology but of &lt;strong>accounting formality&lt;/strong>
(&lt;a href="https://www.finops.org/framework/capabilities/invoicing-chargeback/">FinOps Foundation — Invoicing &amp;amp; Chargeback&lt;/a>,
&lt;a href="https://www.finops.org/framework/previous-capabilities/analysis-showback/">FinOps Foundation — Data Analysis and Showback&lt;/a>):&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Concept&lt;/th>
&lt;th>Definition&lt;/th>
&lt;th>Moves money&lt;/th>
&lt;th>Requires&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Showback&lt;/strong>&lt;/td>
&lt;td>visibility of consumption and its cost per team; the report is delivered, the budget does not change&lt;/td>
&lt;td>No&lt;/td>
&lt;td>metrics + attribution&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Chargeback&lt;/strong>&lt;/td>
&lt;td>the cost is transferred to the P&amp;amp;L of the team or product as a real expense&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>showback + accounting policy + financial system&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Two points of the framework worth pinning down:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Showback is a prerequisite of any FinOps practice; chargeback is optional&lt;/strong> and depends on
whether the organization's accounting policy supports transfers between cost centers.&lt;/li>
&lt;li>&lt;strong>Neither is &amp;ldquo;more mature&amp;rdquo; than the other.&lt;/strong> The narrative that chargeback is the
&amp;ldquo;grown-up&amp;rdquo; version is false according to the framework itself. The natural sequence is: showback → trust in
the data → chargeback if policy allows it.&lt;/li>
&lt;/ol>
&lt;h3 id="cuándo-usar-cada-uno">When to use each&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Situation&lt;/th>
&lt;th>Recommended mode&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>First attribution cycle; teams do not yet trust the data&lt;/td>
&lt;td>Showback&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Teams with their own budget in the financial system&lt;/td>
&lt;td>Chargeback&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Shared cluster with no per-team P&amp;amp;L separation&lt;/td>
&lt;td>Showback with idle cost visible&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Teams with a committed GPU availability SLA&lt;/td>
&lt;td>Chargeback (real reservation)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Research/experimentation workloads with no formal budget&lt;/td>
&lt;td>Showback + threshold alert&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="atribución-del-coste-gpu-con-opencost">GPU cost attribution with OpenCost&lt;/h2>
&lt;h3 id="allocation-api-parámetros-de-atribución">Allocation API: attribution parameters&lt;/h3>
&lt;p>OpenCost's &lt;code>/allocation&lt;/code> API is the piece that turns Prometheus metrics into a
cost report per dimension
(&lt;a href="https://opencost.io/docs/integrations/api/">OpenCost — API&lt;/a>):&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Parameter&lt;/th>
&lt;th>Values&lt;/th>
&lt;th>Use in multi-tenancy&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>window&lt;/code>&lt;/td>
&lt;td>&lt;code>today&lt;/code>, &lt;code>7d&lt;/code>, &lt;code>lastmonth&lt;/code>, RFC3339 range&lt;/td>
&lt;td>window of the monthly report&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>aggregate&lt;/code>&lt;/td>
&lt;td>&lt;code>namespace&lt;/code>, &lt;code>label:LABEL&lt;/code>, &lt;code>annotation:NAME&lt;/code>, &lt;code>pod&lt;/code>, &lt;code>controller&lt;/code>&lt;/td>
&lt;td>attribution dimension&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>includeIdle&lt;/code>&lt;/td>
&lt;td>&lt;code>true&lt;/code> / &lt;code>false&lt;/code>&lt;/td>
&lt;td>adds an &lt;code>__idle__&lt;/code> row to the report&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>shareIdle&lt;/code>&lt;/td>
&lt;td>&lt;code>true&lt;/code> / &lt;code>false&lt;/code>&lt;/td>
&lt;td>distributes idle across the non-idle allocations (in proportion to non-idle cost)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>idleByNode&lt;/code>&lt;/td>
&lt;td>&lt;code>true&lt;/code> / &lt;code>false&lt;/code>&lt;/td>
&lt;td>computes idle per node instead of per cluster&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>filter&lt;/code>&lt;/td>
&lt;td>&lt;code>namespace:&amp;quot;llm-prod&amp;quot;&lt;/code>, &lt;code>label[equipo]:&amp;quot;datos&amp;quot;&lt;/code>&lt;/td>
&lt;td>restrict the report to a specific team&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Monthly report query per team, with idle visible as a separate row:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">curl -G http://localhost:9003/allocation &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">window&lt;/span>&lt;span class="o">=&lt;/span>lastmonth &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">aggregate&lt;/span>&lt;span class="o">=&lt;/span>label:equipo &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">includeIdle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">true&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">shareIdle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">false&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">resolution&lt;/span>&lt;span class="o">=&lt;/span>10m
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>With &lt;code>shareIdle=false&lt;/code> (the default), OpenCost returns idle as a separate &lt;code>__idle__&lt;/code> row, which
makes visible how much was paid for unused capacity. With &lt;code>shareIdle=true&lt;/code>, that idle is distributed
across the tenants in proportion to their non-idle cost: each one absorbs its share of the waste,
which suits a chargeback that has to recover the full cost of the cluster. Note that a proportional
split charges the heaviest consumers the most, not necessarily whoever left the capacity unused.&lt;/p>
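&lt;p>A worked example of the proportional split, under assumptions that are not in the original: a 30-day month (720 h) and hypothetical non-idle costs for three teams A, B and C. Only the 4 GPUs and the 1,40 €/GPU-hour rate come from the text above.&lt;/p>
&lt;pre tabindex="0">&lt;code>cluster cost   = 4 GPU × 720 h × 1,40 €/GPU-hour = 4032,00 €
non-idle cost  = A 1500,00 € + B 900,00 € + C 600,00 € = 3000,00 €   (hypothetical)
__idle__       = 4032,00 € − 3000,00 € = 1032,00 €

shareIdle=false: A 1500,00 € | B 900,00 € | C 600,00 € | __idle__ 1032,00 €
shareIdle=true:  A 1500,00 + 0,50 × 1032,00 = 2016,00 €
                 B  900,00 + 0,30 × 1032,00 = 1209,60 €
                 C  600,00 + 0,20 × 1032,00 =  806,40 €
                 total                      = 4032,00 €
&lt;/code>&lt;/pre>
&lt;p>Either way the rows add up to the same cluster cost; the flag only decides whether the 1032,00 € stays visible as its own row or is folded into the teams' rows.&lt;/p>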
&lt;h3 id="etiquetado-de-pods-de-inferencia">Labeling inference pods&lt;/h3>
&lt;p>For &lt;code>aggregate=label:equipo&lt;/code> to work, the inference pods need the right label.
Two methods:&lt;/p>
&lt;p>&lt;strong>Label in the pod spec&lt;/strong> (the most direct):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="c"># vLLM Deployment for the datos team (set these under spec.template.metadata so they reach the pods)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">labels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">equipo&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">datos&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">producto&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">rag-prod&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">modelo&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">llama-70b&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">entorno&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">prod&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Annotation&lt;/strong> (when the label is already taken by another convention):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># aggregate by annotation&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">aggregate&lt;/span>&lt;span class="o">=&lt;/span>annotation:finops.io/equipo
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>OpenCost supports &lt;code>aggregate=label:KEY&lt;/code> and &lt;code>aggregate=annotation:KEY&lt;/code> with the same syntax; the
choice between them depends on the cluster's labeling convention.&lt;/p>
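&lt;p>A minimal sketch of the annotation variant, assuming the same vLLM Deployment and the &lt;code>finops.io/equipo&lt;/code> key used in the query above. The Deployment name is hypothetical, and the annotation is placed on the pod template so that it reaches the pods. Depending on how OpenCost is deployed, annotation metrics may need to be enabled before this aggregation returns data.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml"># Hypothetical Deployment fragment: annotation set on the pod template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama-70b            # hypothetical name
spec:
  template:
    metadata:
      annotations:
        finops.io/equipo: datos   # intended to match aggregate=annotation:finops.io/equipo
&lt;/code>&lt;/pre>&lt;/div>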
&lt;h3 id="costes-compartidos-e-idle-los-tres-modos">Shared costs and idle: the available modes&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>API mode&lt;/th>
&lt;th>Behavior&lt;/th>
&lt;th>When&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>includeIdle=false&lt;/code> (default)&lt;/td>
&lt;td>Ignores idle; the total cost looks lower than it is&lt;/td>
&lt;td>Never in production&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>includeIdle=true, shareIdle=false&lt;/code>&lt;/td>
&lt;td>Idle in a separate &lt;code>__idle__&lt;/code> row&lt;/td>
&lt;td>Showback: visibility of the waste&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>includeIdle=true, shareIdle=true&lt;/code>&lt;/td>
&lt;td>Idle distributed across tenants proportionally&lt;/td>
&lt;td>Chargeback: each team pays its share of the idle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>shareIdle=true, idleByNode=true&lt;/code>&lt;/td>
&lt;td>Idle distributed per node (more granular when there are dedicated nodes)&lt;/td>
&lt;td>Chargeback with heterogeneous nodes&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>For infrastructure with nodes dedicated per team, &lt;code>idleByNode=true&lt;/code> is fairer: the idle of
a node dedicated to team A is not transferred to team B.&lt;/p>
&lt;hr>
&lt;h2 id="litellm-como-punto-de-atribución-por-consumo-de-tokens">LiteLLM as the attribution point for token consumption&lt;/h2>
&lt;h3 id="virtual-keys-y-equipos">Virtual keys and teams&lt;/h3>
&lt;p>LiteLLM implements token-based chargeback with five mechanisms
(&lt;a href="https://docs.litellm.ai/docs/proxy/virtual_keys">LiteLLM — Virtual Keys&lt;/a>,
&lt;a href="https://docs.litellm.ai/docs/proxy/team_budgets">LiteLLM — Setting Team Budgets&lt;/a>,
&lt;a href="https://docs.litellm.ai/docs/proxy/cost_tracking">LiteLLM — Spend Tracking&lt;/a>):&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Mechanism&lt;/th>
&lt;th>What it does&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Virtual key&lt;/strong> with &lt;code>max_budget&lt;/code>&lt;/td>
&lt;td>budget per key for the window set by &lt;code>budget_duration&lt;/code>; the key is blocked once the budget is exhausted&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>&lt;code>budget_duration&lt;/code>&lt;/strong>&lt;/td>
&lt;td>reset window: &lt;code>30d&lt;/code>, &lt;code>7d&lt;/code>, &lt;code>24h&lt;/code>; the proxy runs a daily cron that resets budgets according to the duration&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>&lt;code>team_id&lt;/code>&lt;/strong>&lt;/td>
&lt;td>groups keys; spend accumulates per team in &lt;code>LiteLLM_TeamTable&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>&lt;code>tags&lt;/code>&lt;/strong>&lt;/td>
&lt;td>tag each request; allow budgets per tag (cost center)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Spend logs&lt;/strong>&lt;/td>
&lt;td>one row per request with &lt;code>team_id&lt;/code>, &lt;code>model&lt;/code>, &lt;code>prompt_tokens&lt;/code>, &lt;code>completion_tokens&lt;/code>, &lt;code>response_cost&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="configuración-de-equipos-con-presupuesto-mensual">Configuring teams with a monthly budget&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="c"># litellm-config.yaml: per-team budgets with declared on-prem cost&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">model_list&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">model_name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">llama-3-70b-onprem&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">litellm_params&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">model&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">openai/llama-3-70b&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">api_base&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">http://vllm-svc:8000/v1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input_cost_per_token&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">0.00000140&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># 1,40 €/1M tok (€/GPU-hour ÷ throughput)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">output_cost_per_token&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">0.00000140&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">model_name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">llama-3-8b-onprem&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">litellm_params&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">model&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">openai/llama-3-8b&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">api_base&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">http://vllm-8b-svc:8000/v1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input_cost_per_token&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">0.00000035&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># 0,35 €/1M tok (1 GPU, higher throughput)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">output_cost_per_token&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">0.00000035&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">litellm_settings&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">success_callback&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;langfuse&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># or any external logger&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># Teams (via the /team/new API or config)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># POST /team/new&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># {&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># &amp;#34;team_alias&amp;#34;: &amp;#34;equipo-datos&amp;#34;,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># &amp;#34;max_budget&amp;#34;: 800, # 800 € per month&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># &amp;#34;budget_duration&amp;#34;: &amp;#34;30d&amp;#34;,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># &amp;#34;tpm_limit&amp;#34;: 500000, # max tokens/min&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># &amp;#34;rpm_limit&amp;#34;: 1000 # max requests/min&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># }&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>input_cost_per_token&lt;/code> key comes from the identity in &lt;a href="https://blog.lo0.es/posts/coste-por-token-y-por-request/">article A4&lt;/a>:
the €/GPU-hour from OpenCost divided by the benchmark throughput. With that value, the
&lt;code>response_cost&lt;/code> of each request reflects the real cost of the on-prem hardware for that model.&lt;/p>
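&lt;p>The identity can be checked against the two rates in the config. The throughput figures below are back-computed from those rates and from the 1,40 €/GPU-hour of the TL;DR; they are illustrative assumptions, not benchmark results.&lt;/p>
&lt;pre tabindex="0">&lt;code>cost per token = €/GPU-hour ÷ throughput (tokens per GPU-hour)

llama-3-70b-onprem: 1,40 €/GPU-hour ÷ 1 000 000 tok/GPU-hour = 0,00000140 €/tok = 1,40 €/1M tok
                    (1 000 000 tok/h ≈ 278 tok/s per GPU)
llama-3-8b-onprem:  1,40 €/GPU-hour ÷ 4 000 000 tok/GPU-hour = 0,00000035 €/tok = 0,35 €/1M tok
                    (4 000 000 tok/h ≈ 1111 tok/s per GPU)
&lt;/code>&lt;/pre>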
&lt;h3 id="endpoints-de-consulta-de-gasto-por-equipo">Endpoints for querying spend per team&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Accumulated team spend in the period&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">GET /team/info?team_id&lt;span class="o">=&lt;/span>equipo-datos
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Team spend per model (aggregated spend_logs table)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">GET /global/spend/logs?team_id&lt;span class="o">=&lt;/span>equipo-datos&lt;span class="p">&amp;amp;&lt;/span>&lt;span class="nv">start_date&lt;/span>&lt;span class="o">=&lt;/span>2026-06-01&lt;span class="p">&amp;amp;&lt;/span>&lt;span class="nv">end_date&lt;/span>&lt;span class="o">=&lt;/span>2026-06-30
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Total spend across all teams&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">GET /global/spend/teams
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The response includes &lt;code>spend&lt;/code> (accumulated cost, in the same unit as the declared per-token costs, euros here), &lt;code>total_tokens&lt;/code>,
&lt;code>prompt_tokens&lt;/code>, &lt;code>completion_tokens&lt;/code> and a breakdown by model: the fields that feed directly into the
monthly chargeback report.&lt;/p>
&lt;h3 id="unir-gpu-hora-hierro-con-tokens-por-equipo">Joining €/GPU-hour (hardware) with tokens per team&lt;/h3>
&lt;p>The two attribution planes, OpenCost (hardware) and LiteLLM (tokens), are joined on a composite key:
&lt;code>model_name&lt;/code> and &lt;code>namespace&lt;/code>. The join happens outside the tools (in a BI
pipeline or a SQL query over the LiteLLM database and the data exported from OpenCost):&lt;/p>
&lt;pre tabindex="0">&lt;code>coste_por_token_real = OpenCost.GPU_cost_namespace / LiteLLM.total_tokens_team
&lt;/code>&lt;/pre>&lt;p>When the on-prem cost is correctly declared in &lt;code>input_cost_per_token&lt;/code>, LiteLLM already applies
that rate to every request and the &lt;code>response_cost&lt;/code> is correct. The explicit join serves as validation: if
the sum of LiteLLM's &lt;code>response_cost&lt;/code> per team does not match the GPU cost that OpenCost assigns
to the namespace, the &lt;code>input_cost_per_token&lt;/code> has drifted and needs correcting
(changed throughput, a new optimization, more replicas).&lt;/p>
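&lt;p>A worked example of that validation, with hypothetical monthly figures for a single namespace and team (none of these numbers come from a real cluster):&lt;/p>
&lt;pre tabindex="0">&lt;code>OpenCost GPU cost for the namespace = 2016,00 €
LiteLLM total_tokens for the team   = 1200 M tok
LiteLLM sum of response_cost        = 1200 M tok × 1,40 €/1M tok = 1680,00 €

coste_por_token_real = 2016,00 € / 1200 M tok = 1,68 €/1M tok
drift                = 1680,00 € / 2016,00 €  ≈ 0,83
&lt;/code>&lt;/pre>
&lt;p>In this sketch the declared rate recovers only about 83 % of the hardware cost, so &lt;code>input_cost_per_token&lt;/code> and &lt;code>output_cost_per_token&lt;/code> would move from 0,00000140 to 0,00000168.&lt;/p>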
&lt;hr>
&lt;h2 id="kueue-como-mecanismo-de-presupuestocuota-de-gpu">Kueue as a GPU budget/quota mechanism&lt;/h2>
&lt;h3 id="clusterqueue-localqueue-y-cohorts">ClusterQueue, LocalQueue and cohorts&lt;/h3>
&lt;p>Kueue adds a &lt;strong>quota-aware scheduling&lt;/strong> layer on top of Kubernetes
(&lt;a href="https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/">Kueue — Cluster Queue&lt;/a>,
&lt;a href="https://kueue.sigs.k8s.io/docs/concepts/cohort/">Kueue — Cohort&lt;/a>,
&lt;a href="https://kueue.sigs.k8s.io/docs/concepts/fair_sharing/">Kueue — Fair Sharing&lt;/a>):&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Object&lt;/th>
&lt;th>Scope&lt;/th>
&lt;th>What it does&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>ResourceFlavor&lt;/code>&lt;/td>
&lt;td>cluster&lt;/td>
&lt;td>maps resources to a group of nodes (e.g. the H100 nodes)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>ClusterQueue&lt;/code>&lt;/td>
&lt;td>cluster&lt;/td>
&lt;td>defines &lt;code>nominalQuota&lt;/code>, &lt;code>borrowingLimit&lt;/code>, &lt;code>lendingLimit&lt;/code> per resource/flavor&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>LocalQueue&lt;/code>&lt;/td>
&lt;td>namespace&lt;/td>
&lt;td>entry point for the team's workloads; points to a &lt;code>ClusterQueue&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>Cohort&lt;/code>&lt;/td>
&lt;td>cluster&lt;/td>
&lt;td>groups ClusterQueues that can lend quota to each other&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="conceptos-de-quota">Quota concepts&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>&lt;code>nominalQuota&lt;/code>&lt;/strong>: GPUs guaranteed to the ClusterQueue at all times.&lt;/li>
&lt;li>&lt;strong>&lt;code>borrowingLimit&lt;/code>&lt;/strong>: maximum number of additional GPUs it can borrow from the cohort when
others are not using them.&lt;/li>
&lt;li>&lt;strong>&lt;code>lendingLimit&lt;/code>&lt;/strong>: GPUs of its &lt;code>nominalQuota&lt;/code> that it allows others to borrow (if unspecified,
it can lend all the unused ones).&lt;/li>
&lt;li>&lt;strong>Fair Sharing&lt;/strong>: mechanism that orders pending workloads by the historical resource usage
of their LocalQueue, giving preference to whoever has consumed less. Compatible with hierarchical
cohorts since Kueue v0.11.&lt;/li>
&lt;/ul>
&lt;h3 id="yaml-de-ejemplo-3-equipos-en-cohort-llm-platform">Example YAML: 3 teams in the llm-platform cohort&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="c"># ResourceFlavor that maps to the H100 SXM nodes&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">kueue.x-k8s.io/v1beta2&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ResourceFlavor&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">h100-sxm&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nodeLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">accelerator&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">h100-sxm&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nn">---&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># ClusterQueue for the datos team: 2 GPUs guaranteed, can borrow up to 2 more&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">kueue.x-k8s.io/v1beta2&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ClusterQueue&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">cq-datos&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cohort&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">llm-platform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespaceSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">matchLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">kubernetes.io/metadata.name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ns-datos&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">queueingStrategy&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">BestEffortFIFO&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resourceGroups&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">coveredResources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;nvidia.com/gpu&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">flavors&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">h100-sxm&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;nvidia.com/gpu&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nominalQuota&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">2&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">borrowingLimit&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">2&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># can use up to 4 GPUs in total&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">preemption&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">reclaimWithinCohort&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">LowerPriority&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">withinClusterQueue&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">LowerPriority&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nn">---&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># ClusterQueue for team ia: 1 guaranteed GPU, lends whatever it does not use&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">kueue.x-k8s.io/v1beta2&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ClusterQueue&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">cq-ia&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cohort&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">llm-platform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespaceSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">matchLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">kubernetes.io/metadata.name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ns-ia&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resourceGroups&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">coveredResources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;nvidia.com/gpu&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">flavors&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">h100-sxm&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;nvidia.com/gpu&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nominalQuota&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">lendingLimit&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># lends its GPU when it is not using it&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nn">---&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># ClusterQueue for team plataforma: 1 guaranteed GPU, low priority&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">kueue.x-k8s.io/v1beta2&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ClusterQueue&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">cq-plataforma&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">cohort&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">llm-platform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespaceSelector&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">matchLabels&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">kubernetes.io/metadata.name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ns-plataforma&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resourceGroups&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">coveredResources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;nvidia.com/gpu&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">flavors&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">h100-sxm&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">resources&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;nvidia.com/gpu&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">nominalQuota&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">borrowingLimit&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">3&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># can use all 4 if the rest are free&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nn">---&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c"># LocalQueue in each namespace: entry point for workloads&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">apiVersion&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">kueue.x-k8s.io/v1beta2&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">kind&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">LocalQueue&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">metadata&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">lq-datos&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">namespace&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ns-datos&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">spec&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">clusterQueue&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">cq-datos&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="cómo-kueue-materializa-el-presupuesto-de-gpu">How Kueue materializes the &amp;ldquo;GPU budget&amp;rdquo;&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>FinOps concept&lt;/th>
&lt;th>Kueue mechanism&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Guaranteed budget&lt;/td>
&lt;td>&lt;code>nominalQuota&lt;/code>: the team always has these GPUs available&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Maximum spend ceiling&lt;/td>
&lt;td>&lt;code>nominalQuota + borrowingLimit&lt;/code>: absolute ceiling of admissible GPUs&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Lending idle capacity&lt;/td>
&lt;td>cohort + &lt;code>lendingLimit&lt;/code>: other queues take what this one is not using&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Fair sharing across teams&lt;/td>
&lt;td>&lt;code>Fair Sharing&lt;/code> + &lt;code>WorkloadPriorityClass&lt;/code>: those who have consumed the most wait longer&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Reclaiming own quota&lt;/td>
&lt;td>&lt;code>preemption.reclaimWithinCohort: LowerPriority&lt;/code>: the owner of the lent GPU gets it back by evicting lower-priority workloads&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;code>nominalQuota&lt;/code> expresses the budget in GPUs: if the team has 2 nominal GPUs
and the price is 1,40 €/GPU-hour, the cost of its guaranteed quota over a 720-hour month is&lt;/p>
&lt;p>$$\text{nominal maximum spend} = 2 \times 1{,}40 \times 720 = 2.016 \text{ €/month}$$&lt;/p>
&lt;p>&lt;code>borrowingLimit&lt;/code> caps the potential extra cost of borrowing from the cohort. That borrowed
usage can be charged with the same formula, multiplying the borrowed GPU-hours by 1,40 €/GPU-hour.&lt;/p>
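&lt;p>The quota-to-budget mapping can be sketched in a few lines of Python. This is only an illustration under the article's assumptions (1,40 €/GPU-hour, a 720-hour month); &lt;code>QueueBudget&lt;/code> and its methods are hypothetical names, not part of Kueue or OpenCost.&lt;/p>

```python
from dataclasses import dataclass

PRICE_PER_GPU_HOUR = 1.40  # €/GPU-hour, the article's figure
HOURS_PER_MONTH = 720      # reference month used in the article


@dataclass(frozen=True)
class QueueBudget:
    """Hypothetical helper: the GPU quota of one ClusterQueue, in euros."""

    nominal_quota: int
    borrowing_limit: int = 0

    def nominal_spend(self) -> float:
        # Cost of the guaranteed GPUs if they run the whole month.
        return self.nominal_quota * PRICE_PER_GPU_HOUR * HOURS_PER_MONTH

    def ceiling_spend(self) -> float:
        # Worst case: nominal quota plus every borrowable GPU, all month.
        gpus = self.nominal_quota + self.borrowing_limit
        return gpus * PRICE_PER_GPU_HOUR * HOURS_PER_MONTH


cq_datos = QueueBudget(nominal_quota=2, borrowing_limit=2)
print(round(cq_datos.nominal_spend(), 2))  # 2016.0
print(round(cq_datos.ceiling_spend(), 2))  # 4032.0
```

&lt;p>For &lt;code>cq-datos&lt;/code> (2 nominal GPUs, &lt;code>borrowingLimit: 2&lt;/code>) the ceiling equals the monthly cost of the whole node, which is the worst case to plan for if the team borrows all month.&lt;/p>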
&lt;hr>
&lt;h2 id="modelo-de-coste-ejemplo-con-4h100-sxm-y-3-equipos">Cost model: example with 4×H100 SXM and 3 teams&lt;/h2>
&lt;h3 id="precio-del-nodo">Node price&lt;/h3>
&lt;p>Generic hardware: a server with 4×H100 SXM 80 GB. 2026 market prices: a single H100 SXM
costs between 27.000 and 40.000 € depending on the source
(&lt;a href="https://www.gmicloud.ai/en/blog/nvidia-h100-gpu-pricing-2026-rent-vs-buy-cost-analysis">GMI Cloud — H100 GPU Pricing 2026&lt;/a>);
a 4×H100 server falls in the 120.000–180.000 € range including chassis, power supplies and NVLink.
We use 140.000 € as a generic assumption for the complete server (4 GPUs + infrastructure).&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Component&lt;/th>
&lt;th>Calculation&lt;/th>
&lt;th>€/hour per node&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Amortized capex (140.000 €, 4 years, 90 % availability)&lt;/td>
&lt;td>140.000 ÷ (4 × 8.760 × 0,9)&lt;/td>
&lt;td>~4,43&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Energy (4 × 700 W TDP × PUE 1,4 × 0,12 €/kWh)&lt;/td>
&lt;td>4 × 0,7 × 1,4 × 0,12&lt;/td>
&lt;td>~0,47&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Operations / network / maintenance&lt;/td>
&lt;td>estimate&lt;/td>
&lt;td>~0,70&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Total 4×H100 node&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;strong>~5,60 €/h&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Per GPU&lt;/strong>&lt;/td>
&lt;td>5,60 ÷ 4&lt;/td>
&lt;td>&lt;strong>~1,40 €/GPU-hour&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Cost-per-token formula for the 70B model (sustained throughput ~2.000 tok/s at TP=4, so the model occupies the whole 4-GPU node and the node price applies):&lt;/p>
&lt;p>$$\text{€/1M tokens} = \frac{5{,}60 \text{ €/h}}{2.000 \text{ tok/s} \times 3.600 \text{ s/h}} \times 10^6 \approx 0{,}78 \text{ €/1M tokens}$$&lt;/p>
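&lt;p>As a sanity check, the node price and the per-token cost can be recomputed from the inputs in the table. This is a minimal sketch: the function and parameter names are illustrative, and every input is one of the article's assumptions rather than a measured value.&lt;/p>

```python
HOURS_PER_YEAR = 8760
SECONDS_PER_HOUR = 3600


def node_hourly_cost(capex_eur, years, availability,
                     gpus, gpu_kw, pue, eur_per_kwh, ops_eur_per_hour):
    """Amortized capex + energy + operations, in €/hour for the whole node."""
    capex = capex_eur / (years * HOURS_PER_YEAR * availability)
    energy = gpus * gpu_kw * pue * eur_per_kwh
    return capex + energy + ops_eur_per_hour


def eur_per_million_tokens(cost_eur_per_hour, tokens_per_second):
    """Hourly cost divided by hourly token throughput, scaled to 1M tokens."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return cost_eur_per_hour / tokens_per_hour * 1_000_000


node = node_hourly_cost(capex_eur=140_000, years=4, availability=0.9,
                        gpus=4, gpu_kw=0.7, pue=1.4, eur_per_kwh=0.12,
                        ops_eur_per_hour=0.70)
print(round(node, 2))                                # 5.61 before rounding each row
print(round(eur_per_million_tokens(5.60, 2000), 2))  # 0.78
```

&lt;p>The unrounded sum comes out one cent above the table's ~5,60 €/h because the table rounds each component separately; the per-token figure is unaffected at two decimals.&lt;/p>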
&lt;h3 id="escenario-mensual-3-equipos-utilización-heterogénea">Monthly scenario: 3 teams, heterogeneous utilization&lt;/h3>
&lt;p>Reference month (720 hours). All 4 GPUs are nominally assigned: 2 to &lt;strong>Datos&lt;/strong>, 1 to
&lt;strong>IA&lt;/strong>, 1 to &lt;strong>Plataforma&lt;/strong>. Actual occupancy varies:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Team&lt;/th>
&lt;th>Nominal GPUs&lt;/th>
&lt;th>Average occupancy&lt;/th>
&lt;th>GPU-hours used&lt;/th>
&lt;th>Idle GPU-hours&lt;/th>
&lt;th>GPU cost (€)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Datos&lt;/td>
&lt;td>2&lt;/td>
&lt;td>75 %&lt;/td>
&lt;td>1.080&lt;/td>
&lt;td>360&lt;/td>
&lt;td>~1.512&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IA&lt;/td>
&lt;td>1&lt;/td>
&lt;td>60 %&lt;/td>
&lt;td>432&lt;/td>
&lt;td>288&lt;/td>
&lt;td>~605&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Plataforma&lt;/td>
&lt;td>1&lt;/td>
&lt;td>35 %&lt;/td>
&lt;td>252&lt;/td>
&lt;td>468&lt;/td>
&lt;td>~353&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Cluster idle&lt;/strong>&lt;/td>
&lt;td>—&lt;/td>
&lt;td>—&lt;/td>
&lt;td>—&lt;/td>
&lt;td>&lt;strong>1.116&lt;/strong>&lt;/td>
&lt;td>&lt;strong>~1.562&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Node total&lt;/strong>&lt;/td>
&lt;td>4&lt;/td>
&lt;td>—&lt;/td>
&lt;td>1.764&lt;/td>
&lt;td>1.116&lt;/td>
&lt;td>&lt;strong>~4.032&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;blockquote>
&lt;p>Total node cost per month: 5,60 €/h × 720 h = 4.032 €.
Cost per provisioned GPU-hour: 5,60 ÷ 4 = 1,40 €.&lt;/p>
&lt;/blockquote>
&lt;p>With &lt;code>shareIdle=false&lt;/code> (showback), the 1.562 € of idle appears as a separate row in the report
and nobody pays for it directly. With &lt;code>shareIdle=true&lt;/code> (proportional chargeback), it is split among
the three teams in proportion to their non-idle cost (1.512 : 605 : 353): Datos absorbs ~957 €, IA ~383 €, Plataforma ~223 €
of additional idle.&lt;/p>
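&lt;p>The proportional split can be reproduced by hand from the scenario table. The sketch below is not OpenCost code; it only re-derives the arithmetic from the article's inputs (team occupancy, 1,40 €/GPU-hour, 720 hours), and the function names are hypothetical.&lt;/p>

```python
PRICE_PER_GPU_HOUR = 1.40  # €/GPU-hour
HOURS_PER_MONTH = 720

# team: (nominal GPUs, average occupancy), taken from the scenario table
TEAMS = {"Datos": (2, 0.75), "IA": (1, 0.60), "Plataforma": (1, 0.35)}


def used_cost(teams):
    """Cost of the GPU-hours each team actually used."""
    return {
        team: gpus * HOURS_PER_MONTH * occupancy * PRICE_PER_GPU_HOUR
        for team, (gpus, occupancy) in teams.items()
    }


def idle_cost(teams):
    """Cost of the whole node minus what the teams used."""
    total_gpus = sum(gpus for gpus, _ in teams.values())
    node_cost = total_gpus * HOURS_PER_MONTH * PRICE_PER_GPU_HOUR
    return node_cost - sum(used_cost(teams).values())


def idle_shares(teams):
    """Split the idle cost in proportion to each team's non-idle cost."""
    used = used_cost(teams)
    non_idle = sum(used.values())
    idle = idle_cost(teams)
    return {team: idle * cost / non_idle for team, cost in used.items()}


for team, share in idle_shares(TEAMS).items():
    print(f"{team}: {share:.0f} € of idle")
```

&lt;p>The weights reduce to 1.080 : 432 : 252 used GPU-hours, that is 30 : 12 : 7, so the three shares always add up to the full idle cost.&lt;/p>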
&lt;h3 id="consumo-de-tokens-por-equipo-litellm-spend-logs">Token consumption per team (LiteLLM spend logs)&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Team&lt;/th>
&lt;th>Model&lt;/th>
&lt;th>Tokens/month&lt;/th>
&lt;th>Unit cost&lt;/th>
&lt;th>Token cost (€)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Datos&lt;/td>
&lt;td>llama-70b (TP=4)&lt;/td>
&lt;td>380 M&lt;/td>
&lt;td>0,78 €/1M&lt;/td>
&lt;td>~296&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IA&lt;/td>
&lt;td>llama-8b (1 GPU)&lt;/td>
&lt;td>210 M&lt;/td>
&lt;td>0,35 €/1M&lt;/td>
&lt;td>~74&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Plataforma&lt;/td>
&lt;td>llama-8b + batch&lt;/td>
&lt;td>80 M&lt;/td>
&lt;td>0,35 €/1M&lt;/td>
&lt;td>~28&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="informe-de-chargeback-mensual-showback--tokens">Monthly chargeback report (showback + tokens)&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Team&lt;/th>
&lt;th>GPU hardware cost (€)&lt;/th>
&lt;th>Assigned idle cost (€)&lt;/th>
&lt;th>LiteLLM token cost (€)&lt;/th>
&lt;th>Total chargeable (€)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Datos&lt;/td>
&lt;td>1.512&lt;/td>
&lt;td>957&lt;/td>
&lt;td>296&lt;/td>
&lt;td>&lt;strong>2.765&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IA&lt;/td>
&lt;td>605&lt;/td>
&lt;td>383&lt;/td>
&lt;td>74&lt;/td>
&lt;td>&lt;strong>1.062&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Plataforma&lt;/td>
&lt;td>353&lt;/td>
&lt;td>223&lt;/td>
&lt;td>28&lt;/td>
&lt;td>&lt;strong>604&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Idle (without shareIdle)&lt;/td>
&lt;td>1.562&lt;/td>
&lt;td>—&lt;/td>
&lt;td>—&lt;/td>
&lt;td>visible, not charged&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="tabla-de-dimensiones-de-atribución--política">Attribution dimensions × policy&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Attribution dimension&lt;/th>
&lt;th>Tool&lt;/th>
&lt;th>Showback&lt;/th>
&lt;th>Chargeback&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>namespace&lt;/td>
&lt;td>OpenCost &lt;code>aggregate=namespace&lt;/code>&lt;/td>
&lt;td>&lt;code>/allocation?aggregate=namespace&amp;amp;includeIdle=true&lt;/code>&lt;/td>
&lt;td>&lt;code>shareIdle=true&lt;/code> + transfer to P&amp;amp;L&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>pod label&lt;/td>
&lt;td>OpenCost &lt;code>aggregate=label:equipo&lt;/code>&lt;/td>
&lt;td>report by label&lt;/td>
&lt;td>same, with shareIdle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>virtual key / team&lt;/td>
&lt;td>LiteLLM &lt;code>/global/spend/teams&lt;/code>&lt;/td>
&lt;td>token dashboard&lt;/td>
&lt;td>hard &lt;code>max_budget&lt;/code> per key&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ClusterQueue (reserved GPU)&lt;/td>
&lt;td>Kueue &lt;code>nominalQuota&lt;/code>&lt;/td>
&lt;td>observe usage vs quota&lt;/td>
&lt;td>committed budget in GPUs&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>request tag&lt;/td>
&lt;td>LiteLLM &lt;code>tag_budgets&lt;/code>&lt;/td>
&lt;td>by cost center&lt;/td>
&lt;td>budget per tag&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>pod annotation&lt;/td>
&lt;td>OpenCost &lt;code>aggregate=annotation:KEY&lt;/code>&lt;/td>
&lt;td>report by annotation&lt;/td>
&lt;td>same, with shareIdle&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="flujo-de-datos-end-to-end">End-to-end data flow&lt;/h2>
&lt;div class="diagram" style="max-width:780px;margin:1rem auto;">
&lt;svg viewBox="0 0 780 300" role="img" aria-label="GPU chargeback attribution flow: from the inference pod to the three planes OpenCost, LiteLLM and Kueue, and on to the monthly report" xmlns="http://www.w3.org/2000/svg">
&lt;style>.bx{fill:none;stroke:currentColor;stroke-width:1.3}.dsh{fill:none;stroke:currentColor;stroke-width:1.3;stroke-dasharray:5 3}.tl{font:600 12px sans-serif;fill:currentColor}.ts{font:11px sans-serif;fill:currentColor}.ar{fill:none;stroke:currentColor;stroke-width:1.3;marker-end:url(#chm)}&lt;/style>
&lt;defs>&lt;marker id="chm" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto">&lt;path d="M0,0 L10,5 L0,10 z" fill="currentColor"/>&lt;/marker>&lt;/defs>
&lt;rect class="bx" x="20" y="30" width="160" height="50" rx="6"/>
&lt;text x="32" y="52" class="tl">Pod vLLM&lt;/text>
&lt;text x="32" y="68" class="ts">label:equipo=datos&lt;/text>
&lt;path class="ar" d="M180,55 L220,80"/>
&lt;path class="ar" d="M180,55 L220,145"/>
&lt;path class="ar" d="M180,55 L220,210"/>
&lt;rect class="bx" x="220" y="60" width="175" height="50" rx="6"/>
&lt;text x="232" y="82" class="tl">OpenCost (hardware)&lt;/text>
&lt;text x="232" y="98" class="ts">€/GPU-hour × namespace/label&lt;/text>
&lt;rect class="bx" x="220" y="125" width="175" height="50" rx="6"/>
&lt;text x="232" y="147" class="tl">LiteLLM (tokens)&lt;/text>
&lt;text x="232" y="163" class="ts">€/token × team_id/key/model&lt;/text>
&lt;rect class="bx" x="220" y="190" width="175" height="50" rx="6"/>
&lt;text x="232" y="212" class="tl">Kueue (scheduler)&lt;/text>
&lt;text x="232" y="228" class="ts">nominalQuota / cohort / fair-share&lt;/text>
&lt;path class="ar" d="M395,85 L440,140"/>
&lt;path class="ar" d="M395,150 L440,150"/>
&lt;path class="ar" d="M395,215 L440,160"/>
&lt;rect class="bx" x="440" y="115" width="185" height="70" rx="6"/>
&lt;text x="452" y="137" class="tl">Monthly report&lt;/text>
&lt;text x="452" y="153" class="ts">€/team (hardware + idle + tokens)&lt;/text>
&lt;text x="452" y="169" class="ts">Borrowed GPUs · remaining budget&lt;/text>
&lt;path class="ar" d="M625,150 L665,150"/>
&lt;rect class="dsh" x="665" y="115" width="100" height="70" rx="6"/>
&lt;text x="677" y="143" class="tl">Showback&lt;/text>
&lt;text x="677" y="159" class="ts">or&lt;/text>
&lt;text x="677" y="175" class="tl">Chargeback&lt;/text>
&lt;text x="20" y="275" class="ts">The three planes are orthogonal: OpenCost sees the hardware, LiteLLM sees the tokens, Kueue sees the scheduler quota.&lt;/text>
&lt;text x="20" y="292" class="ts">Crossing the three gives the complete number: €/team with idle, tokens and borrowed GPUs.&lt;/text>
&lt;/svg>
&lt;/div>
&lt;hr>
&lt;h2 id="fair-sharing-y-preemption-cómo-kueue-recupera-la-cuota">Fair sharing and preemption: how Kueue reclaims quota&lt;/h2>
&lt;p>When team Datos has used up its 2 nominal GPUs and borrows Plataforma's idle quota,
Kueue records that loan. As soon as Plataforma launches a new workload:&lt;/p>
&lt;ol>
&lt;li>Kueue detects that &lt;code>cq-plataforma&lt;/code> is below its &lt;code>nominalQuota&lt;/code>.&lt;/li>
&lt;li>With &lt;code>preemption.reclaimWithinCohort: LowerPriority&lt;/code> set on &lt;code>cq-plataforma&lt;/code> (the policy is read from the queue that reclaims, not from the one that borrowed), Kueue evicts the Datos workload
running on the borrowed GPU, provided its priority is lower.&lt;/li>
&lt;li>The evicted workload returns to the &lt;code>cq-datos&lt;/code> queue and is readmitted when quota frees up.&lt;/li>
&lt;/ol>
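&lt;p>The three steps can be illustrated with a toy model. This is not Kueue's scheduler and does not call any Kueue API: &lt;code>Workload&lt;/code>, &lt;code>pick_victims&lt;/code> and the sample workloads are hypothetical, and the policy is reduced to "evict lower-priority workloads from queues that are above their nominal quota".&lt;/p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    queue: str
    priority: int
    gpus: int


def pick_victims(pending, running, nominal_quota, free_gpus):
    """Toy approximation of reclaimWithinCohort: LowerPriority."""
    usage = {}
    for w in running:
        usage[w.queue] = usage.get(w.queue, 0) + w.gpus
    # Reclaiming only applies while the pending queue stays within its own quota.
    if usage.get(pending.queue, 0) + pending.gpus > nominal_quota[pending.queue]:
        return []
    missing = pending.gpus - free_gpus
    victims = []
    for w in sorted(running, key=lambda w: w.priority):  # lowest priority first
        borrowing = usage[w.queue] > nominal_quota[w.queue]
        if (missing > 0 and w.queue != pending.queue
                and borrowing and pending.priority > w.priority):
            victims.append(w)
            usage[w.queue] -= w.gpus
            missing -= w.gpus
    if missing > 0:
        return []  # not enough lower-priority borrowed capacity to reclaim
    return victims


NOMINAL = {"cq-datos": 2, "cq-ia": 1, "cq-plataforma": 1}
running = [
    Workload("train-a", "cq-datos", 50, 1),
    Workload("train-b", "cq-datos", 50, 1),
    Workload("batch-c", "cq-datos", 10, 1),  # runs on the borrowed GPU
    Workload("serve-ia", "cq-ia", 50, 1),
]
pending = Workload("eval", "cq-plataforma", 50, 1)
print([w.name for w in pick_victims(pending, running, NOMINAL, free_gpus=0)])
```

&lt;p>In this example only &lt;code>batch-c&lt;/code> is evicted: it belongs to the one queue that is above its nominal quota and has lower priority than the pending workload.&lt;/p>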
&lt;p>&lt;strong>Fair Sharing&lt;/strong> orders the pending queue by each LocalQueue's accumulated historical usage:
if Datos has consumed more GPU-hours than IA in the recent period, IA's next workloads are
admitted ahead of new ones from Datos. This yields an equitable split
without blocking anyone permanently.&lt;/p>
&lt;hr>
&lt;h2 id="configuración-de-opencost-para-el-precio-del-nodo">OpenCost configuration for the node price&lt;/h2>
&lt;p>For the numbers in the cost model to be reproducible, the node price has to be
declared explicitly (the default value underestimates on-prem GPUs,
&lt;a href="https://github.com/opencost/opencost/issues/3781">issue #3781&lt;/a>):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="c"># OpenCost Helm values.yaml: 4×H100 on-prem node, 5,60 €/h&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">opencost&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">customPricing&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">enabled&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">provider&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">custom&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">costModel&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">description&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;Nodo 4xH100 SXM on-prem&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">CPU&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;0.025&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># €/CPU-hour (minor component)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">RAM&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;0.003&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># €/GB-hour&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">GPU&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;1.40&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># €/GPU-hour ← the one that drives the split&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">storage&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;0.0002&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="c"># €/GB-hour&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Verify after configuring:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Check the price resolved for each node&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">curl http://localhost:9003/allNodePricing
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Allocation query by team label, last month, idle reported separately&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">curl -G http://localhost:9003/allocation &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">window&lt;/span>&lt;span class="o">=&lt;/span>lastmonth &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">aggregate&lt;/span>&lt;span class="o">=&lt;/span>label:equipo &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">includeIdle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">true&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">shareIdle&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">false&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -d &lt;span class="nv">accumulate&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">true&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="promql-de-referencia-para-paneles-de-chargeback">Reference PromQL for chargeback dashboards&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-promql" data-lang="promql">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># GPU cost allocated per team (label:equipo), in €/hour&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">sum&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">by&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">label_equipo&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nv">container_gpu_allocation&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">on&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">node&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">group_left&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">node_gpu_hourly_cost&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c1"># Idle GPU cost per node, in €/hour (idle fraction averaged over 15 min)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">sum&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">by&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">node&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kr">avg_over_time&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">DCGM_FI_DEV_GPU_UTIL&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s">15m&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">100&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">on&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">node&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">group_left&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">node_gpu_hourly_cost&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c1"># Accumulated cost per team over 30 days: summing hourly samples of a €/hour rate approximates the total in €&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="kr">sum_over_time&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">sum&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">by&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">label_equipo&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nv">container_gpu_allocation&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">on&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">node&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">group_left&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">node_gpu_hourly_cost&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s">30d&lt;/span>&lt;span class="err">:&lt;/span>&lt;span class="s">1h&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="fuentes">Sources&lt;/h2>
&lt;ul>
&lt;li>FinOps Foundation — Invoicing &amp;amp; Chargeback Capability — &lt;a href="https://www.finops.org/framework/capabilities/invoicing-chargeback/">https://www.finops.org/framework/capabilities/invoicing-chargeback/&lt;/a>&lt;/li>
&lt;li>FinOps Foundation — Data Analysis and Showback — &lt;a href="https://www.finops.org/framework/previous-capabilities/analysis-showback/">https://www.finops.org/framework/previous-capabilities/analysis-showback/&lt;/a>&lt;/li>
&lt;li>FinOps Foundation — Allocation Capability — &lt;a href="https://www.finops.org/framework/capabilities/allocation/">https://www.finops.org/framework/capabilities/allocation/&lt;/a>&lt;/li>
&lt;li>OpenCost — API (Allocation API, parameters window/aggregate/shareIdle/idleByNode) — &lt;a href="https://opencost.io/docs/integrations/api/">https://opencost.io/docs/integrations/api/&lt;/a>&lt;/li>
&lt;li>OpenCost — API Examples — &lt;a href="https://opencost.io/docs/integrations/api-examples/">https://opencost.io/docs/integrations/api-examples/&lt;/a>&lt;/li>
&lt;li>OpenCost — GitHub (issue #3781, default underpricing of on-prem GPUs) — &lt;a href="https://github.com/opencost/opencost/issues/3781">https://github.com/opencost/opencost/issues/3781&lt;/a>&lt;/li>
&lt;li>LiteLLM — Virtual Keys (&lt;code>max_budget&lt;/code>, &lt;code>budget_duration&lt;/code>, &lt;code>team_id&lt;/code>) — &lt;a href="https://docs.litellm.ai/docs/proxy/virtual_keys">https://docs.litellm.ai/docs/proxy/virtual_keys&lt;/a>&lt;/li>
&lt;li>LiteLLM — Setting Team Budgets — &lt;a href="https://docs.litellm.ai/docs/proxy/team_budgets">https://docs.litellm.ai/docs/proxy/team_budgets&lt;/a>&lt;/li>
&lt;li>LiteLLM — Spend Tracking — &lt;a href="https://docs.litellm.ai/docs/proxy/cost_tracking">https://docs.litellm.ai/docs/proxy/cost_tracking&lt;/a>&lt;/li>
&lt;li>LiteLLM — Budgets &amp;amp; Rate Limits — &lt;a href="https://docs.litellm.ai/docs/proxy/users">https://docs.litellm.ai/docs/proxy/users&lt;/a>&lt;/li>
&lt;li>LiteLLM — Setting Tag Budgets — &lt;a href="https://docs.litellm.ai/docs/proxy/tag_budgets">https://docs.litellm.ai/docs/proxy/tag_budgets&lt;/a>&lt;/li>
&lt;li>LiteLLM — Budget Reset Times — &lt;a href="https://docs.litellm.ai/docs/proxy/budget_reset_and_tz">https://docs.litellm.ai/docs/proxy/budget_reset_and_tz&lt;/a>&lt;/li>
&lt;li>Kueue — Cluster Queue (nominalQuota, borrowingLimit, lendingLimit, preemption) — &lt;a href="https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/">https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/&lt;/a>&lt;/li>
&lt;li>Kueue — Cohort — &lt;a href="https://kueue.sigs.k8s.io/docs/concepts/cohort/">https://kueue.sigs.k8s.io/docs/concepts/cohort/&lt;/a>&lt;/li>
&lt;li>Kueue — Fair Sharing — &lt;a href="https://kueue.sigs.k8s.io/docs/concepts/fair_sharing/">https://kueue.sigs.k8s.io/docs/concepts/fair_sharing/&lt;/a>&lt;/li>
&lt;li>Kueue — Administer Cluster Quotas — &lt;a href="https://kueue.sigs.k8s.io/docs/tasks/manage/administer_cluster_quotas/">https://kueue.sigs.k8s.io/docs/tasks/manage/administer_cluster_quotas/&lt;/a>&lt;/li>
&lt;li>GMI Cloud — NVIDIA H100 GPU Pricing 2026 — &lt;a href="https://www.gmicloud.ai/en/blog/nvidia-h100-gpu-pricing-2026-rent-vs-buy-cost-analysis">https://www.gmicloud.ai/en/blog/nvidia-h100-gpu-pricing-2026-rent-vs-buy-cost-analysis&lt;/a>&lt;/li>
&lt;li>IntuitionLabs — NVIDIA AI GPU Prices H100 Cost Guide — &lt;a href="https://intuitionlabs.ai/articles/nvidia-ai-gpu-pricing-guide">https://intuitionlabs.ai/articles/nvidia-ai-gpu-pricing-guide&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="ver-también">See also&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://blog.lo0.es/posts/opencost-cost-allocation-kubernetes/">OpenCost in depth: how GPU cost is allocated in Kubernetes&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://blog.lo0.es/posts/kubecost-vs-opencost-vs-alternativas/">Kubecost vs OpenCost vs alternatives&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://blog.lo0.es/posts/coste-por-token-y-por-request/">From GPU-hour to cost per token&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://blog.lo0.es/posts/finops-multi-tenancy-gpu-litellm/">FinOps and multi-tenancy on the GPU cluster: who pays for what&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://blog.lo0.es/posts/compartir-gpu-time-slicing-mps-mig/">Sharing a GPU: time-slicing, MPS and MIG&lt;/a>&lt;/li>
&lt;/ul></description></item></channel></rss>