<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Audio-Embedding on The Probability Engine</title><link>https://carlosdanieljimenez.com/tags/audio-embedding/</link><description>Recent content in Audio-Embedding on The Probability Engine</description><generator>Hugo -- 0.147.3</generator><language>en-us</language><lastBuildDate>Wed, 20 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://carlosdanieljimenez.com/tags/audio-embedding/index.xml" rel="self" type="application/rss+xml"/><item><title>The Quadrilingual Probe: How Aquamosh (1998) Falsifies the Distributional Hypothesis Across Five Embedding Architectures</title><link>https://carlosdanieljimenez.com/post/2026-05-20-aquamosh-quadrilingual-anatomy/</link><pubDate>Wed, 20 May 2026 00:00:00 +0000</pubDate><guid>https://carlosdanieljimenez.com/post/2026-05-20-aquamosh-quadrilingual-anatomy/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>This research uses &lt;em>Aquamosh&lt;/em> (1998), the quadrilingual debut album by Plastilina Mosh (Spanish, English, French, Japanese; produced by Tom Rothrock and Rob Schnapf, Beck&amp;rsquo;s &lt;em>Odelay&lt;/em> team), as an empirical &lt;strong>falsification probe&lt;/strong> for distributional sentence embeddings. The album&amp;rsquo;s quadrilingual structure turns code-switching from an anecdotal concern into a quantitative experiment: every language transition is a &lt;em>guaranteed&lt;/em> lexical discontinuity, which lets us dissociate topical continuity from surface form.&lt;/p>
&lt;p>&lt;strong>Core Finding (CONFIRMED):&lt;/strong> Five sentence-embedding architectures were probed: OpenAI &lt;code>text-embedding-3-large&lt;/code> (3072-dim, decoder), Google LaBSE (768-dim, encoder, parallel-corpus), BAAI BGE-M3 (1024-dim), multilingual-E5-large (1024-dim), and paraphrase-multilingual-MPNet (768-dim). In all five, a language switch between consecutive lyric lines approximately &lt;strong>doubles the probability of a &amp;ldquo;window break&amp;rdquo;&lt;/strong> (the embedding similarity falling below a calibrated coherence threshold). Mean relative gap across models: &lt;strong>1.69×&lt;/strong>; range: &lt;strong>1.31× (E5) to 1.94× (OpenAI)&lt;/strong>. Permutation tests of the null hypothesis H₀ that language switches and ruptures are independent reject it with &lt;strong>z = +6.54 (OpenAI), z = +4.51 (LaBSE)&lt;/strong>, both p &amp;lt; 10⁻⁴ over 10,000 simulations. A logistic regression fitted with GEE, clustered by track and controlling for line position and anchor/successor languages, yields &lt;strong>OR = 3.99 [2.51, 6.36] for OpenAI (p &amp;lt; 0.001)&lt;/strong> and &lt;strong>OR = 2.52 [1.39, 4.57] for LaBSE (p = 0.002)&lt;/strong>. An LLM-as-judge comparison against GPT-4o-mini shows that the OpenAI embedding declares a &amp;ldquo;rupture&amp;rdquo; where a sophisticated reader sees continuity &lt;strong>3.18× more often at language switches than at same-language transitions&lt;/strong> (false-break rate 0.191 vs 0.060).&lt;/p></description></item></channel></rss>