# From Silent Tamil to Real Singapore Tamil Speech: How Codex Upgraded an SLS Interactive

**A reproducible agentic AI workflow for converting a browser-dependent Tamil activity into an offline-ready SLS package with genuine Singapore Tamil audio**

*Published 6 June 2026 · Built with Codex and agentic AI*

An interactive can display Tamil perfectly and still fail at the moment that matters: when a learner presses **Read Aloud**.

That was the problem with this Singapore Tamil vocabulary activity. Its code correctly requested the `ta-SG` locale through the browser's Web Speech API, but the device did not expose a Singapore Tamil voice. The activity could either remain silent or fall back to an unsuitable system voice.

With Codex acting as an agentic engineering partner, we diagnosed the limitation, found a supported Singapore Tamil neural voice, generated the required speech assets, rewired the interactive to use them offline, fixed a touch-playback issue, tested every audio file, and produced an SLS-ready ZIP.

The result is not merely a workaround on one computer. It is a portable interactive that carries its own Singapore Tamil speech.

## The completed resources

- [Open the upgraded interactive](index.html)
- [Download the SLS-ready ZIP](../Interactive_20260606032934_with_offline_tamil_audio.zip)
- [Download the repeatable audio-generation script](generate_tamil_audio.ps1)

![Voice diagnostics showing packaged Singapore Tamil audio](tamil-offline-audio-diagnostics.png)

## Why browser Tamil text-to-speech failed

The original activity used:

```js
const voices = window.speechSynthesis.getVoices();
const taSGVoice = voices.find(voice => voice.lang === "ta-SG");
```

This is standards-based code, but `getVoices()` returns only the voices exposed by the current browser and operating system, and in some browsers the list is empty until the `voiceschanged` event has fired. Support therefore varies by:

- operating system;
- installed language and speech packs;
- browser;
- device management policies;
- whether the activity is opened in an LMS iframe; and
- whether the browser implements speech synthesis at all.

Setting this alone does not install a voice:

```js
utterance.lang = "ta-SG";
```

It only requests the locale. If no matching voice exists, a browser may stay silent or choose an unsuitable default.

For a Singapore classroom resource, silently substituting `ta-IN`, `ta-MY`, or an English system voice is not an acceptable solution.
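
A stricter lookup can be sketched as follows. This is a minimal illustration, not the activity's actual code: `findExactLocaleVoice` and `whenVoicesReady` are hypothetical names, and the normalisation of underscore or lowercase tags (such as `ta_SG`) is an assumption about how some platforms report `voice.lang`.

```javascript
// Hypothetical helper: returns a voice whose locale is exactly the requested
// one, or null. It never substitutes ta-IN, ta-MY, or a default voice.
function findExactLocaleVoice(voices, locale) {
  // Assumption: some platforms report tags like "ta_SG" or "ta-sg".
  const normalise = tag => String(tag || "").replace(/_/g, "-").toLowerCase();
  const wanted = normalise(locale);
  return voices.find(voice => normalise(voice.lang) === wanted) || null;
}

// Hypothetical helper: resolves with the voice list, waiting for the
// "voiceschanged" event when the first getVoices() call returns nothing.
function whenVoicesReady(synth, timeoutMs = 2000) {
  return new Promise(resolve => {
    const initial = synth.getVoices();
    if (initial.length > 0 || typeof synth.addEventListener !== "function") {
      resolve(initial);
      return;
    }
    let timer = null;
    const finish = () => {
      clearTimeout(timer);
      synth.removeEventListener("voiceschanged", finish);
      resolve(synth.getVoices());
    };
    synth.addEventListener("voiceschanged", finish);
    // Give up waiting after timeoutMs and report whatever is available.
    timer = setTimeout(finish, timeoutMs);
  });
}
```

In a browser this could be used as `whenVoicesReady(window.speechSynthesis).then(voices => findExactLocaleVoice(voices, "ta-SG"))`. A `null` result means the device has no Singapore Tamil voice, so the activity should not speak through the browser at all.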

## The design decision

Microsoft documents dedicated Singapore Tamil neural voices, including:

- `ta-SG-VenbaNeural`, female; and
- `ta-SG-AnbuNeural`, male.

Azure Speech is an online service, not a small JavaScript library that can simply be copied into an offline SLS package. A production application can call Azure through a secure backend, but an API key must never be embedded in public client-side HTML.

This activity contains a fixed set of words. The robust design was therefore:

1. Generate each required phrase once using `ta-SG-VenbaNeural`.
2. Store the MP3 files inside the interactive.
3. Prefer packaged audio during playback.
4. Keep an exact browser `ta-SG` voice only as a secondary fallback.
5. Package everything in the ZIP so it works offline in SLS.

This converts cloud synthesis during development into dependable local playback during learning.

## Step 1: Audit every phrase that can be spoken

The activity contained eight vocabulary words:

| Tamil | Meaning | Audio file |
|---|---|---|
| பள்ளி | School | `audio/palli.mp3` |
| புத்தகம் | Book | `audio/puththagam.mp3` |
| நண்பன் | Friend | `audio/nanban.mp3` |
| ஆசிரியர் | Teacher | `audio/aasiriyar.mp3` |
| வீடு | House | `audio/veedu.mp3` |
| உணவு | Food | `audio/unavu.mp3` |
| தண்ணீர் | Water | `audio/thanneer.mp3` |
| மரம் | Tree | `audio/maram.mp3` |

The listening-discrimination exercise also used two contrast words:

- `வீது` → `audio/veethu.mp3`
- `தணீர்` → `audio/thaneer.mp3`

This inventory matters. Missing even one dynamically spoken phrase would send that phrase to browser speech, which may be silent or use the wrong voice.
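
The audit can be automated with a small check. The sketch below is illustrative only: `findPhrasesWithoutAudio` is a hypothetical helper, and the sample map holds just two of the ten entries.

```javascript
// Hypothetical audit helper: lists every speakable phrase with no packaged clip.
function findPhrasesWithoutAudio(spokenPhrases, audioByPhrase) {
  return spokenPhrases.filter(
    phrase => !Object.prototype.hasOwnProperty.call(audioByPhrase, phrase)
  );
}

// Two entries from the inventory above; "மரம்" is deliberately left out.
const sampleAudioByPhrase = {
  "பள்ளி": "audio/palli.mp3",
  "வீது": "audio/veethu.mp3"
};

const phrasesMissingAudio = findPhrasesWithoutAudio(
  ["பள்ளி", "வீது", "மரம்"],
  sampleAudioByPhrase
);
console.log(phrasesMissingAudio); // contains only "மரம்"
```

Running a check like this against the full phrase list before packaging would catch a phrase that was added to the activity but never generated.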

## Step 2: Confirm the Singapore Tamil voices

For this implementation, Codex used the open-source `edge-tts` command-line client during development. It accesses the online text-to-speech service behind Microsoft Edge, so it needs an internet connection but no API key.

Install and inspect the available voices:

```powershell
python -m pip install edge-tts
python -m edge_tts --list-voices | Select-String '^ta-SG-'
```

The voice list returned:

```text
ta-SG-AnbuNeural
ta-SG-VenbaNeural
```

For a formal production workflow, use the official Azure Speech SDK or REST API with credentials stored securely on a backend. The interactive itself should receive only the generated media, never the service key.

## Step 3: Generate the MP3 assets

A single phrase can be generated with:

```powershell
python -m edge_tts `
  --voice ta-SG-VenbaNeural `
  --text "பள்ளி" `
  --write-media "audio/palli.mp3"
```

The same command was repeated for all ten distinct phrases. The completed audio directory was only about 50 KB, making it suitable for an SLS package.

This project includes [`generate_tamil_audio.ps1`](generate_tamil_audio.ps1), which contains the complete phrase manifest and regenerates all ten files:

```powershell
Set-Location "Interactive_20260606032934"
.\generate_tamil_audio.ps1
```

The script stores Tamil as Unicode escape sequences and decodes them at runtime. This avoids mojibake when it is run in older Windows PowerShell versions that do not reliably interpret UTF-8 source files without a byte-order mark.

For larger activities, keep the text and filenames in a structured manifest, as this script does. Treat the manifest as the source of truth so that content updates do not leave stale audio behind.
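
One way to catch stale or missing files is to compare the manifest with the audio folder. The sketch below assumes a hypothetical manifest shape (`text` and `file` per entry) and is not taken from `generate_tamil_audio.ps1`.

```javascript
// Hypothetical manifest shape: one entry per spoken phrase.
const sampleManifest = [
  { text: "பள்ளி", file: "audio/palli.mp3" },
  { text: "வீடு", file: "audio/veedu.mp3" }
];

// Compares the manifest with the files actually present in the audio folder.
function compareManifestToFiles(manifest, existingFiles) {
  const expected = new Set(manifest.map(entry => entry.file));
  const present = new Set(existingFiles);
  return {
    // Listed in the manifest but not generated yet.
    missing: [...expected].filter(file => !present.has(file)),
    // Present in the folder but no longer referenced: likely stale audio.
    stale: [...present].filter(file => !expected.has(file))
  };
}
```

This only compares filenames. It cannot tell whether the Tamil text behind an existing filename has changed, so regenerating the affected clip after any text edit is still necessary.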

## Step 4: Map text to packaged audio

Each vocabulary entry now points to an audio file:

```js
const vocabulary = [
  {
    tamil: "பள்ளி",
    roman: "paḷḷi",
    meaning: "School",
    audio: "audio/palli.mp3"
  }
];
```

Contrast words use a second lookup:

```js
const minimalPairAudio = {
  "வீது": "audio/veethu.mp3",
  "தணீர்": "audio/thaneer.mp3"
};
```

A helper resolves the correct asset:

```js
function getPackagedAudioPath(text) {
  const vocabularyWord = vocabulary.find(word => word.tamil === text);
  return vocabularyWord?.audio || minimalPairAudio[text] || null;
}
```
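
Exact string comparison works when the lookup text comes straight from the `vocabulary` array. If the text were ever read back from the page instead, stray whitespace or a different Unicode normalisation form could make the comparison fail. The following is a hedged sketch of a more tolerant lookup; `toAudioKey`, `buildAudioIndex`, and `lookupPackagedAudio` are hypothetical names, not part of the activity.

```javascript
// Normalises a lookup key: Unicode NFC plus trimmed whitespace.
function toAudioKey(text) {
  return String(text).normalize("NFC").trim();
}

// Builds one Map from the vocabulary list and the contrast-word lookup.
function buildAudioIndex(vocabulary, minimalPairAudio) {
  const index = new Map();
  for (const word of vocabulary) {
    index.set(toAudioKey(word.tamil), word.audio);
  }
  for (const [text, path] of Object.entries(minimalPairAudio)) {
    index.set(toAudioKey(text), path);
  }
  return index;
}

// Returns the packaged path, or null when the phrase has no clip.
function lookupPackagedAudio(index, text) {
  return index.get(toAudioKey(text)) ?? null;
}
```

A single `Map` also replaces the linear `find` over the vocabulary, although with ten phrases that difference is cosmetic.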

## Step 5: Prefer offline playback

The upgraded playback path creates a normal HTML audio object:

```js
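async function speakText(text, slow = false) {
  // Stop any clip or utterance that is still playing.
  stopCurrentSpeech();

  // No packaged clip for this text: use the browser voice instead.
  const audioPath = getPackagedAudioPath(text);
  if (!audioPath) {
    speakWithBrowserVoice(text, slow);
    return;
  }

  const audio = new Audio(audioPath);
  audio.playbackRate = slow ? 0.7 : 1; // Slow mode plays at 70% speed.
  audio.preservesPitch = true; // Keep the voice's pitch when slowed down.
  currentAudio = audio; // Track the clip that is currently playing.

  await audio.play();
}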

The normal and slow buttons still work. The pitch control remains labelled as browser-voice-only, because reliably changing the pitch of packaged MP3 audio would require a separate audio-processing layer.
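
`audio.play()` returns a promise that rejects when playback is blocked or the file cannot be loaded. The sketch below shows one way that case could be handled. It is a hypothetical variant rather than the activity's code: `playPackagedOrFallback`, `createAudio`, and `fallback` are illustrative names, and the audio factory is injected so the logic can run outside a browser.

```javascript
// Hypothetical variant of the playback path. createAudio and fallback are
// injected so the decision logic can be exercised without a browser.
async function playPackagedOrFallback({ audioPath, slow, createAudio, fallback }) {
  if (!audioPath) {
    fallback("no-packaged-audio");
    return "fallback";
  }

  const audio = createAudio(audioPath);
  audio.playbackRate = slow ? 0.7 : 1;

  try {
    // Rejects if the browser blocks playback or the clip fails to load.
    await audio.play();
    return "packaged";
  } catch (error) {
    // Report why packaged audio failed, then hand over to the fallback.
    fallback(error && error.name ? error.name : "play-failed");
    return "fallback";
  }
}
```

In the activity, `createAudio` could be `path => new Audio(path)` and `fallback` could call `speakWithBrowserVoice`. Because the fallback runs after an awaited rejection, a browser may no longer treat it as part of the original tap, so logging the reason is the more dependable part of this pattern.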

## Step 6: Preserve user activation on touch devices

This was an important bug found during agentic browser testing.

The original smartboard handler delayed its callback by 50 ms:

```js
setTimeout(() => callback(), 50);
```

That delay can move `audio.play()` outside the browser's trusted user gesture. When that happens, the browser blocks playback even though the learner tapped a button.

The fix was to run playback synchronously inside `touchend` and use timestamps for debouncing:

```js
if (touchDuration < 1000 && withinBounds) {
  if (touchEndTime - lastActivationTime < 50) return;

  lastActivationTime = touchEndTime;
  suppressClickUntil = touchEndTime + 500;
  event.preventDefault();
  callback();
}
```

This preserves the audio permission while still suppressing IR touch bounce and synthetic duplicate clicks.
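
The same decision can be expressed as pure functions, which makes the thresholds easy to test without touch hardware. This is a sketch under assumptions: `evaluateTouchEnd` and `shouldIgnoreClick` are hypothetical names, and the state object simply bundles the two timestamps from the snippet above.

```javascript
// Hypothetical decision helper: should this touchend trigger the callback?
// The thresholds mirror the handler shown above.
function evaluateTouchEnd(state, touch) {
  const { touchDuration, withinBounds, touchEndTime } = touch;

  // Long presses and touches that ended outside the target are ignored.
  if (!(touchDuration < 1000 && withinBounds)) {
    return { activate: false, state };
  }
  // A second touchend within 50 ms is treated as bounce.
  if (touchEndTime - state.lastActivationTime < 50) {
    return { activate: false, state };
  }
  return {
    activate: true,
    state: {
      lastActivationTime: touchEndTime,
      suppressClickUntil: touchEndTime + 500
    }
  };
}

// A click inside the suppression window is treated as a synthetic duplicate.
function shouldIgnoreClick(state, clickTime) {
  return clickTime < state.suppressClickUntil;
}
```

The event handler would still call `event.preventDefault()` and `callback()` synchronously when `activate` is true, because that is what keeps playback inside the user gesture.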

## Step 7: Make diagnostics truthful

The status panel now distinguishes the two playback systems:

- **Primary audio:** packaged Singapore Tamil audio;
- **Voice:** Microsoft Venba Neural (`ta-SG`);
- **Offline clips:** ten phrases;
- **Browser fallback:** available or unavailable.

A missing browser voice is no longer presented as a failure. The learning activity remains fully usable because packaged audio is active.
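
A status object along these lines could drive the panel. The function name and field names are hypothetical, chosen only to mirror the four items listed above.

```javascript
// Hypothetical builder for the diagnostics panel. browserVoice is the exact
// ta-SG browser voice if one was found, otherwise null.
function describeVoiceStatus({ packagedClipCount, browserVoice }) {
  const hasPackagedAudio = packagedClipCount > 0;
  return {
    primaryAudio: hasPackagedAudio
      ? "Packaged Singapore Tamil audio"
      : "Browser speech synthesis",
    voice: hasPackagedAudio
      ? "Microsoft Venba Neural (ta-SG)"
      : (browserVoice ? browserVoice.name : "None"),
    offlineClips: packagedClipCount,
    browserFallback: browserVoice ? "available" : "unavailable",
    // Only a failure when neither playback system can speak.
    usable: hasPackagedAudio || Boolean(browserVoice)
  };
}
```

Separating `browserFallback` from `usable` is the point of the design: a device without a Tamil voice reports "unavailable" for the fallback while the activity as a whole still reports as usable.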

## Step 8: Validate the media and interface

Codex checked the JavaScript syntax:

```powershell
node --check script.js
```

Every MP3 was decoded with FFmpeg:

```powershell
Get-ChildItem audio -Filter *.mp3 | ForEach-Object {
  ffmpeg -v error -i $_.FullName -f null -
}
```

The folder was served locally:

```powershell
python -m http.server 8766 --bind 127.0.0.1
```

The browser test confirmed:

- the Tamil interface rendered correctly;
- all ten audio files were reachable as `audio/mpeg`;
- selecting a word enabled the playback controls;
- voice diagnostics reported packaged `ta-SG` audio;
- the activity did not require browser speech synthesis; and
- no framework or application errors appeared.
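
The reachability check can be scripted. The sketch below is an assumption about how such a check might look, not a record of what Codex ran: `checkAudioDelivery` is a hypothetical function, and `fetchFn` is injectable so the logic can be tested without a running server.

```javascript
// Hypothetical check: requests each clip and reports anything that is not
// served successfully as audio/mpeg. fetchFn defaults to the global fetch.
async function checkAudioDelivery(baseUrl, paths, fetchFn = fetch) {
  const problems = [];
  for (const path of paths) {
    const response = await fetchFn(new URL(path, baseUrl));
    const type = response.headers.get("content-type") || "";
    if (!response.ok || !type.startsWith("audio/mpeg")) {
      problems.push({ path, status: response.status, type });
    }
  }
  return problems;
}
```

With the local server from the previous command still running, `checkAudioDelivery("http://127.0.0.1:8766/", ["audio/palli.mp3"])` should resolve to an empty array when the clip is served correctly.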

## Step 9: Package it correctly for SLS

The runtime files in the final ZIP are:

```text
index.html
script.js
styles.css
audio/
  aasiriyar.mp3
  maram.mp3
  nanban.mp3
  palli.mp3
  puththagam.mp3
  thaneer.mp3
  thanneer.mp3
  unavu.mp3
  veedu.mp3
  veethu.mp3
```

`index.html` must be at the ZIP root. Do not zip the parent directory as an extra wrapper folder, or the LMS may not find the launch file.

The package was created with:

```powershell
Compress-Archive `
  -Path "Interactive_20260606032934\*" `
  -DestinationPath "Interactive_20260606032934_with_offline_tamil_audio.zip" `
  -Force
```
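
Before uploading, the archive's entry names can be checked for the wrapper-folder mistake. This sketch assumes the entry names have already been listed by some other tool; `checkZipLayout` is a hypothetical helper and does not read the ZIP itself.

```javascript
// Hypothetical pre-upload check on the entry names inside the ZIP.
function checkZipLayout(entryNames) {
  const problems = [];

  if (!entryNames.includes("index.html")) {
    // Distinguish "nested in a wrapper folder" from "not there at all".
    const nested = entryNames.find(name => /(^|\/)index\.html$/.test(name));
    problems.push(
      nested
        ? `index.html is nested at "${nested}" instead of the ZIP root`
        : "index.html is missing"
    );
  }

  if (!entryNames.some(name => /^audio\/[^/]+\.mp3$/.test(name))) {
    problems.push("no MP3 files found directly under audio/");
  }

  return problems;
}
```

An empty result means `index.html` is at the root and at least one clip sits under `audio/`. It does not confirm that all ten clips are present.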

## What Codex and agentic AI contributed

This outcome was possible because Codex did more than suggest code. It operated as an agent across the complete engineering loop:

1. **Observed** the interactive and its current failure mode.
2. **Inspected** the HTML, JavaScript, spoken vocabulary, and voice-selection logic.
3. **Researched** authoritative language and voice support.
4. **Selected** an architecture compatible with SLS and offline use.
5. **Generated** all required speech assets.
6. **Patched** the playback, touch handling, and diagnostics.
7. **Tested** syntax, media decoding, HTTP delivery, and rendered behavior.
8. **Packaged** the final resource with the correct ZIP structure.

The speech technology itself is not exclusive to Codex. What Codex made possible was the rapid, end-to-end conversion from an unreliable prototype into a tested classroom package without requiring the teacher to manually coordinate several tools and debug each integration point.

That is the practical value of agentic AI: not merely answering how something might be done, but carrying the work through until the artifact is usable.

## How to replicate this pattern for another language

Use the same workflow:

1. Identify the exact locale required by the curriculum.
2. Enumerate every phrase that can be spoken.
3. Confirm an appropriate voice from an authoritative provider.
4. Have a fluent educator review pronunciation and wording.
5. Generate one audio asset per stable phrase.
6. Map text identifiers to local media paths.
7. Keep live browser TTS only as an explicit locale-matched fallback.
8. Preserve synchronous user activation for audio playback.
9. Add diagnostics that state the actual voice and locale.
10. Test offline, in an iframe, and on touch hardware.
11. Package all assets with `index.html` at the ZIP root.

For highly dynamic text, pre-generated clips will not scale. Use a secure server-side TTS endpoint instead, with caching and no credentials exposed to the browser.
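
The caching part of such an endpoint can be sketched as follows. Everything here is hypothetical: `createCachedSynthesizer` is an illustrative name, and `synthesize(voice, text)` stands in for whatever server-side call reaches the TTS provider with credentials that never leave the server.

```javascript
// Hypothetical server-side wrapper around a provider call.
// synthesize(voice, text) is assumed to resolve with the audio bytes.
function createCachedSynthesizer(synthesize) {
  const cache = new Map();

  return function getSpeech(voice, text) {
    const key = JSON.stringify([voice, text.normalize("NFC")]);
    if (!cache.has(key)) {
      // Cache the promise so concurrent requests share one provider call.
      const pending = synthesize(voice, text).catch(error => {
        cache.delete(key); // Failures are not cached.
        throw error;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```

This in-memory cache is unbounded and is lost on restart. A real deployment would likely add a size limit or store generated audio on disk.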

## Important quality checks

Before publishing educational speech:

- ask a Singapore Tamil teacher or fluent reviewer to approve pronunciation;
- check vocabulary against the intended MOE level and local usage;
- avoid presenting another regional Tamil voice as Singapore Tamil;
- verify the provider's current licensing and service terms;
- retain editable source text and a reproducible generation manifest; and
- regenerate audio whenever the visible Tamil text changes.

## References

- [Microsoft Azure Speech language and voice support](https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=stt-tts)
- [Microsoft Azure Speech text-to-speech overview](https://learn.microsoft.com/azure/ai-services/speech-service/text-to-speech)
- [MDN: SpeechSynthesis.getVoices()](https://developer.mozilla.org/docs/Web/API/SpeechSynthesis/getVoices)
- [edge-tts project](https://github.com/rany2/edge-tts)

---

**Project outcome:** a Singapore Tamil SLS interactive that carries its own verified `ta-SG` speech assets and remains usable even when the learner's browser has no Tamil voice.
