In an era where AI is everywhere, the real question is not whether you are using AI, but whether you are using it deliberately. At Vets Who Code, accessibility is not a feature we bolt on at the end. It is something we design for from the beginning. That mindset is what led me to build an automated audio overview system for our blog using Google Gemini and Cloudinary.
This was not about adding synthetic voice for novelty. It was about removing friction between knowledge and the people who need it. Written content is not accessible to everyone. Some people learn better by listening. Some are multitasking. Some rely on audio as an accessibility requirement. Manually recording audio does not scale, and outsourcing it introduces cost, latency, and inconsistency. If AI is going to be useful, it should eliminate those tradeoffs, not introduce new ones.
The Role of AI
At the center of this system is AI, but not in the way most people talk about it. The AI is not deciding what we publish. It is not rewriting content. It is not acting autonomously. It has a narrow, well-defined responsibility: translating validated written knowledge into natural, listenable audio.
Preprocessing
The workflow begins with our blog posts, which live as Markdown files in the repository. Markdown is great for writing, but it is not suitable for speech. Headings, links, formatting, and inline HTML all degrade audio quality if passed directly into a text-to-speech model. Before AI ever generates a voice, the system cleans and reshapes the content. Headers are removed, links are stripped, emphasis syntax is flattened, and only the meaningful narrative text remains.
This preprocessing step is critical. Large language models do not simply read text aloud. They infer structure, intent, and pacing from the input they are given. Clean input produces better rhythm, clearer pronunciation, and more natural emphasis. The quality of the audio is determined as much by the engineering around the AI as by the model itself.
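The cleanup described above can be sketched in a few regular expressions. This is a minimal illustration, not the actual Vets Who Code implementation: the function name `markdown_to_speech_text` is hypothetical, and it assumes that "stripping" a link means keeping its visible label while dropping the URL.

```python
import re


def markdown_to_speech_text(markdown: str) -> str:
    """Reduce Markdown to plain narrative text suitable for speech synthesis."""
    text = markdown
    # Links: keep the visible label, drop the URL (an assumption about "stripped").
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Headings: remove whole heading lines such as "# Title" or "## Section".
    text = re.sub(r"^#{1,6}[ \t]+.*$", "", text, flags=re.MULTILINE)
    # Inline HTML: remove tags but keep the text between them.
    text = re.sub(r"<[^>]+>", "", text)
    # Emphasis: flatten **bold**, __bold__, *italic*, and _italic_ to bare text.
    text = re.sub(r"(\*\*|__|\*|_)(.+?)\1", r"\2", text)
    # Collapse the blank lines left behind by removed headings.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


sample = (
    "# Title\n\n"
    "This is **bold** and *italic* text with a [link](https://example.com).\n\n"
    "## Section\n\n"
    "Inline <span>HTML</span> goes away.\n"
)
print(markdown_to_speech_text(sample))
```

The order of the steps matters: links and HTML are handled before emphasis so that underscores inside URLs or attributes are never mistaken for emphasis markers. A production version would likely use a real Markdown parser rather than regular expressions.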
Generation
Once the text is prepared, it is sent to the text-to-speech variant of Gemini 2.5 Flash, Google's low-latency model. Gemini returns raw PCM audio data.
Raw PCM is not practical for storage or delivery, so the system wraps it in a 24 kHz, mono, 16-bit WAV file. WAV is lossless, widely supported, and well suited to long-term storage. At this stage, the AI's work is complete.
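As a rough sketch of that conversion step, Python's standard `wave` module can add a WAV header to raw PCM bytes. The sample rate, channel count, and bit depth come from the description above. The function name `pcm_to_wav_bytes` is hypothetical, and the silent buffer stands in for real model output.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # 24 kHz, as described in the post
CHANNELS = 1              # mono
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container, in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# One second of silence stands in for the audio bytes returned by the model.
silent_pcm = b"\x00\x00" * SAMPLE_RATE_HZ
wav_bytes = pcm_to_wav_bytes(silent_pcm)
```

No resampling or re-encoding happens here. The samples are written unchanged, and the header tells players how to interpret them.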
Delivery
Cloudinary stores the WAV file and handles the transformation and delivery.
When a blog post is loaded, the audio player requests MP3 output via a Cloudinary URL parameter. The first request triggers the transformation. Subsequent requests are served from the CDN cache. No manual conversion. No duplicate storage. No additional build steps.
WAV is ideal for preservation and future transformations. MP3 is ideal for browsers and bandwidth. By separating storage format from delivery format, the system remains flexible and scalable. If we ever want a different format or compression strategy, it is a URL change, not a pipeline rewrite.
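To make the "URL change, not a pipeline rewrite" point concrete, here is a small sketch of how a delivery URL might be built. It assumes Cloudinary's usual URL layout, in which audio assets are served under the `video` resource type and an `f_<format>` transformation selects the output format. The cloud name and public ID are placeholders, not the real Vets Who Code values, and the helper name is hypothetical.

```python
def cloudinary_audio_url(cloud_name: str, public_id: str, audio_format: str = "mp3") -> str:
    """Build a delivery URL asking Cloudinary for a stored audio asset in a given format.

    Assumes audio lives under the "video" resource type and that the
    f_<format> transformation parameter controls the delivered format.
    """
    return (
        f"https://res.cloudinary.com/{cloud_name}"
        f"/video/upload/f_{audio_format}/{public_id}"
    )


# Placeholder values for illustration only.
mp3_url = cloudinary_audio_url("demo-cloud", "blog-audio/my-post")
ogg_url = cloudinary_audio_url("demo-cloud", "blog-audio/my-post", audio_format="ogg")
print(mp3_url)
print(ogg_url)
```

The stored WAV is never touched. Switching the delivered format only changes one segment of the URL, which is what keeps storage and delivery decoupled.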
From the user's perspective, this entire process is invisible. When someone visits a blog post, they see an audio player and press play. Behind the scenes, AI has translated text into speech, infrastructure has handled format conversion, and the CDN has optimized delivery.
Why This Matters
The AI is treated like infrastructure, not magic. Its role is clearly defined. Its output is traceable. Its impact is measurable. Humans still own the content. Engineers still own the system. AI simply removes the mechanical barrier between written knowledge and spoken access.
The most responsible use of AI is not full automation or creative replacement. It is augmentation that increases access, reduces friction, and preserves accountability. This audio pipeline does exactly that.
The result is a blog that can be read or listened to without extra effort from the team. Accessibility becomes a default outcome of the system, not an afterthought. And this very post can have its own audio overview generated by the same pipeline. The system explains itself, using itself.
That is what intentional AI usage looks like.
Support Vets Who Code
If this story resonates with you, consider supporting Vets Who Code to help more veterans transition into successful tech careers. Your donations can make a significant impact. You can also sponsor us on GitHub to get technical updates and support our mission. Together, we can make a difference.
