Field note

A week with a transcription pipeline that almost worked

Notes from our first real attempt to fold automatic transcription into a regional desk — including the three workflows we had to throw out.

BandungBergerak regional desk · Nov 04, 2026 · 6 min read


We spent a week trying to make automatic transcription a load-bearing part of a regional reporting workflow. By Friday, the pipeline was running. It was also producing fewer publishable minutes per reporter than the manual baseline. Both things are true.

The setup: five reporters, three languages, a shared upload folder, and a small open-weights transcription model running on a workstation in the office. Outputs landed in a draft document with speaker labels and timestamps. Reporters edited the draft, then sent it to a section editor.
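For readers who want to picture the draft document, here is a minimal sketch of how transcribed segments could be turned into lines with speaker labels and timestamps. It is an illustration, not our actual tooling: the `Segment` structure, its field names, and the line layout are all assumptions made for this example, and the transcription step itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed stretch of audio (hypothetical structure)."""
    start_seconds: float  # offset from the start of the recording
    speaker: str          # label assigned by the model, e.g. "Speaker 1"
    text: str             # unverified model output


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as HH:MM:SS, dropping fractions."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_draft(segments: list[Segment]) -> str:
    """Build the plain-text draft a reporter would then edit."""
    ordered = sorted(segments, key=lambda s: s.start_seconds)
    return "\n".join(
        f"[{format_timestamp(s.start_seconds)}] {s.speaker}: {s.text}"
        for s in ordered
    )
```

Keeping the timestamp on every line is the design choice that matters here: it lets a reporter jump straight to the matching point in the recording when checking a quote.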

What worked: long interviews in clear audio. The model transcribed an hour-long sit-down with a regional official faster than any of our reporters could have, and the edit pass took roughly twenty minutes.

What did not work: anything in a market, on a motorbike, or with overlapping speakers. The model produced confident, fluent text that was wrong in subtle ways — substituting plausible names, smoothing out the texture of a quote, occasionally inventing a sentence that filled an audio gap.

We threw out three workflows over the course of the week. The one that survived: transcription is allowed only for one-on-one interviews in controlled audio, the transcript is always verified against the recording before any quote is published, and the reporter — not the editor — is responsible for the verification.
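The surviving rule can be written down as two small checks, which is roughly how we would encode it if it ever lived in a tool rather than in a desk memo. This is a sketch under stated assumptions: the class names, the fields, and the idea of recording the verifier's name are hypothetical, and "controlled audio" remains a reporter's judgment that the code only stores and never determines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recording:
    """Hypothetical description of an interview recording."""
    speaker_count: int      # everyone talking, including the reporter
    controlled_audio: bool  # reporter's call: quiet setting, no crosstalk
    reporter: str           # who conducted the interview


@dataclass
class Quote:
    """A quote pulled from a transcript (hypothetical structure)."""
    text: str
    verified_by: Optional[str] = None  # who checked it against the audio


def may_transcribe(recording: Recording) -> bool:
    """Allow automatic transcription only for one-on-one, controlled audio."""
    return recording.speaker_count == 2 and recording.controlled_audio


def may_publish(quote: Quote, recording: Recording) -> bool:
    """A quote is publishable only if the reporter verified it personally."""
    return quote.verified_by is not None and quote.verified_by == recording.reporter
```

Note that `may_publish` rejects a quote checked by anyone other than the reporter, including the section editor. That mirrors the rule above, where responsibility for verification sits with the person who was in the room.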
