Field note
A week with a transcription pipeline that almost worked
Notes from our first real attempt to fold automatic transcription into a regional desk — including the three workflows we had to throw out.
We spent a week trying to make automatic transcription a load-bearing part of a regional reporting workflow. By Friday, the pipeline was running. It was also producing fewer publishable minutes per reporter than the manual baseline. Both things are true.
The setup: five reporters, three languages, a shared upload folder, and a small open-weights transcription model running on a workstation in the office. Outputs landed in a draft document with speaker labels and timestamps. Reporters edited the draft, then sent it to a section editor.
What worked: long interviews in clear audio. The model transcribed an hour-long sit-down with a regional official faster than any of our reporters could have, and the edit pass took roughly twenty minutes.
What did not work: anything recorded in a market, on a motorbike, or with overlapping speakers. The model produced confident, fluent text that was wrong in subtle ways: it substituted plausible names, smoothed out the texture of a quote, and occasionally invented a sentence to fill an audio gap.
We threw out three workflows over the course of the week. The one that survived: transcription is allowed only for one-on-one interviews in controlled audio, the transcript is always verified against the recording before any quote is published, and the reporter — not the editor — is responsible for the verification.
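For desks that want to encode the surviving rule rather than rely on memory, here is a minimal sketch of it as two checks. It is not the tooling we ran. The names (`Recording`, `Quote`, `may_transcribe`, `may_publish`) and the fields are hypothetical, and "controlled audio" is assumed to be a yes/no call the reporter makes at upload time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recording:
    reporter: str           # reporter who conducted the interview (hypothetical field)
    speaker_count: int      # interviewer plus interviewees
    controlled_audio: bool  # assumed to be the reporter's own judgment at upload


@dataclass
class Quote:
    text: str
    recording: Recording
    verified_by: Optional[str] = None  # who checked the quote against the audio


def may_transcribe(rec: Recording) -> bool:
    """Automatic transcription only for one-on-one interviews in controlled audio."""
    return rec.speaker_count == 2 and rec.controlled_audio


def may_publish(quote: Quote) -> bool:
    """A quote is publishable only once the interviewing reporter has verified it."""
    return (
        may_transcribe(quote.recording)
        and quote.verified_by == quote.recording.reporter
    )
```

The one deliberate choice here is that `may_publish` compares `verified_by` to the reporter on the recording, so a sign-off by the section editor does not count as verification.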