In this paper, we introduce ConversaSynth, a framework that
generates synthetic conversation audio using large language models
(LLMs) conditioned on multiple persona settings. The framework first creates
diverse and coherent text-based dialogues across various topics,
which are then converted into audio using text-to-speech (TTS)
systems. Our experiments demonstrate that ConversaSynth
effectively generates high-quality synthetic audio datasets, which
can significantly enhance the training and evaluation of models
for audio tagging, audio classification, and multi-speaker speech
recognition. The results indicate that the synthetic datasets
generated by ConversaSynth exhibit substantial diversity and
realism, making them suitable for developing robust, adaptable
audio-based AI systems.
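To make the two-stage pipeline concrete, below is a minimal, hypothetical sketch, not the authors' implementation: an LLM is prompted to produce a persona-labeled dialogue, and each turn is then synthesized with a distinct TTS voice per speaker. The functions `call_llm` and `synthesize_speech` are placeholder stand-ins for whichever LLM and TTS backends are used, and all names, prompts, and parameters are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    style: str      # speaking style injected into the LLM prompt
    voice: str      # TTS voice identifier for this speaker

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM chat-completion call; a real
    # backend would return generated dialogue text for `prompt`.
    return ("Alice: Hi, have you tried the new transit line?\n"
            "Bob: Not yet. Is it actually faster than the bus?")

def synthesize_speech(text: str, voice: str) -> bytes:
    # Hypothetical stand-in for a TTS call; a real backend would
    # return waveform audio of `text` spoken in `voice`.
    return b"\x00" * (16 * len(text))  # placeholder "audio"

def generate_dialogue(topic: str, personas: list[Persona],
                      turns: int = 6) -> list[tuple[Persona, str]]:
    """Stage 1: prompt the LLM for a coherent multi-persona dialogue."""
    roster = "; ".join(f"{p.name} ({p.style})" for p in personas)
    prompt = (f"Write a {turns}-turn conversation about '{topic}' between "
              f"{roster}. Prefix every line with the speaker's name.")
    by_name = {p.name: p for p in personas}
    dialogue = []
    for line in call_llm(prompt).splitlines():
        speaker, _, utterance = line.partition(":")
        if speaker.strip() in by_name:
            dialogue.append((by_name[speaker.strip()], utterance.strip()))
    return dialogue

def render_audio(dialogue: list[tuple[Persona, str]]) -> bytes:
    """Stage 2: synthesize each turn in its speaker's voice, then concatenate."""
    return b"".join(synthesize_speech(text, p.voice) for p, text in dialogue)

personas = [Persona("Alice", "enthusiastic", "voice_f1"),
            Persona("Bob", "skeptical", "voice_m2")]
audio = render_audio(generate_dialogue("public transport", personas))
```

A real deployment would replace the stubs with actual LLM and TTS calls and write the concatenated waveform to an audio file.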
(Link to Project Page)