A voice can now be copied in just a few seconds: a new lightweight speech synthesis model, LuxTTS, can reproduce someone else's voice from a short audio clip. It's all very simple: give the neural network a few seconds of a person's recording, and it can speak any text in that same voice. The result sounds quite natural, and the output is 48 kHz audio, on par with an ordinary audio recording. The most striking part is the speed. The model generates speech 150 times faster than real time, so a minute of speech is voiced in a fraction of a second. It also needs less than 1 GB of video memory, so it can run locally even on a standard PC or laptop. You can download it here: https://github.com/ysharma3501/LuxTTS
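
To make the speed claim concrete, here is a rough back-of-the-envelope sketch. It assumes the quoted 150x figure holds uniformly, while real throughput will depend on hardware and text length. The function name `synthesis_time` is purely illustrative and is not part of LuxTTS.

```python
def synthesis_time(audio_seconds: float, speedup: float = 150.0) -> float:
    """Estimate compute time in seconds for a given duration of generated speech.

    `speedup` is the claimed real-time multiple (150x in the post); it is an
    assumption here, not a measured value.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    # Durations of generated speech, in seconds.
    for label, seconds in [("1 minute", 60), ("10 minutes", 600), ("1 hour", 3600)]:
        print(f"{label} of speech: about {synthesis_time(seconds):.1f} s to generate")
```

At that rate a minute of speech takes about 0.4 seconds, and even an hour-long recording would take under half a minute.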