github.com/souzatharsis/podcastfy @v0.4.0 sqlite

191 symbols · 685 edges · 28 files · 150 documented (79%)

README

Podcastfy.ai 🎙️🤖

An Open Source API alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI

https://github.com/user-attachments/assets/f1559e70-9cf9-4576-b48b-87e7dad1dd0b

Podcastfy is an open-source Python package that transforms multimodal content (text, images) into engaging, multilingual audio conversations using GenAI. Supported inputs include websites, PDFs, images, and YouTube videos, as well as user-provided topics.

Unlike closed-source UI-based tools focused primarily on research synthesis (e.g. NotebookLM ❤️), Podcastfy focuses on open source, programmatic and bespoke generation of engaging, conversational content from a multitude of multi-modal sources, enabling customization and scale.

Audio Examples 🔊

This sample collection was generated using this Python Notebook.

Images

Audio	Description	Image Set
	Senecio, 1922 (Paul Klee) and Connection of Civilizations (2017) by Gheorghe Virtosu
	The Great Wave off Kanagawa, 1831 (Hokusai) and Takiyasha the Witch and the Skeleton Spectre, c. 1844 (Kuniyoshi)
	Pop culture icon Taylor Swift and Mona Lisa, 1503 (Leonardo da Vinci)

Text

Audio	Description	Content Type	Source
	Personal Website	Website	Website
Audio	Lex Fridman Podcast: Dario Amodei, Anthropic's CEO	YouTube	YouTube
Audio	Benjamin Franklin's Autobiography	YouTube	Book

Multi-Lingual Text

Language	Content Type	Description	Audio	Source
French	Website	Agroclimate research information	Audio	Website
Portuguese-BR	News Article	Election polls in São Paulo	Audio	Website

Features ✨

Generate conversational content from multiple sources and formats (images, text, websites, YouTube, and PDFs).
Generate short (2-5 minutes) or longform (30+ minutes) podcasts.
Customize transcript and audio generation (e.g., style, language, structure).
Generate transcripts using 100+ LLMs (OpenAI, Anthropic, Google, etc.).
Leverage local LLMs for transcript generation for increased privacy and control.
Integrate with advanced text-to-speech models (OpenAI, Google, ElevenLabs, and Microsoft Edge).
Provide multi-language support for global content creation.
Integrate seamlessly with CLI and Python packages for automated workflows.

Built with Podcastfy 🚀

Updates 🚀🚀

v0.4.0+ release

Released new Multi-Speaker TTS model (is it the one NotebookLM uses?!?)
Generate short or longform podcasts
Generate podcasts from input topic using grounded real-time web search
Integrate with 100+ LLM models (OpenAI, Anthropic, Google etc) for transcript generation

See CHANGELOG for more details.

Quickstart 💻

Prerequisites

Python 3.11 or higher
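FFmpeg (for audio processing), installed with your system package manager (e.g. $ brew install ffmpeg or $ sudo apt install ffmpeg); the PyPI package named ffmpeg does not ship the FFmpeg binary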

Setup

Install from PyPI: $ pip install podcastfy
Set up your API keys

Python

from podcastfy.client import generate_podcast

audio_file = generate_podcast(urls=["<url1>", "<url2>"])
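
The call above takes a list of URLs. As a minimal sketch, assuming you want to drop malformed entries before handing the list to generate_podcast, the filtering step could look like the following. The names keep_http_urls and podcast_from_urls are hypothetical helpers for illustration, not part of Podcastfy; only generate_podcast(urls=...) comes from the Quickstart.

```python
from urllib.parse import urlparse


def keep_http_urls(candidates):
    """Return only the entries that look like absolute http(s) URLs."""
    valid = []
    for candidate in candidates:
        cleaned = candidate.strip()
        parsed = urlparse(cleaned)
        # Require both a web scheme and a host part.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(cleaned)
    return valid


def podcast_from_urls(candidates):
    """Hypothetical wrapper: filter the inputs, then call generate_podcast."""
    urls = keep_http_urls(candidates)
    if not urls:
        raise ValueError("no usable http(s) URLs were provided")
    # Imported lazily so the filtering helper works without podcastfy installed.
    from podcastfy.client import generate_podcast

    return generate_podcast(urls=urls)
```

Failing early on an empty list avoids starting a generation run that has no content to work with.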

CLI

python -m podcastfy.client --url <url1> --url <url2>
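
For automated workflows it can help to assemble the CLI invocation above programmatically. The sketch below builds the argument list with one --url flag per source, mirroring the command shown; build_podcastfy_command is a hypothetical helper name, and no flags beyond --url are assumed.

```python
import shlex
import sys


def build_podcastfy_command(urls, python=sys.executable):
    """Build an argv list equivalent to the CLI example, one --url per source."""
    command = [python, "-m", "podcastfy.client"]
    for url in urls:
        command += ["--url", url]
    return command


if __name__ == "__main__":
    argv = build_podcastfy_command(["https://example.com/post"])
    # Print a shell-safe rendering; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Keeping the command as a list rather than a single string sidesteps shell quoting problems when a URL contains characters such as & or ?.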

Usage 💻

Experience Podcastfy with our HuggingFace 🤗 Spaces app. (Note: This UI app is less extensively tested than the Python package.)

Customization 🔧

Podcastfy offers a range of customization options to tailor your AI-generated podcasts:

Customize podcast conversation (e.g. format, style, voices)
Choose to run Local LLMs (156+ HuggingFace models)
Set System Settings (e.g. output directory settings)

License

This software is licensed under Apache 2.0. See instructions if you would like to use podcastfy in your software.

Contributing 🤝

We welcome contributions! See Guidelines for more details.

Example Use Cases 🎧🎶

Content Creators can use Podcastfy to convert blog posts, articles, or multimedia content into podcast-style audio, enabling them to reach broader audiences. By transforming content into an audio format, creators can cater to users who prefer listening over reading.
Educators can transform lecture notes, presentations, and visual materials into audio conversations, making educational content more accessible to students with different learning preferences. This is particularly beneficial for students with visual impairments or those who have difficulty processing written information.
Researchers can convert research papers, visual data, and technical content into conversational audio. This makes it easier for a wider audience, including those with disabilities, to consume and understand complex scientific information. Researchers can also create audio summaries of their work to enhance accessibility.
Accessibility Advocates can use Podcastfy to promote digital accessibility by providing a tool that converts multimodal content into auditory formats. This helps individuals with visual impairments, dyslexia, or other disabilities that make it challenging to consume written or visual content.

Contributors

Core symbols most depended-on inside this repo

get

called by 112

podcastfy/utils/config.py

convert_to_speech

called by 9

podcastfy/text_to_speech.py

load_conversation_config

called by 8

podcastfy/utils/config_conversation.py

generate_qa_content

called by 7

podcastfy/content_generator.py

podcastfy/utils/config_conversation.py

Shape

Method 108

Function 57

Class 25

Route 1

Languages

Python 100%

Modules by API surface

podcastfy/content_generator.py · 36 symbols

tests/test_generate_podcast.py · 21 symbols

tests/test_client.py · 17 symbols

podcastfy/utils/config_conversation.py · 14 symbols

podcastfy/text_to_speech.py · 10 symbols

tests/test_genai_podcast.py · 9 symbols

podcastfy/utils/config.py · 8 symbols

podcastfy/tts/providers/geminimulti.py · 8 symbols

tests/test_content_parser.py · 7 symbols

tests/test_audio.py · 7 symbols

podcastfy/content_parser/website_extractor.py · 7 symbols

podcastfy/tts/base.py · 6 symbols

Dependencies from manifests, versioned

aiohappyeyeballs 2.4.3 · 1×

aiohttp 3.10.10 · 1×

aiosignal 1.3.1 · 1×

alabaster 1.0.0 · 1×

annotated-types 0.7.0 · 1×

anyio 4.6.2.post1 · 1×

attrs 24.2.0 · 1×

babel 2.16.0 · 1×

beautifulsoup4 4.12.3 · 1×

bleach 6.2.0 · 1×

cachetools 5.5.0 · 1×

certifi 2024.8.30 · 1×

For agents

$ claude mcp add podcastfy \
  -- python -m otcore.mcp_server <graph>
