I’ve found it increasingly hard to learn anything unless there’s something on the line. That’s usually a task I need to get done for work or home life that requires me to pick up a new skill. When I try to learn things without a project in mind, I often spin my wheels, watching YouTube videos on the subject or completing a few half-baked tutorials. I’ll look up in three months only to find that I have just a cursory knowledge of the field I’m trying to learn and nothing to show for it.
I’ve also been struggling to find what to put on this blog and have read numerous recommendations about learning in the open and sharing what I’ve learned to help others on a similar path. So I’m going to attempt to do this while learning the skills of AI Engineering.
I have English and Journalism degrees and have always been interested in how computers can help journalists automate the boring parts of the job. I covered City Council meetings as an intern at a local newspaper and found that while important topics are discussed there, a journalist could make much better use of their time than transcribing and summarizing the meetings.
The Cincinnati City Council puts their meetings online at Archive.org and includes a closed-captioned transcript of each meeting. The formatting of these transcripts is brutal and painful to read. I’d like to get a summary of each meeting with the most important parts detailed so I can understand what was discussed. I’d also like an audio file of the summary, podcast-style, so I can listen to the short snippet if I want.
Thankfully, a large language model doesn’t care too much about the brutal formatting. I’m going to use ChatGPT with GPT-4 to help me architect, program, and build this project. I plan to use this blog at Spellbooks.ai to document my journey. Since I don’t have any readers yet, this will mainly be for me at a later date and to help me process what I’m creating.
Let’s get started.
I knew from the beginning that I wanted to use ChatGPT as a sounding board to help me create the initial product requirements, dependencies, and application architecture. It helped me break down the entire project into four distinct modules that I can tackle independently of each other.
ChatGPT and I came up with the following MVP product architecture that I will use throughout this project:
### 1. Data Retrieval Module:
* **Objective:** To fetch City Council meeting minutes as TXT files from Archive.org and make them readable.
* **Tasks:**
    * Write a script to download TXT files from Archive.org.
    * Apply basic text transformations to remove unnecessary characters and line breaks.
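To make the two tasks above a little more concrete, here is a minimal Python sketch of what this module might look like. It assumes the transcript is a plain text file inside an Archive.org item, reachable through the `https://archive.org/download/<identifier>/<filename>` pattern. The function names are hypothetical, and the cleanup rules are a guess at what "basic text transformations" could mean, not a tested recipe for the actual Cincinnati transcripts.

```python
import re
from urllib.parse import quote
from urllib.request import urlopen

ARCHIVE_DOWNLOAD_BASE = "https://archive.org/download"


def build_download_url(identifier: str, filename: str) -> str:
    """Build the download URL for one file inside an Archive.org item."""
    return f"{ARCHIVE_DOWNLOAD_BASE}/{quote(identifier)}/{quote(filename)}"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a text file and decode it, replacing undecodable bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def clean_transcript(raw: str) -> str:
    """Turn caption-style text into paragraphs separated by blank lines."""
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control characters other than tab and newline.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    # Treat one or more blank lines as a paragraph break (an assumption
    # about how the transcripts are laid out).
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, join the short caption lines and collapse spaces.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Usage would be something like `clean_transcript(fetch_text(build_download_url(identifier, filename)))`, where `identifier` and `filename` come from the Archive.org page for a given meeting.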
### 2. LLM Processing Module:
* **Objective:** To format the text to resemble interview transcripts and summarize key highlights.
* **Tasks:**
    * Develop a process to format the text into an interview-like structure with timestamps.
    * Implement summarization to extract 2-3 sentences per highlight.
    * Optionally, create anchor tags for each speaker line in the transcript.
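A full meeting transcript may not fit into a single model request, so one plausible way to approach the summarization task is to split the cleaned text into chunks and build a prompt for each one. The Python sketch below is only an illustration: the chunk size, the function names, and the prompt wording are all assumptions, and the actual call to the model is left out on purpose.

```python
def split_into_chunks(text: str, max_chars: int = 8000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # The extra 2 accounts for the blank line that joins paragraphs.
        added = len(paragraph) + (2 if current else 0)
        if current and current_len + added > max_chars:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(paragraph)
        current.append(paragraph)
        current_len += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def build_summary_prompt(chunk: str, max_highlights: int = 5) -> str:
    """Build a hypothetical summarization prompt for one transcript chunk."""
    return (
        "You are summarizing part of a City Council meeting transcript.\n"
        f"List up to {max_highlights} key highlights. "
        "Write 2-3 sentences per highlight.\n\n"
        f"Transcript:\n{chunk}"
    )
```

Each prompt would then be sent to the model, and the per-chunk highlights combined into one summary for the meeting.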
### 3. Database:
* **Objective:** To store the processed minutes, highlights, and basic metadata.
* **Tasks:**
    * Choose an open-source database (e.g., PostgreSQL) that fits your future expansion needs.
    * Design a simple database schema to store minutes, highlights, attendees, meeting date, runtime, and location.
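As a starting point for the schema task, here is a small Python sketch that stores the fields listed above. It uses the standard library's `sqlite3` as a stand-in so it runs without a server; the outline suggests PostgreSQL, and the same three tables should translate with minor type changes. Table names, column names, and the `save_meeting` helper are all hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS meetings (
    id              INTEGER PRIMARY KEY,
    meeting_date    TEXT NOT NULL,   -- ISO 8601 date string
    runtime_minutes INTEGER,
    location        TEXT,
    minutes_text    TEXT NOT NULL    -- cleaned transcript
);
CREATE TABLE IF NOT EXISTS attendees (
    id         INTEGER PRIMARY KEY,
    meeting_id INTEGER NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
    name       TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS highlights (
    id         INTEGER PRIMARY KEY,
    meeting_id INTEGER NOT NULL REFERENCES meetings(id) ON DELETE CASCADE,
    sort_order INTEGER NOT NULL,     -- order of the highlight in the summary
    summary    TEXT NOT NULL
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database (in memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def save_meeting(conn, meeting_date, location, runtime_minutes,
                 minutes_text, attendees, highlights) -> int:
    """Insert one meeting with its attendees and highlights; return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO meetings (meeting_date, runtime_minutes, location,"
            " minutes_text) VALUES (?, ?, ?, ?)",
            (meeting_date, runtime_minutes, location, minutes_text),
        )
        meeting_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO attendees (meeting_id, name) VALUES (?, ?)",
            [(meeting_id, name) for name in attendees],
        )
        conn.executemany(
            "INSERT INTO highlights (meeting_id, sort_order, summary)"
            " VALUES (?, ?, ?)",
            [(meeting_id, i, s) for i, s in enumerate(highlights, start=1)],
        )
    return meeting_id
```

Keeping attendees and highlights in their own tables, rather than as one long text column, should make it easier to query across meetings later.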
### 4. Frontend:
* **Objective:** To present the data in a clean, modern, newspaper-esque design.
* **Tasks:**
    * Create wireframes or mockups for the main page displaying the meeting highlights and transcripts.
    * Develop the frontend using SvelteKit, adhering to the desired typography and design.
    * Implement accessibility features, ensuring that the site is usable by a wide audience.
### Optional Steps:
* **Automation & Scheduling:** Implement a manual or simple automated process to check for new data weekly.
* **Basic Monitoring:** Utilize existing tools or platforms to monitor performance and traffic.
### Testing & Validation:
* Test each module separately to ensure they work as intended.
* Perform integration testing to ensure all parts work together cohesively.
* Validate the MVP with a small group of users to gather initial feedback.
### Deployment & Hosting:
* Choose a hosting platform that aligns with your budget and technical requirements.
* Deploy the application and set up any necessary monitoring or automation tools.
### Future Iterations:
* Consider building a roadmap for future enhancements like notification services, community engagement, scalability, and additional content types.
* Prioritize features based on user feedback and the project's evolution.
And while this isn’t perfect, it’s much more organized than I would be if I were to try this without the help of a large language model.
I took the outline above and attempted to visualize its steps and modules below, as a reference I can use going forward.