Now that the meetings are being fetched through a simple Python script, we can begin the work of summarizing the meetings with OpenAI’s GPT-4 model.
I’ve been reading a lot about LangChain, a popular open source toolkit that abstracts a number of common engineering tasks involved in developing with LLMs. I’m going to be using the summarization features of LangChain, specifically the map-reduce approach.
Each large language model has a context window measured in tokens. A token is a small chunk of text, often a whole short word or a piece of a longer one. Each token is associated with an embedding, which is a vector that represents the token in a multidimensional space. I find it helpful to ignore the complexity here and think of each token as linked to a list of numbers that helps the model work out how that token relates to others.
All of this means there is a limit on the number of tokens an LLM can work with during inference. The current limit for a model like GPT-4 is roughly 8,000 tokens (8,192 for the base model), and this covers the entire prompt plus the response. Most of the city council meetings are beyond this length, so we’ll want to chunk them up into separate parts, using LangChain for that as well.
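To make the budget concrete, here is a rough sketch of checking whether a transcript is likely to fit. It assumes the common rule of thumb of about four characters per token for English text, which is only an approximation (a real tokenizer will give different counts). The function names and the 1,000-token reserve for the response are made up for illustration.

```python
# Rough token budgeting. The 4-characters-per-token ratio is a rule of thumb
# for English text, not an exact tokenizer count.
CHARS_PER_TOKEN = 4  # assumption: approximate average for English


def estimate_tokens(text: str) -> int:
    """Approximate the token count of text using the heuristic above."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(text: str, context_window: int = 8192,
                    reserved_for_response: int = 1000) -> bool:
    """Return True if the prompt likely leaves room for the response."""
    return estimate_tokens(text) <= context_window - reserved_for_response


if __name__ == "__main__":
    short_excerpt = "WELCOME TO TODAY'S CITIZENS FORUM. " * 10
    long_transcript = "WELCOME TO TODAY'S CITIZENS FORUM. " * 5000
    print(estimate_tokens(short_excerpt), fits_in_context(short_excerpt))
    print(estimate_tokens(long_transcript), fits_in_context(long_transcript))
```

Anything that fails this kind of check is a candidate for chunking.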
We’ll start by summarizing one meeting and then work with ChatGPT as a pair programmer to abstract it for all meetings in some sort of loop.
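The eventual loop could look something like the sketch below. The folder layout, the placeholder_summary stand-in, and the function names are assumptions for illustration; a real version would call an actual summarization step instead of the placeholder.

```python
from pathlib import Path
from typing import Callable, Dict


def summarize_all_meetings(transcript_dir: str,
                           summarize_meeting: Callable[[str], str]) -> Dict[str, str]:
    """Run summarize_meeting over every .txt transcript under transcript_dir."""
    summaries = {}
    for path in sorted(Path(transcript_dir).glob("**/*.txt")):
        text = path.read_text(encoding="utf-8")
        # Key by full path so files with the same name in different folders don't collide.
        summaries[str(path)] = summarize_meeting(text)
    return summaries


def placeholder_summary(text: str) -> str:
    """Stand-in for the real LLM call: returns the first 80 characters."""
    return text[:80]


if __name__ == "__main__":
    results = summarize_all_meetings("./local_files", placeholder_summary)
    for name, summary in results.items():
        print(name, "->", summary)
```

Passing the summarizer in as a function keeps the loop independent of whichever model ends up doing the work.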
We’ll first import the DirectoryLoader for the transcripts and the RecursiveCharacterTextSplitter to split the meeting into character-based chunks (sized by chunk_size below) that we can pass to the LLM using the map-reduce summarization chain.
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Load the meeting into a loader to split for summarization
loader = DirectoryLoader('./local_files/test', glob='**/*.txt')
meeting = loader.load()
# Chunk it up
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=3000,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
split_meeting = text_splitter.split_documents(meeting)
The split_meeting variable contains a list of chunks that resemble:
[Document(page_content="[000:00:04;971] >> Mayor Pureval: GOOD\n\n[000:00:05;572] AFTERNOON, WELCOME TO TODAY'S\n\n[000:00:07;107] CITIZENS FORUM, YOU WILL HAVE\n\n[000:00:08;208] TWO MINUTES TO ADDRESS COUNCIL\n\n[000:00:09;609] AS I CALL YOUR NAME.\n\n[000:00:15;582] OUR MAYOR VICE MAYOR KEARNEY\n\n[000:00:16;316] WILL BE PARTICIPATING HAVE I VIA\n\n[000:00:18;685] ZOOM DA, OUR FIRST SPEAKER IS\n\n[000:00:21;654] KELLY PRATHER.\n\n[000:00:22;689] WELCOME YOU HAVE TWO MINUTES.\n\n[000:00:35;268] >> MAYOR AND MEMBERS OF COUNCIL,\n\n[000:00:36;669] THANK YOU SO MUCH FOR ALLOWING\n\n[000:00:38;371] ME TO SPEAK.\n\n[000:00:40;640] THE ISSUES I'M GOING TO SHARE\n\n[000:00:42;909] TODAY IS CONFIRMATION THAT WE\n\n[000:00:45;145] NEED TO PUSH FOR WASHINGTON TO\n\n[000:00:49;549] GET FEDERALLY LEGISLATED POLICE\n\n[000:00:53;887] REFORM.\n\n[000:00:54;320] WE NEED TO ADVOCATE FOR THE\n\n[000:00:57;257] GEORGE FLOYD JUSTICE IN BLISSING\n\n[000:00:58;792] ACT BECAUSE THE CINCINNATI\n\n[000:01:00;660] POLICE DEPARTMENT CONTINUES TO", metadata={'source': 'local_files/test/test.txt'}), Document(page_content="[000:01:02;328] HARASS CITIZENS.\n\n[000:01:03;029] YOU ALL KNOW THAT WHEN I WAS\n\n[000:01:04;364] BEFORE YOU BEFORE I TALKED ABOUT\n\n[000:01:06;199] SELECTED ENFORCEMENT.\n\n[000:01:07;167] I ALSO TALKED ABOUT SELECTIVE\n\n[000:01:10;103] PROSECUTION AND SELECTIVE\n\n[000:01:12;172] PROTECTION SADLY.\n\n[000:01:14;074] TAXPAYERS ARE GUARANTEED THE\n\n[000:01:16;242] RIGHT OF PROTECTION IN THEIR\n\n[000:01:17;710] AREAS.\n\n[000:01:18;044] WHEN YOUR FAMILY IS TARGETED,\n\n[000:01:20;013] THAT DOES NOT HAPPEN.\n\n[000:01:20;914] BECAUSE I HAVE BEEN ON SO FOCAL\n\n[000:01:23;983] ABOUT ALL THESE THINGS, TWO MORE\n\n[000:01:25;452] OF MY FAMILY MEMBERS HAVE BEEN\n\n[000:01:28;154] ARRESTED THE SWAT TEAM RAIDED A\n\n[000:01:30;223] HOME AT 5:00 O'CLOCK IN THE\n\n[000:01:31;591] MORNING AND ARRESTED MY NIECE\n\n[000:01:34;027] AND NEPHEW AND ANOTHER NEPHEW.\n\n[000:01:36;429] 
I ASKED COUNCILMEMBER JOHNSON\n\n[000:01:38;098] WHAT I SHOULD DO.\n\n[000:01:38;865] DO I FILE A INVESTIGATION WITH", metadata={'source': 'local_files/test/test.txt'}), Document(page_content="[000:01:43;136] THE OFFICE OF INTERN AFFAIRS.\n\n[000:01:47;440] I REACHED OUT TO CHARMAINE\n\n[000:01:51;978] MCGUFFEY, CHAR MAIN -- STEPHANIE\n\n[000:01:54;681] SUM RECALL DUMAS AND A LETTER TO\n\n[000:01:59;719] MAYOR GARLAND AT THIS POINT.\n\n[000:02:00;887] I'M ASKING FOR PROTECTION AS FAR\n\n[000:02:02;555] AS THE CORRUPTION AND COLLUSION\n\n[000:02:03;923] AND DEVELOPMENT WHAT PEOPLE ARE\n\n[000:02:05;358] DOING BEHIND THE SCENES, I DON'T\n\n[000:02:06;659] CARE ABOUT THAT.\n\n[000:02:08;194] YOU KNOW?\n\n[000:02:08;695] THEY HAVE TO ANSWER FOR WHAT\n\n[000:02:10;663] THEY DORKS AT THIS POINT IT IS\n\n[000:02:11;431] ABOUT PROTECTING MY FAMILY.\n\n[000:02:13;066] WHATEVER CAN YOU DO TO ASSIST US\n\n[000:02:14;567] I WOULD GREATLY APPRECIATE IT\n\n[000:02:16;603] BECAUSE, YOU KNOW, I DON'T WANT\n\n[000:02:18;471] TO BE -- MY FAMILY TO BE\n\n[000:02:20;807] HARASSED BECAUSE I AM WHO I AM\n\n[000:02:22;342] AND I'M GOING TO SPEAK OUT\n\n[000:02:24;477] AGAINST WHAT AFFECTS THE PEOPLE.", metadata={'source': 'local_files/test/test.txt'}), Document(page_content="[000:02:26;513] SO, IF YOU ARE WILLING TO SIT\n\n[000:02:28;281] DOWN AND TALK TOE TOME AND SEE\n\n[000:02:30;183] HOW WE CAN COLLABORATE AND MAKE\n\n[000:02:31;084] SURE MY FAMILY IS SAFE, I WOULD\n\n[000:02:32;819] GREATLY APPRECIATE IT.\n\n[000:02:33;920] BUT NO ONE ELSE NEEDS TO BE PUT\n\n[000:02:35;588] IN HARM'S WAY.\n\n[000:02:36;923] WE TALKED ABOUT THERE BEING A\n\n[000:02:38;791] LEAK IN CRIME STOPPERS AND THAT\n\n[000:02:39;926] CONTINUES TO HAPPEN.\n\n[000:02:41;728] I WOULD APPRECIATE IT.\n\n[000:02:44;230] >> Mayor Pureval: THANK YOU VERY\n\n[000:02:45;064] MUCH.\n\n[000:02:45;365] NEXT SPEAKER IS DOUG SPRINGS,\n\n[000:02:47;133] HAVE YOU TWO MINUTES.\n\n[000:03:03;883] >> [INDISCERNIBLE] I'M HERE 
TO\n\n[000:03:05;118] ASK FOR A MEETING WITH THE CHIEF\n\n[000:03:07;120] OF POLICE AND THE CITY MANAGER.\n\n[000:03:12;325] MY ISSUES ARE MY -- I'M\n\n[000:03:14;961] CONCERNED ABOUT MY WELL BEING AS\n\n[000:03:17;263] A CITIZEN.\n\n[000:03:19;332] MY ISSUE ISN'T COVERED BY THE", metadata={'source': 'local_files/test/test.txt'}), Document(page_content="[000:03:22;368] COLLABORATIVE AGREEMENT.\n\n[000:03:23;736] I ASKED THE CITY MANAGER I ASKED\n\n[000:03:27;740] THE POLICE CHIEF TO MEET WITH\n\n[000:03:29;842] ME, THEY WON'T MEET WITH ME.\n\n[000:03:32;011] I'M BEING HARASSED ON A CONSTANT\n\n[000:03:34;414] BASIS BY THE CINCINNATI POLICE\n\n[000:03:37;917] UNDERCOVER AND I KNOW WHAT IT IS\n\n[000:03:40;053] ABOUT.\n\n[000:03:40;386] OTHER PEOPLE SIT HERE AND KNOW\n\n[000:03:41;988] WHAT IT IS ABOUT.\n\n[000:03:45;992] IF SOMETHING HAPPENS TO ME, I\n\n[000:03:48;628] WANT IT TO BE KNOWN THAT I CAME\n\n[000:03:51;030] TO CITY COUNCIL TO PLEAD WITH\n\n[000:03:53;266] CITY COUNCIL TO SAVE MY LIFE\n\n[000:03:56;169] BECAUSE IT IS THAT SERIOUS.\n\n[000:03:58;171] BUT NO ONE WANTS TO LISTEN\n\n[000:04:00;940] BECAUSE BLACK LIVES DOESN'T\n\n[000:04:05;178] MATTER.\n\n[000:04:07;347] >> Mayor Pureval: THANK YOU,\n\n[000:04:08;414] SIR.\n\n[000:04:08;715] OUR NEXT SPEAKER IS WILLIAM\n\n[000:04:11;351] JACKSON, YOU HAVE TWO MINUTES.\n\n[000:04:15;655] >> YES, I AGREE WITH MR. POOLE,", metadata={'source': 'local_files/test/test.txt'}),
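To see what chunk_size and chunk_overlap control, here is a deliberately simplified splitter. It is not LangChain’s algorithm: RecursiveCharacterTextSplitter tries to break on separators such as blank lines first, whereas this sketch cuts at fixed character offsets. The function name is made up for illustration.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of at most chunk_size characters, where each
    window starts chunk_overlap characters before the previous one ended."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(25))
    for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
        print(repr(chunk))
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence cut at a boundary still appears with some of its context in one of the two chunks.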
We can finally bring in our LLM from OpenAI using LangChain.
import os
from langchain.chat_models import ChatOpenAI
os.environ['OPENAI_API_KEY'] = 'sk-...'
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
# Initialize the OpenAI Chat model
chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model='gpt-3.5-turbo-16k')
And now we can bring in the necessary prompt templates and chains from LangChain:
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
We can call load_summarize_chain and pass in the map-reduce strategy like so:
summary_chain = load_summarize_chain(llm=chat_model, chain_type='map_reduce')
output = summary_chain.run(split_meeting)
print(output)
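Conceptually, the map_reduce strategy summarizes each chunk on its own (map) and then summarizes the collected summaries (reduce). The sketch below shows only that shape in plain Python. It is not LangChain’s implementation, which also handles cases this ignores, such as combined summaries that are too long for a single call. Both summarize_fn and toy_summarizer are hypothetical stand-ins for an LLM call.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str],
                         summarize_fn: Callable[[str], str]) -> str:
    """Summarize each chunk, then summarize the joined chunk summaries."""
    # Map step: one independent summary per chunk.
    partial_summaries = [summarize_fn(chunk) for chunk in chunks]
    # Reduce step: combine the partial summaries into one final summary.
    return summarize_fn("\n".join(partial_summaries))


def toy_summarizer(text: str) -> str:
    """Toy stand-in for an LLM: keep the first four words of every line."""
    return " / ".join(" ".join(line.split()[:4]) for line in text.splitlines())


if __name__ == "__main__":
    demo_chunks = [
        "Speakers asked for police reform and shared personal stories.",
        "Council honored the fire department with a resolution.",
    ]
    print(map_reduce_summarize(demo_chunks, toy_summarizer))
```

With N chunks this makes N map calls plus one reduce call, which is why the number of chunks matters for how long a meeting takes to process.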
But things can get even more powerful if we customize both the map and combine (or reduce) templates that we pass into load_summarize_chain:
# Customize the map_prompt
map_prompt = """
Given the following excerpt from a City Council meeting:
"{text}"
Please provide a brief (around 100 words) and concise summary that captures the main points and decisions made. Ensure any jargon or technical terms are explained or simplified.
BRIEF SUMMARY:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])
combine_prompt = """
You have a set of brief summaries from a City Council meeting. Each summary represents a different segment of the meeting:
{text}
Please combine these summaries into a coherent and engaging narrative that captures the overall themes, key decisions, and important discussions of the meeting. Place extra emphasis on actions taken during the meeting, especially those involving money, budget, and anything that might affect citizens' lives. The final summary should read like a well-written news article.
As part of the narrative, identify and discuss the top 5 most significant topics from the meeting. Also, incorporate a discussion of the 5 most actionable items from the meeting that readers can help with or hold city council accountable to.
FINAL SUMMARY AND TOPICS:
"""
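combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])
# Pass both custom templates into the map-reduce chain
summary_chain = load_summarize_chain(
    llm=chat_model,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
)
output = summary_chain.run(split_meeting)
print(output)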
I worked with Phind as a pair programmer to help me debug a few things in my Jupyter Notebook, as it’s better than ChatGPT when looking for recent information. I also used it to help modify the map and combine prompts. I’m pretty happy with what we have.
We now have a full script using LangChain that can summarize individual council meetings. We’re still a long way from having something in production, but we’re getting closer. Here’s a preview of a summarized city council meeting for reference. It took about 3 minutes to process the entire meeting using the gpt-3.5-turbo-16k model.
During a recent City Council meeting in Cincinnati, a wide range of topics were discussed, reflecting the concerns and priorities of the community. The meeting covered issues related to police reform, criminal activity, lack of council action, reparations, and appreciation for the heroic efforts of the Cincinnati Fire Department.
The meeting began with citizens expressing their need for police reform and justice, specifically mentioning the George Floyd Justice in Policing Act. Kelly Prather, a concerned citizen, raised issues of selective enforcement, prosecution, and protection by the Cincinnati Police Department, citing the recent arrest of two of her family members. Prather requested the council's assistance in addressing corruption and collusion within the department, emphasizing the need for protection for her family. Similarly, two other citizens expressed concerns about their well-being and requested meetings with the Chief of Police and City Manager. One individual claimed constant harassment by undercover police officers, while the other suggested extending the speaking time limit during meetings.
Another significant topic discussed during the meeting was the inability to prosecute criminals and concerns about criminal activity in the city. One speaker shared their experience of receiving nasty calls and accused someone named Rave of serious offenses, although they did not make any formal accusations. Stanford Poole expressed frustration with the city's housing scheme that targeted poor Black people and accused the City Building Department of lying to the council. These concerns highlighted the need for improved law enforcement and transparency.
Citizens also raised concerns about the council's lack of action on issues related to crime, fines, and the building department. The resident accused the council of favoritism towards wealthy individuals and businesses while neglecting the needs of the community. They called for an FBI investigation and criticized the mayor for not fulfilling their campaign promises. Another speaker urged the council to investigate an explosion near a Procter and Gamble plant and encouraged residents of Winton Terrace to vote. These concerns highlighted the need for accountability and responsiveness from the council.
Reparations for the railways, built by slaves, was another significant topic discussed during the meeting. The speaker criticized the idea of selling the railways and urged the council to do right by the community. Miss Coffee requested help regarding housing issues and the establishment of a housing task force to enforce tenant protection laws and hold landlords accountable for predatory practices. These topics emphasized the importance of addressing historical injustices and ensuring affordable and fair housing for all residents.
The council meeting also recognized the heroic efforts of the Cincinnati Fire Department and neighboring fire departments in responding to a 7-alarm fire in the community. The council expressed gratitude for their dedication, professionalism, and mutual aid. They emphasized the importance of cooperation among communities and the need for continuous support and recognition of the fire department's service.
The top five most significant topics discussed during the meeting were:
1. Police reform, selective enforcement, and protection.
2. Inability to prosecute criminals and concerns about criminal activity.
3. Lack of council action on crime, fines, and building department issues.
4. Reparations for railways built by slaves and affordable housing concerns.
5. Recognition and appreciation for the heroic efforts of the Cincinnati Fire Department.
In terms of actionable items, readers can hold the City Council accountable for the following:
1. Engaging in discussions and advocating for police reform and the passage of the George Floyd Justice in Policing Act.
2. Supporting efforts to address corruption and collusion within the Cincinnati Police Department.
3. Holding the council accountable for addressing concerns about selective enforcement, prosecution, and protection.
4. Advocating for an FBI investigation into allegations of favoritism, lack of action, and campaign promises.
5. Supporting initiatives for reparations for railways built by slaves and advocating for affordable housing solutions, such as the establishment of a housing task force.
Overall, the City Council meeting in Cincinnati addressed a range of significant topics that reflect the concerns and priorities of the community. The meeting highlighted the need for police reform, accountability, and transparency, as well as the importance of addressing issues related to crime, fines, and housing. The recognition and appreciation for the heroic efforts of the fire department emphasized the importance of community cooperation and support for public services. Citizens can play an active role by engaging in discussions, advocating for change, and holding the council accountable for addressing these issues.
Out of curiosity, I wanted to see how GPT-4 would work with this chain. Here’s the output from it for comparison (it took NN minutes):
The City Council meeting in Cincinnati, led by Mayor Pureval, was a whirlwind of impassioned debate, thought-provoking discussion, and heartening recognition. The council tackled a broad spectrum of topics, with a particular emphasis on police reform, housing issues, fiscal responsibility, community appreciation, and public engagement.
One of the most prominent topics was police reform and citizens' safety. Vocal advocates, including speaker Kelly Prather, called for federal police reform legislation and raised concerns about potential corruption within the local government. William Jackson suggested extending public speaking time and improving the handling of digital evidence. Citizens can play a vital role in holding the council accountable for addressing these reform issues, ensuring safety, and improving transparency.
Housing was another key issue, with concerns raised about alleged predatory practices by landlords and the need for a housing task force to enforce Ohio tenant laws. The council's commitment to addressing these pressing issues can be ensured by citizens demanding greater transparency and accountability.
The city's fiscal management also came under scrutiny. Derrick Blassingame urged the council to be more financially responsible in addressing the city's budget deficit, and concerns were raised about the misappropriation of budgets and the sale of city assets. Here, citizens can demand more responsibility and transparency in the city's fiscal management, particularly in asset sales and budget allocation.
The council also took time to recognize the heroic efforts of the Cincinnati Fire Department in tackling a massive seven-alarm fire in Camp Washington. A resolution was passed commending the firefighters' dedication and service, and citizens can show their appreciation by contributing to fire safety education and prevention programs.
The meeting concluded with the council celebrating Women's History Month by honoring several local women for their contributions to the city, including Tina Stoeberl and Melissa Autry. The community can celebrate and support the achievements of these local women by encouraging more representation and recognition of women in the city's history and future.
The council approved several actionable items, notably a $26 million Water Supply Revolving Loan for replacing aging water mains and a $35 million loan for lead pipe replacement, ensuring the city's water infrastructure remains current and safe. They also approved a $1.698 million project for Findlay Recreation Area, which will require public vigilance for timely and quality completion. Lastly, the council discussed a strategic alignment report related to an opioid settlement, which will determine the allocation of settlement funds over the next 18 years - a matter that citizens must follow closely.
The meeting was a testament to the council's commitment to promoting a vibrant, inclusive, and sustainable Cincinnati. With a clear focus on community development, fiscal responsibility, and inclusivity, the council continues to shape the city's future through its decisions and actions.
Some things to think about next:
- How best to optimize the chunk size in the text_splitter for speed
- Is GPT-4 worth the increased cost and time?
- Why aren’t the bullets coming through in GPT-4?
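On the chunk-size question, one starting point is to count how many chunks, and therefore how many map-step LLM calls, a given chunk size produces. The sketch assumes fixed-size windows rather than the separator-aware splitting LangChain does, so treat the numbers as estimates. The function name and the 60,000-character transcript length are made up for illustration.

```python
import math


def estimate_chunk_count(text_length: int, chunk_size: int,
                         chunk_overlap: int = 20) -> int:
    """Estimate how many chunks a text of text_length characters produces."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must be larger than chunk_overlap")
    if text_length <= chunk_size:
        return 1 if text_length > 0 else 0
    step = chunk_size - chunk_overlap
    # The first chunk covers chunk_size characters; each later chunk adds `step`.
    return 1 + math.ceil((text_length - chunk_size) / step)


if __name__ == "__main__":
    transcript_length = 60_000  # hypothetical transcript size in characters
    for size in (1000, 3000, 6000):
        print(size, estimate_chunk_count(transcript_length, size))
```

Fewer, larger chunks mean fewer calls, but each call carries more text, so timing a few sizes on a real transcript is the only way to know which one is actually faster.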