I left off the summarization piece with a few outstanding questions:
- How best to optimize the chunk size in the text_splitter for speed
- Is GPT4 worth the increased cost and time?
- Why aren’t the bullets coming through in GPT4?
I want to explore some of these in a bit more detail, starting with how best to optimize the text_splitter for either the 16k context window in GPT-3.5 or the 8k context window in GPT4. To start, we’ll investigate how many tokens each chunk in the text splitter contains.
We can get this from the chat model’s get_num_tokens helper, which counts tokens with the model’s tokenizer:
num_tokens = chat_model.get_num_tokens(split_meeting[0].page_content)
print (f"Our prompt has {num_tokens} tokens")
If we increase our chunk_size in the text_splitter to 10,000 characters, we get 4,021 tokens per chunk. This fits within the GPT4 context window, and increasing the chunk size reduces the number of chunks, which in turn reduces the time the GPT4 summarization takes. The total time it takes to summarize one meeting with GPT4 in this way is around 3.5 minutes.
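To make that concrete, here’s a minimal sketch of the splitter settings, assuming LangChain’s RecursiveCharacterTextSplitter and a meeting_text string holding the full transcript (both names are my placeholders for whatever the actual notebook uses):

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import ChatOpenAI

chat_model = ChatOpenAI(model_name="gpt-4", temperature=0)

# meeting_text is a placeholder for the full transcript string.
# Splitting is by characters, not tokens -- 10,000 characters lands around 4,000 tokens per chunk.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=500)
split_meeting = text_splitter.create_documents([meeting_text])

# Sanity-check that every chunk fits comfortably inside the 8k GPT-4 context window
for i, chunk in enumerate(split_meeting):
    print(f"Chunk {i}: {chat_model.get_num_tokens(chunk.page_content)} tokens")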
I asked ChatGPT to help me come up with a crude formula to estimate the cost of the summarization based on the number of tokens required.
# Assumed per-token rate for GPT-4; update with the actual pricing.
TOKEN_COST = 0.06 / 1000  # Cost per token, e.g., $0.06 per 1,000 tokens

def calculate_cost(num_tokens, token_cost):
    """Calculate the cost for a given number of tokens."""
    return num_tokens * token_cost

def get_summary_token_count(text):
    # Replace this with an actual call to the OpenAI API or a tokenizer;
    # simplified here, as 1 token is approximately 4 characters of text.
    return len(text) / 4

# Total tokens for summarizing the individual chunks (inputs plus estimated outputs)
total_chunk_tokens = sum(
    chat_model.get_num_tokens(chunk.page_content) + get_summary_token_count(chunk.page_content)
    for chunk in split_meeting
)

# Total tokens for the final combination step
summaries = [chunk.page_content for chunk in split_meeting]  # Placeholder -- replace with the actual per-chunk summaries
combined_input_tokens = sum(get_summary_token_count(summary) for summary in summaries)
combined_summary = ""  # Placeholder -- replace with the actual combined summary
combined_output_tokens = get_summary_token_count(combined_summary)
total_combined_tokens = combined_input_tokens + combined_output_tokens

# Calculate costs
chunk_summarization_cost = calculate_cost(total_chunk_tokens, TOKEN_COST)
final_combination_cost = calculate_cost(total_combined_tokens, TOKEN_COST)
total_cost = chunk_summarization_cost + final_combination_cost
# Print the costs
print(f"Chunk Summarization Cost: ${chunk_summarization_cost:.2f}")
print(f"Final Combination Cost: ${final_combination_cost:.2f}")
print(f"Total Cost for Meeting Summarization: ${total_cost:.2f}")
This came out to somewhere around $8 per meeting. But as I’m writing this, OpenAI just announced their new GPT4 Turbo model on 11/6/23, which has a 128k context window, is quicker, and has a lower cost.
I did a quick trial and was able to pass the entire meeting transcript in one API call, negating the need for the map-reduce approach.
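The quick trial was essentially a single call along these lines. This is a sketch, not the exact code: it assumes the gpt-4-1106-preview model name from the GPT4 Turbo announcement and reuses the meeting_text placeholder from above.

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

turbo_model = ChatOpenAI(model_name="gpt-4-1106-preview", temperature=0)

# The full transcript should sit well under the 128k-token limit, but check first
print(f"Transcript tokens: {turbo_model.get_num_tokens(meeting_text)}")

# One call, no map-reduce
response = turbo_model([
    SystemMessage(content="You summarize city council meeting transcripts."),
    HumanMessage(content=f"Summarize the following meeting transcript as a bulleted list:\n\n{meeting_text}"),
])
print(response.content)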
Since I can now pass everything into GPT4 without relying on LangChain’s map-reduce summarization technique, I’ll want to create a new prompt template that takes the full transcript along with my custom instructions to produce each individual meeting summary.
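Something like the following is what I have in mind; it’s a sketch assuming LangChain’s ChatPromptTemplate, with custom_instructions standing in for my real instructions and meeting_text again as the transcript placeholder:

from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI

turbo_model = ChatOpenAI(model_name="gpt-4-1106-preview", temperature=0)

# Placeholder instructions -- swap in the real custom instructions
custom_instructions = """Summarize the meeting as a bulleted list.
Group the bullets by agenda item and note any votes taken."""

summary_prompt = ChatPromptTemplate.from_messages([
    ("system", "You summarize city council meeting transcripts."),
    ("human", "{instructions}\n\nTranscript:\n{transcript}"),
])

messages = summary_prompt.format_messages(
    instructions=custom_instructions,
    transcript=meeting_text,
)
summary = turbo_model(messages)
print(summary.content)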
And while I won’t be using map-reduce and summarization chains in the final QC Stacks City Council meeting summary, I’m still glad I learned about them. There will undoubtedly be larger summarization tasks in the future that won’t fit within the 128k context window and will require some sort of chunked summarization approach.
This AI stuff moves fast!