How We Optimized AI Business Plan Generation: Speed vs. Quality Trade-offs
Introduction: Why Did It Take 30 Minutes?
Our AI-powered business plan generator was initially designed to maximize content quality through multiple iterative processes: drafting, evaluation, improvement cycles, and structured validation. The goal was to produce well-structured, coherent, and insightful business plans that reflected the qualities we had defined for the product:
- Clarity and Simplicity — Ensuring business owners could easily understand and use their plans.
- Strategic Guidance — Providing actionable insights, not just text.
- Customization and Context Awareness — Tailoring plans based on user responses.
- Actionability — Making plans easy to execute for entrepreneurs.
However, this high-quality multi-step approach came at a cost: generating a full plan took over 30 minutes. This delay made real-time interaction impractical. We needed to restructure our approach to prioritize speed without completely sacrificing quality.
After extensive performance analysis, we identified the key bottlenecks and optimized our pipeline, ultimately cutting AI response time from over 30 minutes to under 1 minute. The trade-off: we gave up multiple execution cycles in exchange for that speed.
The Root Causes of Slowness
Testing pinpointed three factors slowing down AI business plan generation:
1. OpenAI Assistants API Overhead
- Thread & Run Creation Latency: OpenAI’s Assistants API introduced significant delays while setting up and managing thread-based runs.
- Parallel Execution Limitations: Running multiple sections in parallel within the same thread was not possible, forcing serialized processing.
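To make the overhead concrete, here is roughly the call sequence the Assistants API imposes per run (a minimal sketch; the assistant ID and prompt are placeholders):

```python
import time
from openai import OpenAI

client = OpenAI()

# Every generation pays for thread setup plus an asynchronous run.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Draft the Market Analysis section.",  # placeholder prompt
)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_...",  # placeholder assistant ID
)

# Runs complete asynchronously, so the client must poll for status,
# and sections sharing one thread cannot execute concurrently.
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
```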
2. Tool Calling Delays
- Extra Messages in Function Calls: Every function invocation added an extra unstructured message to the conversation, inflating response times.
- Inefficient Call Structure: Tool calling created additional response handling overhead, making structured responses a more efficient alternative.
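The overhead is easiest to see in the message flow: each tool call forces a second round trip before the model produces usable text. A sketch with a hypothetical save_section tool:

```python
import json
from openai import OpenAI

client = OpenAI()

SAVE_SECTION_TOOL = {  # hypothetical tool, for illustration only
    "type": "function",
    "function": {
        "name": "save_section",
        "description": "Persist a finished business plan section.",
        "parameters": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "body": {"type": "string"}},
            "required": ["title", "body"],
        },
    },
}

messages = [{"role": "user", "content": "Write the executive summary."}]

# Round trip 1: the model answers with a tool_calls message, not content.
first = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=[SAVE_SECTION_TOOL]
)
call = first.choices[0].message.tool_calls[0]

# The assistant message and the tool result both join the conversation...
messages.append(first.choices[0].message)
messages.append(
    {"role": "tool", "tool_call_id": call.id, "content": json.dumps({"status": "saved"})}
)

# ...and only round trip 2 yields the final text.
second = client.chat.completions.create(model="gpt-4o", messages=messages)
```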
3. Multi-Step Execution Complexity
- Multiple Iterations Per Section: The original workflow included drafting, evaluation, improvements, and re-evaluations before producing a final section.
- High API Call Volume: The iterative nature of our initial process required multiple API calls per section, multiplying latency issues.
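In simplified form, the original per-section flow looked like the sketch below, where call_llm stands in for one Chat Completions request. With a draft, an evaluation, and up to three improve/re-evaluate rounds, a single section could cost eight model calls:

```python
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    # Stand-in for one model request; each call adds full network and generation latency.
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def build_section(topic: str, max_rounds: int = 3) -> str:
    """Original multi-pass flow: up to 2 + 2 * max_rounds API calls per section."""
    draft = call_llm(f"Draft the {topic} section of a business plan.")
    review = call_llm(f"Evaluate this draft and list weaknesses:\n{draft}")
    for _ in range(max_rounds):
        if "no weaknesses" in review.lower():  # simplified stop condition
            break
        draft = call_llm(f"Improve the draft using this feedback:\n{review}\n\n{draft}")
        review = call_llm(f"Re-evaluate the revised draft:\n{draft}")
    return draft
```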
Key Optimizations We Implemented
1. Switching from OpenAI Assistants API to Chat API
- Structured Responses Instead of Tool Calls: Instead of relying on tool calling, we transitioned to structured responses for generating business plan sections.
- Immediate Response Streaming: The Chat API enabled faster streaming of responses, reducing the time to first token and improving interactivity.
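A sketch of the replacement path, combining a JSON response format with streaming (the schema and prompts are illustrative, not our production setup):

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "Return the section as JSON with 'title' and 'body' fields.",
        },
        {"role": "user", "content": "Write the Market Analysis section."},
    ],
    response_format={"type": "json_object"},  # structured response, no tool call
    stream=True,  # tokens arrive as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # e.g. forward each token to the UI
```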
2. Reducing Execution Cycles for Faster Generation
- Eliminated Multi-Step Iterations: We removed post-evaluation refinement cycles in favor of a single optimized generation pass.
- Reduced Re-Evaluation Steps: Previously, sections were evaluated multiple times for improvements. We consolidated this into a single evaluation step.
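The consolidated flow reduces each section to at most two model calls. A sketch, reusing call_llm from the earlier example:

```python
def build_section_fast(topic: str) -> str:
    """Optimized flow: one generation pass plus a single evaluation pass."""
    draft = call_llm(
        f"Draft the {topic} section of a business plan. "
        "Be clear, actionable, and well-structured."
    )
    # One consolidated evaluate-and-fix pass replaces the old review loop.
    return call_llm(f"Review this draft once and return an improved version:\n{draft}")
```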
3. Optimized LangChain Integration to Reduce Redundant API Calls
- Default Polling Behavior: LangChain's default Assistants-with-Tools implementation polled on a fixed interval to check whether a run had completed, producing redundant requests.
- Modified Polling Strategy: We optimized our integration to reduce unnecessary polling, cutting down on redundant API requests.
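We won't reproduce LangChain's internals here, but the change boils down to replacing fixed-interval polling with exponential backoff. A generic sketch against the runs endpoint (interval values are illustrative; LangChain's actual hook points differ):

```python
import time
from openai import OpenAI

client = OpenAI()

def wait_for_run(thread_id: str, run_id: str, base: float = 0.5, cap: float = 8.0):
    """Poll a run with exponential backoff instead of a tight fixed interval."""
    delay = base
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status not in ("queued", "in_progress"):
            return run
        time.sleep(delay)
        delay = min(delay * 2, cap)  # far fewer status requests on long runs
```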
Final Results: Speed vs. Quality Trade-offs
By implementing these optimizations, we reduced AI response time from 30 minutes to under 1 minute. However, this came at a cost:
What We Gained:
- Drastic speed improvements, making real-time interaction feasible.
- More flexible model switching, allowing seamless transitions between GPT-4o, GPT-4o-mini, and Claude models (see the sketch below).
- Better LangChain compatibility, ensuring future improvements can be integrated smoothly.
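With the Chat API, switching models is a one-line configuration change. A minimal sketch using LangChain's chat model classes (model identifiers are current as of writing):

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

MODELS = {
    "gpt-4o": ChatOpenAI(model="gpt-4o"),
    "gpt-4o-mini": ChatOpenAI(model="gpt-4o-mini"),
    "claude": ChatAnthropic(model="claude-3-5-sonnet-20240620"),
}

def generate_section(model_key: str, prompt: str) -> str:
    # The same prompt pipeline runs against whichever model is selected.
    return MODELS[model_key].invoke(prompt).content
```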
What We Lost:
- Reduced self-improvement cycles, meaning the AI no longer iterates multiple times to refine responses.
- Fewer evaluation layers, potentially lowering content depth in certain sections.
While this trade-off was necessary to enhance user experience, future updates may reintroduce selective iterative processes where speed allows.
Lessons Learned
1. LangChain & LangGraph Require Deep Customization
- Out-of-the-box solutions introduce inefficiencies that require custom extensions.
- Understanding and modifying LangChain’s internals was crucial for optimizing performance.
2. Speed vs. Quality Is a Balancing Act
- Reducing generation steps improved speed but required sacrificing iterative refinements.
- Future work may explore selective re-introduction of key quality-enhancing cycles.
3. Optimize for Speed First, Then Iterate on Quality
- Initial performance issues made real-time usage impractical.
- Prioritizing execution speed first allowed us to later refine output quality without impacting usability.
Try Our AI-Powered Business Suite
We built and optimized our AI-driven business plan generator at DreamHost, ensuring enterprise-level performance and scalability.
Try our AI-powered business planner and explore other business tools: Business Planner
Powered by DreamHost: DreamHost
Connect with me on LinkedIn: Krzysztof Miaskowski