The new reality for teams is model routing. Fireworks is being used inside early agent workflows where model selection happens dynamically based on cost and performance. Two perspectives from our partners at Trilogy make this concrete:
• Adoption → why they moved to Fireworks (cost pressure, rate limits, flexibility)
• Usage → how it’s integrated into their systems (provider abstraction + orchestration like Open Symphony)
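As a rough illustration of what cost- and performance-based routing can look like, here is a minimal Python sketch. It is not Trilogy's or Fireworks' implementation. The per-token prices come from the K2.7 Code and Qwen 3.7 Plus posts on this page, while the model names, the quality scores, and the routing rule itself (take the cheapest model that clears a quality bar, otherwise the highest-quality one) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str            # hypothetical model identifier
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens
    quality: float       # hypothetical eval score in [0, 1]


def estimated_cost(m: ModelOption, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, ignoring cache discounts."""
    return (input_tokens * m.input_per_m + output_tokens * m.output_per_m) / 1_000_000


def route(options: list[ModelOption], input_tokens: int, output_tokens: int,
          min_quality: float) -> ModelOption:
    """Pick the cheapest model that meets the quality bar; otherwise the best available."""
    eligible = [m for m in options if m.quality >= min_quality]
    if not eligible:
        # Nothing clears the bar: fall back to the highest-quality option.
        return max(options, key=lambda m: m.quality)
    return min(eligible, key=lambda m: estimated_cost(m, input_tokens, output_tokens))


# Prices from the posts on this page; quality scores are made up for illustration.
CATALOG = [
    ModelOption("kimi-k2.7-code", 0.95, 4.00, quality=0.85),
    ModelOption("qwen-3.7-plus", 0.50, 3.00, quality=0.80),
]
```

With a 20K-token prompt and a 2K-token response, `route(CATALOG, 20_000, 2_000, min_quality=0.75)` picks the cheaper Qwen option, while raising `min_quality` to 0.82 pushes the request to K2.7 Code. A production router would likely also weigh latency, rate-limit headroom, and cache hit rates.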
Fireworks AI
Software Development
San Mateo, CA · 41,926 followers
Run AI faster, more efficiently, and on your own terms
About us
Fireworks is the fastest way to build, tune, and scale AI on open models. Ship production-ready AI in seconds on our globally distributed cloud infrastructure, optimized for your use case. Fireworks powers production workloads at companies like Uber, DoorDash, Notion, and Cursor, delivering 15× higher speed, 4× lower latency, and 4× more concurrency than closed models.
- Website: https://fireworks.ai
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: San Mateo, CA
- Type: Privately Held
- Founded: 2022
- Specialties: LLMs, Generative AI, artificial intelligence, developer tools, software engineering, and inference
Locations
- Primary: San Mateo, CA 94402, US
Updates
-
This is a stepping stone toward enabling customers to generate training data from traces, then lean into continuous post-training and own their AI with their own data moat. Kudos to LangChain for the incredible work. The continuation of this research will be essential for companies creating their specialized intelligence.
LangChain Labs teamed up with Fireworks AI to answer the question, “How can we cost-effectively mine important signals from every single trace, while maintaining frontier performance?” https://lnkd.in/gHu2PbfQ
We fine-tuned a Qwen judge model to detect “Perceived Error” from user interactions, with experiments structured around three primary questions:
1️⃣ Does fine-tuning improve baseline judge quality up to frontier model performance?
2️⃣ Does a learned judge transfer across datasets?
3️⃣ Is serving a fine-tuned model cost-effective?
We found that our fine-tuned model exceeded frontier model performance and runs ~100x cheaper. Read our study to learn how we ran this experiment, from data preparation to fine-tuning setup, and how it will shape our future research on trace understanding.
-
We can't wait to see what people build. With this partnership, we're giving developers access to production-grade open model inference through a single Azure endpoint, with enterprise service-level agreements (SLAs) and zero-setup onboarding. This is available now. Check out Fireworks on Foundry at the link in the comments.
Microsoft Build 2026 delivered. Here's what it means if you're building an AI startup. From Fireworks AI going GA in Microsoft Foundry to a new family of MAI models, to smarter discovery in Microsoft Marketplace, the thread running through all of it is the same: it's getting easier to build trustworthy AI on a single platform and connect it to the enterprise customers who need it. Microsoft for Startups is also making it simpler to get started with Startup credits and grow your benefits as you build on Azure. Five announcements. One read. Worth your time before your next sprint. https://msft.it/6049vg49D
-
Kimi (Moonshot AI) released K2.7 Code, the latest in their K2 line of coding models, and it's live on Fireworks on Day 0, via serverless and the API. K2.7 Code generates roughly 30% fewer reasoning tokens than K2.6 while improving results on coding evaluations. For teams running long-horizon coding agents, this reduces the real cost per task. In multi-turn agent workflows, every reasoning token becomes context for the next steps, so cutting reasoning length leads to smaller contexts, faster loops, and fewer retries across the entire trajectory. K2.7 Code is available now with Standard and Priority serving tiers. A high-throughput Fast path is coming soon. Pricing: $0.95 / 1M input, $4.00 / 1M output, and $0.19 / 1M on cache hits. 256K context window. Full details: https://lnkd.in/ggadMGRz
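To see why fewer reasoning tokens compound over a trajectory, here is a back-of-the-envelope Python sketch using the K2.7 Code prices above. The loop structure (everything from earlier turns re-sent as cached context, each turn's reasoning and answer appended to that context) and all token counts are assumptions for illustration, not measurements.

```python
# K2.7 Code serverless prices from the post (USD per 1M tokens).
IN_PER_M = 0.95
OUT_PER_M = 4.00
CACHED_PER_M = 0.19


def trajectory_cost(turns: int, new_input: int, reasoning: int, answer: int) -> float:
    """Estimated USD cost of a multi-turn agent loop.

    Hypothetical model: each turn re-sends all earlier context as cache hits,
    adds `new_input` uncached tokens, and produces `reasoning + answer` output
    tokens that are appended to the context for the next turn.
    """
    context = 0  # tokens carried over from earlier turns, billed at the cache-hit rate
    total = 0.0
    for _ in range(turns):
        output = reasoning + answer
        total += (context * CACHED_PER_M
                  + new_input * IN_PER_M
                  + output * OUT_PER_M) / 1_000_000
        context += new_input + output
    return total


baseline = trajectory_cost(turns=20, new_input=2_000, reasoning=4_000, answer=500)
leaner = trajectory_cost(turns=20, new_input=2_000, reasoning=2_800, answer=500)  # 30% fewer
print(f"baseline ${baseline:.4f}, leaner ${leaner:.4f}, saved ${baseline - leaner:.4f}")
```

Under these assumptions the saving has two parts: fewer billed output tokens on every turn, and a smaller cached context on every later turn. Retries, which the post also mentions, are not modeled here.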
-
Qwen 3.7 Plus is now live on Fireworks. The official Qwen 3.7 Plus weights are now hosted and served on Fireworks infrastructure. Your teams get:
→ Strong performance on long-horizon agent workflows with tool use and verification loops
→ Ability to preserve reasoning across multiple turns
→ Flexible thinking / non-thinking modes per request
→ Native multimodal input and prompt caching (80% cheaper cached tokens)
→ OpenAI and Anthropic-compatible APIs
Serverless pricing: $0.50 / 1M input ($0.10 cached) and $3 / 1M output. Full details: https://lnkd.in/gHwmM5Mu
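The cached-token discount is simple arithmetic: $0.10 is 80% below $0.50. Below is a minimal Python sketch of per-request cost and the effective input price at a given cache hit rate, using the serverless prices above. The function names are hypothetical, and the sketch ignores any pricing detail the post does not state.

```python
# Qwen 3.7 Plus serverless prices from the post (USD per 1M tokens).
INPUT_PER_M = 0.50
CACHED_INPUT_PER_M = 0.10
OUTPUT_PER_M = 3.00


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request.

    `cached_tokens` is the portion of `input_tokens` served from the prompt cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_PER_M
            + cached_tokens * CACHED_INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000


def blended_input_price(hit_rate: float) -> float:
    """Effective USD per 1M input tokens when `hit_rate` of them are cache hits."""
    return (1 - hit_rate) * INPUT_PER_M + hit_rate * CACHED_INPUT_PER_M
```

For example, at an 80% hit rate the effective input price works out to about $0.18 per 1M tokens, which is why long agent sessions that reuse a stable prompt prefix benefit most from caching.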
-
Fireworks AI reposted this
We're expanding Legal Agent Bench (LAB) to better evaluate how agents perform on one of the most common functions inside enterprises: contract negotiation. The update adds 500 new tasks spanning contract drafting, review, and negotiation across a wide range of agreement types and negotiation stages. Our goal is simple: measure whether agents can effectively advance a negotiation, recognize risk, and bring humans into the loop when the situation demands it. Read how we're benchmarking contract negotiation and the research directions we're pursuing next: https://lnkd.in/ggWxtsfk
-
MiniMax M3 is now on Fireworks. This open-weight frontier model combines three capabilities that have historically been expensive or fragmented:
- Native multimodality (text + image + video)
- Strong agentic and multi-turn coding performance
- 512K token context
All at roughly 1/20th the price of comparable closed-source models, with pricing now aligned to the previous M2.7 generation. For engineering and product teams, this changes the economics of building production-grade agents, long-document and repository-scale systems, and multimodal applications. You no longer have to choose between capability and cost at scale. Fireworks is providing Day-0 support with the fastest inference endpoints for the full MiniMax model family, including serverless for quick starts and on-demand deployments for production workloads.
-
Fireworks AI reposted this
Crazy last week! We went from the energy of Microsoft Build and our announcements in SF straight into NYC TECH WEEK by a16z! We kicked things off with an intimate dinner, hosted alongside our friends at turbopuffer, with great enterprise leaders, founders, and deep conversations. Then we closed out NYC Tech Week with a rooftop party alongside some amazing partners: Exa, turbopuffer, Composio, Intercom, and Vanta. Summer officially feels like it's started. The city was electric, the Knicks pulled off the win that night, and the views were incredible. There's something special about building spaces where founders, engineers, and industry leaders can connect. Always grateful to the partners and communities that make it possible.
-
Fireworks Training Platform keeps expanding. Nemotron 3 Ultra, a leading US open-weight model, is now ready for post-training: SFT and DPO via LoRA or full-parameter training, on the same infrastructure that serves it. The model you train is the model you ship. Get started: http://fireworks.ai/train
-
We’re excited to share that Fireworks AI has been named to Redpoint’s InfraRed 100 list, which recognizes companies building the next generation of infrastructure and AI. We're proud of the inclusion, and even prouder of what put us there: a team working through hard infrastructure problems so that companies can run AI in production without giving up control, quality, or cost efficiency. If that's the kind of work you want to do, we're hiring: https://lnkd.in/gWuP3nFb