From 12 Months to 3 Months: How Booking.com Cracked the Code on AI Experimentation
The Behind-the-Scenes from ProductLab Conf> Berlin
Hey there,
After ProductLab Conf>, we asked you which talks resonated most—and Pranav’s session on scaling AI experimentation came out as one of the absolute favorites. So many of you reached out asking for a deeper recap to share with your teams.
Here it is. For the community.
You know that moment when a speaker starts their talk and you can just feel the energy shift in the room? That happened when Pranav took the stage at ProductLab Conf> on September 18th.
The lights were bright, the ProductLab Stage was packed, and Pranav—who leads Product Development for AI at Booking.com—opened with a confession that made everyone lean in: “I can’t see much because of the lights, but wow, we have a full house!”
What followed was one of the most honest, actionable talks on AI experimentation I’ve seen. No fluff. No AI hype. Just real lessons from someone who’s been doing AI for 10 years (yes, before it was cool) and has the battle scars to prove it.
The Failure That Changed Everything
Let’s start with the uncomfortable truth Pranav shared: his first chatbot was a complete disaster.
Back in 2017, his team spent 12 months. They involved three separate teams. They built a “virtual post-booking customer service agent in training” using basic natural language processing (remember, no GenAI back then).
The result? Customer satisfaction didn’t move. At all.
Here’s what hit me: It’s not just that it failed. It’s that it took a YEAR to fail.
But here’s where the story gets interesting...
The Transformation: Tripling Their AI Success Rate
Fast forward to today. Booking.com’s AI experimentation success rate is now more than triple their standard product launch rate.
Let me repeat that: Their AI experiments succeed at 3x the rate of traditional features.
That’s not luck. That’s a system. And Pranav broke down exactly how they did it.
The Four Principles That Actually Work
1. Spot the Right Opportunity
Pranav shared their prioritization framework—and it’s beautifully simple:
Pain: How much does this problem hurt?
Reach: How many users does it affect?
Data: Do we have what we need?
Proof: Can we validate this works?
But here’s the nuance he added during Q&A: “It’s not just about the numbers. If something’s going to generate €20 million more for the business, I will take that over everything else.”
They’re doing 200 BILLION predictions per day at Booking. The scale is wild. But they still ask: What moves the metric the most?
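The framework is a mental model, not software, but a quick sketch shows how a Pain/Reach/Data/Proof score and the “€20 million override” from the Q&A might interact. The 1-5 scales, the multiplicative scoring, and the threshold below are all illustrative assumptions, not Booking.com's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Opportunity:
    name: str
    pain: int         # 1-5: how much does this problem hurt?
    reach: int        # 1-5: how many users does it affect?
    data: int         # 1-5: do we have the data we need?
    proof: int        # 1-5: how easily can we validate it works?
    est_value_eur: float = 0.0  # projected business value, if known

def score(opp: Opportunity) -> int:
    # Multiplicative on purpose: a weak answer on any one
    # dimension drags the whole opportunity down.
    return opp.pain * opp.reach * opp.data * opp.proof

def prioritize(opps, value_threshold=20_000_000):
    # The Q&A nuance: an outsized, well-evidenced business
    # value trumps the raw framework score.
    return sorted(
        opps,
        key=lambda o: (o.est_value_eur >= value_threshold, score(o)),
        reverse=True,
    )
```

In this toy version, an opportunity with a lower framework score but a projected value above the threshold still jumps to the front of the queue, which is exactly the trade-off Pranav described.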
2. Track the Right Metrics
This one really resonated with the PMs in the room. Pranav was crystal clear: don’t track vanity metrics.
At Booking, they don’t obsess over weekly active users. They track conversion. Why? Because:
It captures both business value and user value
You can isolate what caused it to move
You can continuously iterate on it
It’s actionable
“If you make a change to the product, you set up an A/B test, it moves up or it moves down. You’re able to isolate the cause of that movement quite easily.”
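Pranav's point about isolating cause is the classic A/B-testing argument: conversion is a yes/no outcome per user, so a two-proportion z-test can tell you whether the movement is real. A minimal sketch using only the standard library (the function and numbers here are my illustration, not Booking.com's tooling):

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test on conversion rates: control (A) vs variant (B).

    conv_* = number of converted users, n_* = users in the arm.
    Returns (lift, z, p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf; two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value
```

With, say, 20,000 users per arm and conversion moving from 5.0% to 5.6%, this yields a p-value under 1%, which is what lets you attribute the movement to the change rather than noise.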
3. Democratize AI Across Your Organization
This might be the most underrated insight from the talk.
When Booking first asked teams “What do you want to build with AI?”, they got 1,000 submissions. Only 5 were viable.
Today? They get about 100 submissions per year. 90 of them are viable.
What changed? They trained their org. Not on transformer architectures or how LLMs work under the hood. They taught six practical things AI is really good at.
“If the AI skills in the organization are concentrated towards one team, no one else is ideating,” Pranav explained. And without ideas, you can’t scale.
4. Learn the Tools, Experiment, Ship Fast
Remember that first chatbot that took 12 months? Their latest version took 3 months with just two teams.
Same company. Same complexity. Different approach.
They now:
Centralize the tech that drives experimentation
Automate the iteration cycle (prompts, user feedback, model responses)
Make it easier for teams after their first success
“The first time that the team experiments is always much harder,” Pranav admitted. “But then the second and third and fourth iteration pick up quite naturally, because the team now knows where the success comes from.”
The Real-World Examples That Prove It Works
Pranav showed us three live products that exemplify this approach:
1. AI-Assisted Host Replies
Hotel partners can now auto-generate responses to guest messages. It’s been wildly successful—partners love it because it saves time, guests love it because they get faster replies.
2. Personalized Property Descriptions
Instead of showing the same generic hotel description to everyone, Booking now uses AI to tailor descriptions to user profiles. A family sees “five bedrooms, five bathrooms, ample space for families.” A solo traveler sees something completely different. Same property. Different story.
3. The Chatbot That Actually Works
Three months to build. Two teams. Good customer engagement. Good task completion. Good conversion rates.
It’s the redemption arc of that 2017 failure.
The Uncomfortable Questions (and Honest Answers)
The Q&A session was gold. Someone asked about hallucinations and inconsistency—the classic LLM problems.
Pranav’s response? They don’t lower temperature (that kills the magic users expect). Instead:
They use evaluation models (evals) and train their own
They demand structured outputs (JSON responses)
They send a percentage to human evaluators
They use deterministic models for critical paths (like property recommendations)
“We have very low tolerance for factual inaccuracy, actually, because big business, lots of risk.”
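Two of those mitigations, structured outputs and sampling for human review, are easy to sketch. The schema fields, sample rate, and function names below are hypothetical illustrations, not Booking.com's API:

```python
import json
import random

# Hypothetical schema for a host-reply response.
REQUIRED_FIELDS = {"reply_text", "confidence", "intent"}

def validate_response(raw: str):
    """Accept only well-formed JSON with the expected fields.

    A failed parse or missing field means the model's answer is
    discarded (returns None) instead of being shown to a user.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_FIELDS.issubset(data):
        return None
    return data

def sample_for_human_review(responses, sample_rate=0.05, rng=None):
    """Route a fixed percentage of accepted responses to human evaluators."""
    rng = rng or random.Random(0)  # seeded for reproducibility in this sketch
    return [r for r in responses if rng.random() < sample_rate]
```

Forcing the model into a JSON contract turns "did it hallucinate the format?" into a cheap, deterministic check, and the sampled human reviews catch the quality problems that schema validation cannot.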
Another question was about scaling after experiments. How do you shorten cycle time?
The answer: centralize the tech. If five experiments all use question-answering, build it as a shared service. Iterate across all five together.
What We’re Taking Away
Here’s what this talk taught us about scaling AI experimentation:
Start with real problems, not AI solutions. Pranav emphasized: “Our customer problems are not changing. Travel still has the same complications. We just now have a tool that’s really good at taking some of those problems away.”
Embrace failure as data. That 12-month disaster in 2017 taught them more than any success. Now they’ve streamlined legal compliance assessments from 3-4 months down to 3-4 days.
Make AI accessible to everyone. When your whole org can think in AI terms, you get better ideas. It’s that simple.
Ship fast, learn faster. They have 200+ GenAI implementations prioritized and dozens of successful product launches already adding meaningful revenue.
The Energy in the Room
Standing at the back of the ProductLab Stage, watching PMs frantically take notes while Pranav fielded questions, I was reminded why we do ProductLab Conf>.
It’s not just about the content (though Pranav’s was exceptional). It’s about that moment when someone shares the real story—the failures, the breakthroughs, the behind-the-scenes details—and you realize you’re not alone in struggling with these problems.
By the way, Pranav offered a beer to anyone who could guess how many predictions Booking makes daily. Someone shouted “5 billion!” They were off by 195 billion.