Optimize for Win Rate, Not Odds

Qu Kai: Today we have Albert back with us. He’s been on our podcast twice before and is a very popular guest.

It’s been roughly three years since 2023, and you’ve tried quite a few things. Your last product reached low tens of millions of dollars in ARR. Can you walk us through how your thinking has evolved over these three years?

Albert: When I recorded the first episode in 2023, I was still prioritizing odds over win rate.

The question I was asking was: if this thing works, how much value could it create, and how strong would the moat be? When you think from that angle, the conclusion you naturally arrive at is: how do I use AI as a powerful hammer to create business models that have already been proven? Because my background has always been in connections and content, the direction was clear: find a platform built around content as its core medium. And to define that kind of product, the key is discovering a new medium.

We bet on interactive content at the time. It’s fundamentally different from passive consumption formats like video and images, and AI had unlocked coding capabilities that made creating this kind of content much easier. We built two demos: one was a 2D interactive experience oriented around images and video, and the other replaced the interaction model with a joystick controller on mobile. We also built a more game-like interactive space on PC.

After building these, my biggest takeaway was: I couldn’t answer the question “Why wouldn’t I just play Honor of Kings or scroll through Douyin instead?”

That made me realize a pattern: in content markets, the higher the creation barrier for a given format, the scarcer the supply. Users have limited time and will only consume the top 1% of content. Back then, AI could only produce content at a 60, 70, maybe 80 out of 100 — but 80-point content is garbage from a consumer’s perspective. So the path of building for consumption through content generation was clearly not going to work.

Building for expression — making tools — might have value, but there’s a crucial distinction: is the user’s creative motivation genuine self-expression, or is it more utilitarian — making money, gaining influence? The higher the creation cost for a given format, the more supply-driven it becomes. If your entry point is lowering the barrier, you sacrifice creative freedom and end up with an all-in-one bundle — because there’s a natural tradeoff between the two.

AI might be able to break through that tradeoff. But even if you find a good solution on the tools side with mature technology, the distribution side still poses enormous obstacles. Take interactive content: if it leans more toward a game format, it’s very hard to deliver more value and a better experience than Steam, the App Store, or TapTap.

So I went back to square one and asked myself: why was I so set on building a platform? I eventually realized that this impulse was deeply shaped by the Chinese market. In China, if you don’t have a platform with strong economies of scale and strong network effects, it’s very difficult to compete against the giants. Your win rate would be extremely low.

But when I studied the U.S. market, I found that a large number of niche markets exist there, and there’s clearly room to find a reasonable win rate.

So by early 2024, I made the switch — from being odds-driven to win-rate-driven: studying which technologies had matured and which real user problems remained unsolved.

Qu Kai: You said that before 2024, you were essentially optimizing for odds. That’s actually what the vast majority of founders are doing. Can you elaborate on how you think about these two concepts?

Albert: The reason most founders optimize for odds is simple: VCs are also optimizing for odds. When you’re aligned on that, it’s easier to raise money.

But the truly successful entrepreneurs of the last generation were almost all optimizing for win rate. Some of them just got lucky because the markets they happened to enter also had enormous odds. For instance, Zhang Yiming is a very conservative person — a textbook win-rate optimizer. I once asked him how I should choose a startup direction. He asked me back: why not do something you’re more confident about?

The thing Zhang Yiming was most confident about was information distribution. He was already working on search toward the end of the PC internet era. When he left to start ByteDance, his first product was Neihan Duanzi. By 2014, there were already many video products both in China and abroad, but he still said no. He waited until 2016, when many conditions had matured, before officially entering the video space.

Huang Zheng is another classic example. During the PC era, he was in e-commerce, constantly observing changes in the supply side and the traffic landscape, while running various businesses deep in the supply chain. When the structural opportunity for Pinduoduo emerged, he seized it.

Wang Xing building Meituan seems like it was completely different from everything he’d done before. I actually asked Wang Huiwen about this once, and his answer was: at that point in time, among everyone doing group buying, nobody who understood offline operations knew online as well as they did, and nobody who understood online knew offline as well as they did. During the Xiaonei (campus social network) days, they had been heavily involved in offline grassroots promotion and management.

So what looks like a sudden pivot was actually them building up capabilities in one era and deploying them in the next. That’s a very typical win-rate optimization strategy.

Truly first-rate entrepreneurs are almost always optimizing for win rate. None of them are genuinely optimizing for odds. Optimizing for odds is essentially gambling.
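The gap between the two strategies can be made concrete with a toy simulation (my illustration, not something from the conversation): two repeated bets with identical expected value per round, one with a high win rate and modest payoff, one with low win rate and huge payoff. All parameters here are invented for the sketch.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def simulate(p_win, payoff, rounds=1000, trials=2000):
    """Fraction of trials that end above the starting capital.

    Each round stakes 10% of current capital; a win adds `payoff`
    times the stake, a loss forfeits the stake.
    """
    successes = 0
    for _ in range(trials):
        capital = 1.0
        for _ in range(rounds):
            stake = 0.10 * capital
            if random.random() < p_win:
                capital += stake * payoff
            else:
                capital -= stake
        if capital > 1.0:
            successes += 1
    return successes / trials

# Both bets have the same expected value per unit staked:
#   0.70 * 0.5 - 0.30 * 1 = 0.05   and   0.05 * 20 - 0.95 * 1 = 0.05
steady = simulate(p_win=0.70, payoff=0.5)   # high win rate, modest payoff
gamble = simulate(p_win=0.05, payoff=20.0)  # low win rate, huge payoff
print(f"steady ends ahead {steady:.0%} of the time, gamble {gamble:.0%}")
```

Despite identical expected value per round, compounding is governed by the geometric (log) growth rate, which is positive for the high-win-rate bet and negative for the long-shot bet, so the steady strategy ends ahead in the large majority of trials while the gamble almost never does.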

Qu Kai: I used to think optimizing for win rate versus odds was just a matter of different strategies. Based on what you’re saying, it’s not really a choice at all — it’s more of a right-or-wrong question. So what does it actually mean to optimize for odds? Could it be a fundamentally flawed concept?

Albert: For example, if you say “I want to build the next Douyin,” that’s optimizing for odds. The reasoning runs “the prize is huge, so I should go for it” — and that premise itself is flawed. If you were truly optimizing for win rate, you wouldn’t be able to say “I want to build Douyin” on day one. What you should be saying on day one is: what problem am I actually solving?

Qu Kai: So when you switched from optimizing for odds to optimizing for win rate, what actually changed in your behavior?

Albert: To be more specific, it means choosing things with fewer variables — things where I have more control. If something has too many variables, too much unpredictability, and too much that falls outside my capabilities, I try to avoid it.

Qu Kai: What about from an individual’s perspective? If someone is thinking about joining your team or joining a startup in general, are they optimizing for win rate or odds?

Albert: I think for any job candidate, the fundamental approach should be optimizing for win rate. Because what you’re ultimately building is your own capability, your own perspective, and the quality of your information. When you look back, all of those things are essentially win rate.

But that doesn’t mean optimizing for win rate means you’ll miss out on big payoffs. Quite the opposite — it’s only when you’re optimizing for win rate that you actually increase your chances of landing a big payoff. In our last podcast, I made what I think is a very important point: luck is what happens when your advantages are amplified by time.

So big payoffs come to you through patience, while win rate comes from identifying important problems and making deliberate choices. The best scenario is choosing directions where the future payoff could be enormous, but where you can still make progress today using a win-rate approach.

Conversely, if someone doesn’t actually believe in what a company is doing or respect the people there, and is only joining because “I heard they’re IPO-ing next year” or “they just raised a great round” — that’s textbook odds optimization.

Qu Kai: You mentioned Zhang Yiming earlier. You’ve also been studying Duan Yongping quite a bit recently. If you compare the two, what have you learned from each? What’s the biggest difference between them?

Albert: ByteDance overall leans toward a strong-player mindset, while Duan Yongping leans toward an underdog mindset.

I’ve always felt that, at a certain stage, ByteDance isn’t the right model for ordinary people to learn from, because it creates two powerful illusions: you start treating perfectionism as the standard, and you always try to reason from first principles. But first-principles thinking requires enormous resources as a prerequisite. Most people need to think within constraints.

Duan Yongping’s investment philosophy and his entrepreneurial philosophy are actually the same. In investing, he talks about “right business, right people.” This comes from Buffett, and it boils down to two things: business model and culture. Applied to running a company, that translates to strategy and management.

This approach elevates culture to a higher position: we’re all ordinary people, but under the right culture, choosing a direction that genuinely creates differentiated value, ordinary people can achieve extraordinary results. Duan Yongping’s philosophy has more equanimity and is more accessible to regular people. Huang Zheng also emphasizes this point strongly.

Qu Kai: All right, let’s talk about AI. I really like a framework of yours that splits AI into “imagination” and “intelligence.” How does this classification help you understand the industry?

Albert: AI currently has two types of use cases. One is helping users kill time — giving them some kind of experiential, process-oriented enjoyment. The other is helping users save time — reducing costs, completing tasks. From this perspective, these also happen to correspond to the two directions model development has taken: on one side, multimodal models for images and video; on the other, language models.

For entrepreneurs, making this distinction used to be essential: the models themselves are very different, and whether you use a language model or an image/video model directly determines where your startup opportunity lies. But this may change going forward, because multimodal capabilities have evolved significantly. For example, as Gemini’s comprehension improves, it can also enhance image generation quality — the “nano banana” image model is a case in point.

Qu Kai: So at least for now, this classification still holds? Let’s start with the imagination side — image and video models. There are really two tracks here: one is tool products, like those for marketers and professional creators, which are fundamentally also about saving time; the other is companion and interactive content products that lean more toward entertainment. How do you see these two tracks?

Albert: A fundamental direction in model development is that platforms will always offer better model capabilities and charge higher prices for better results. While previous-generation models do get cheaper when new ones launch, the real cost of inference hasn’t dropped significantly, so high-quality output always commands a premium.

If you’re trying to build an entertainment product rather than a tool product today, it’s nearly impossible to sustain long-term use of SOTA models, because the unit economics simply don’t work.

Tools are the highest-certainty path right now, with a very clear monetization model. By contrast, interactive and entertainment content — with AI companions as a prime example — represents a very real category, but whether the monetization efficiency can really work out is still very hard to judge.

Qu Kai: We’ve discussed the problems with interactive content before. No matter what you do, it’s probably very hard to surpass the experience of Douyin or Honor of Kings. Model capabilities may have gone from a 20-30 out of 100 to 70-80, and the things being built really are newer and more impressive, but users may not actually care.

Albert: Right. The more immersive and demanding the content, the higher the user’s cost of engagement, the fiercer the competition, and the scarcer the supply. In the end, maybe only 0.0001% of creators can produce the best work.

The breakthrough here probably isn’t in the content itself, but in the container that holds it. Take short-form video as a container: is every single piece of content inside it necessarily high-quality? Not really. But the container itself helps users form habits — it can even hack user behavior and make people more likely to get hooked.

But if you haven’t found a good container format, and the content itself demands a significant time commitment from users, competition becomes extremely intense.

Qu Kai: A lot of people are still talking about building “the Douyin of the AI era,” but based on what you’re saying, AI’s role isn’t to generate better content — because even if your AI-generated content is better than what humans make today, creators will still upload it to Douyin. So what matters more is what you just called the container — a new form of interaction, one that’s inherently suited to the content it delivers.

Albert: Building on that, let me add one more point: the best content will always flow to wherever monetization efficiency is highest, and monetization efficiency is ultimately determined by economies of scale and network effects. So existing platforms inherently have a massive advantage.

If you haven’t created a new content format and have only gained stronger content production capabilities on the tools side, that’s actually useless. Even if you build an incredible editing app, where does the content it produces end up? On Douyin, on Netflix — not on some new platform.

Qu Kai: When you look at Douyin’s early days, its interaction model wasn’t really that innovative. It was basically just swiping up and down. It’s just that the underlying conditions changed — network infrastructure, data costs — and the model took off. Is that a fair way to put it?

Albert: I think looking back today, a successful product format always requires three things to converge simultaneously: first, users; second, the medium; third, the content type.

Take Xiaohongshu (RED), for example. It uses image-and-text posts to deliver “useful content,” serving women in tier-one and tier-two cities. That loop works.

Douyin’s medium is short-form video. Its content is the consumption experience created through timeline editing — beat drops, camera movements, music synchronization. Its initial users were people who were great at singing, dancing, and performing on camera. That loop also works.

Or take Neihan Duanzi: it used mixed image-and-text layouts to deliver jokes, funny content, and lowbrow humor, serving a very specific audience. It had its own loop too.

Only when you find the convergence of all three does cold start become easier, and only then can you try to expand into broader territory. A product like Neihan Duanzi had a very hard time generalizing, because it was defined by a content genre rather than a more powerful media type. Genre-based verticals ultimately can’t match the gravitational pull of medium-based verticals. What Douyin ultimately owned was the short-form video medium. What Xiaohongshu owned was the image-and-text format for useful content.

So in hindsight, whether a product ultimately takes off is really the result of many coincidences and many deliberate design choices layered on top of each other. I remember there was a product, I think it was called Huoying, that was one of the earlier products in China to do a full-screen experience. Its DAU was quite high at one point. Its use case seemed to be a community for sharing animated wallpapers. But its content format didn’t match its users or its medium. Though it had scale in the short term, it never broke through.

So there were actually many short-video products back then, and they all disappeared. The reason was that they never properly defined the intersection of those three elements.

Qu Kai: So do you still believe there will be a “next Douyin” in the AI era? Or do you think the next Douyin is just Douyin itself?

Albert: That depends on how you define “the next Douyin.”

One of the reasons I left ByteDance was that I realized, across the entire mobile internet in China, apart from WeChat, virtually no native mobile app could sustain over 100 million DAU for a long period. I thought that was irrational at the time. There were about 700 to 800 million active mobile devices in China, and WeChat had around 600 million DAU. Logically, every active device should have a messaging app and should also have an entertainment product. How could there not be a universal-scale opportunity in entertainment? So I left ByteDance to go find that opportunity. I just didn’t expect that short-form video would ultimately grow as massive as it did. That was a misjudgment on my part.

But today, I think the logic is actually similar. The level of intelligence available now is already very strong, and the potential for intelligence in entertainment is enormous. ChatGPT already has a very large DAU, and in the future, users on virtually every active device will interact with AI in some form. If that prediction holds, then first, ChatGPT itself still has enormous room to grow; and second, in the adjacent spaces that emerge from its footprint, there will certainly be many entertainment needs driven by intelligence.

Qu Kai: I’d like to touch on companies like Higgsfield in the multimodal space, since your original direction also leaned toward video generation. What do you think the key differences are? At that point in time, what did Higgsfield get right that allowed it to take off so quickly?

Albert: I think to answer why Higgsfield took off, you can’t just look at what they did right — you first need to understand the state of model capabilities and the competitive landscape in the video and image model space.

First, this space isn’t dominated by a single player — there are several strong contenders. The first tier includes Sora, Seedance, Veo, and Kling, each holding SOTA in different scenarios and at different stages.

Whenever model capabilities are distributed unevenly like this, aggregator and all-in-one products inevitably have an opening. Users naturally want to spend less money while accessing more model services — that’s almost a given.

Second, the demand for visual content is large enough on its own. From social media creators to every imaginable commercial use case, nearly every company and most individuals need visual content. A highly fragmented yet universally present demand like this naturally gives rise to more general-purpose product forms.

Dig deeper and you find two more constraints. One: no matter how powerful the models are, the number of people who can put them to effective use is still limited. Two: in any multimodal content creation, there’s always a massive gap between what you can describe in language and what you’re actually imagining. Stack these constraints together and you realize that someone is inevitably going to use templates to define aesthetics and dramatically reduce the cost for users.

So once you lay out all these conditions, you can pretty quickly see what the product most likely to capture this moment would look like — and it would increasingly resemble something like Higgsfield. The problems it needs to solve are templatized workflows, an aesthetics community, and lowering the cost for users.

But even if you define the product form correctly, that’s still not enough, because there are actually quite a few products with a similar form factor in the industry. So we need to look at two more metrics: how strong is user intent, and how strong is delivery capability.

User intent actually rises along with the overall hype around AI. Everyone is constantly educating the market — “AI is amazing, it’s powerful, it can do all these things.” So the macro beta is trending upward.

But delivery capability doesn’t rise the same way — it iterates incrementally. A user might see your demo, feel amazed, and try it out. But if the actual output is terrible, the cost of getting them to try again goes up significantly.

Video models are advancing incredibly fast: effects that were impossible a month ago become possible a month later. The thing Higgsfield does best is that it consistently manages to package whatever can actually be delivered at a given stage into a highly marketable product feature. When consistency was still poor early on, they launched Soul. Actually, another company had previously built the best Flux LoRA model product overseas, but they didn’t manage to sell that capability well.

Then came drag-to-video, and more recently lighting control. Every time, Higgsfield manages to fairly accurately package its delivery capability into something that works on social media. But if you look closely, what they’re selling is still about 30% real and 70% illusion. Their team has such a deep understanding of content that when showcasing these capabilities, they know exactly what footage to pick and how to present it to make the capability look as compelling as possible.

When users see it, they’re blown away — but once they try it themselves, they discover it’s very hard to replicate the effect shown in the demo.

Qu Kai: So people have stopped debating whether being a “wrapper” is a good business. The real question is who can wrap better. Being a wrapper isn’t the issue — what matters is how well you do it.

Albert: I’ve always felt that the term “wrapper” is an engineer’s perspective. Users don’t care at all whether you’re a wrapper. Users only care about two things: first, are you the best option right now? Second, did you solve my problem?

So of course, the better the model capabilities, the more advantageous it is for applications. The key isn’t whether you’re using someone else’s model — it’s whether you can truly extract and leverage that model’s capability. And the Higgsfield example shows us that just leveraging it well isn’t enough — you also have to showcase it well.

Qu Kai: So if a founder wants to be a great wrapper, there are a few things that are clearly important. First, you definitely need a very deep understanding of the models. What’s new, what’s likely coming next — you need judgment and awareness.

Second is what you just mentioned: aesthetics. Whether it’s content aesthetics or product aesthetics, you need to know how to actually put model capabilities to use.

And then there’s execution speed. Over the past couple of years, everyone has been emphasizing execution, because models keep changing and upgrading, so how fast you can wrap matters enormously. Very often, it’s whoever is first to put a new capability to use that captures the maximum value from that new model.

Albert: Exactly.

Qu Kai: You’ve been saying something a lot recently: “Do things the way they’re theoretically supposed to be done.” The more I think about it, the more it resonates. Can you explain what’s behind that idea?

Albert: At its core, it’s a mindset issue: how do you, despite seeing how imperfect something is, still believe you should give your all to bring it to its theoretical ideal? So it’s more of a guiding principle. Because in the real world, you’re actually very far from that state.

Qu Kai: Yes. When I hear that phrase, I often think it sounds like “do the right thing.”

Albert: It’s not “do the right thing.” It’s “do things right.” It’s about how, not why.

Qu Kai: So what do you think is the right thing to do in 2026?

Albert: First, I think AI is still a very long game.

Multimodal comprehension is definitely worth leveraging. I’ve been deliberately saying “video models” and “image models” rather than just “multimodal,” because in my view, multimodal really refers to comprehension capability, not generation capability.

For a long time, comprehension lagged behind generation. What people called “multimodal” in recent years was really just progress in video and image models — it had little to do with comprehension or intelligence. But there has been major progress now. Gemini 3, for example, has shown very significant improvements in comprehension.

At least from what we can see today, Google has a fairly clear advantage when it comes to major leaps in comprehension, primarily due to its computing power advantage. They’ve also found methods that continue to scale, which is why their comprehension capabilities have improved so dramatically.

The next, more critical question is: can improvements in comprehension also raise the ceiling of intelligence itself?

In other words, it’s not just about the traditional sense of “understanding images and understanding video” in multimodal comprehension. The question is whether, as this comprehension ability gets stronger, it can also elevate the model’s general intelligence. I think people are relatively optimistic about this, and I’m certainly optimistic.

Because as comprehension gets stronger, the number of scenarios it can unlock will only grow. I remember the question I was pondering last time: what happens when the eyes come equipped with a brain? I’m still thinking about that question.

Qu Kai: If that’s the case, then over the past few years, intelligence has really been the biggest lever — you just didn’t apply it to its fullest at the right moment. Manus is a very typical example.

Albert: I don’t think it’s too late. And I don’t fully agree that “the biggest lever over the past few years was intelligence.” More precisely, the biggest lever has been coding. So beyond multimodal, the second important thing is the democratization of coding. How do you democratize coding, and how do you find an interaction model that more effectively unleashes the model’s capabilities in this context?

Because intelligence doesn’t manifest on its own — it has to work through coding. Only coding can push intelligence beyond the level where it merely “knows how to answer” and “knows how to understand.”

But if you want to put coding capability to good use, being early doesn’t actually help — the capability has to cross a certain threshold before it’s viable. You needed at least the Sonnet 3.5 stage, and then Opus, before the capability truly started becoming usable. It was roughly from that point onward that many things began to make sense. So I’d argue it’s not that whoever sees the opportunity earliest necessarily has the biggest advantage — it’s that once the models genuinely reach that inflection point, the pace of innovation starts changing by the day.

Qu Kai: Have you ever imagined — assuming the technology is fully mature in the future and inference costs are low enough — what the coolest product would be? Think of it as writing a science fiction story.

Albert: I’ve actually been thinking about a very interesting story recently. There’s a person who is a true believer in AI. He believes that everything is predetermined, that everything can be proven. So he sets out to aggregate all the world’s computing power and inject every “proven constraint” into a single system. First principles from physics, neuroscience, biology — he puts all of these constraints in, and then lets the model evolve on its own.

At the right moments, he intervenes to adjust parameters, gradually aligning the evolutionary process with Earth’s actual development. Perhaps starting from the earliest life forms, through the emergence of humans, and then the evolution of civilization. Slowly, one day, he discovers that the evolution inside this system has finally aligned with humanity’s present.

And then they begin to observe this world.

But his real motivation for building this system isn’t just to replicate history — it’s to pour even more computing power into predicting the future. Because the constraints haven’t changed, he wants to see what would happen if you kept the simulation running forward under those same constraints.

Then one day, he discovers that this simulated world has frozen at a particular moment. Because within that world, someone has also begun aggregating all available computing power to predict their own future. And the cycle begins again.

So in the end, you realize that all imagination about the future is, at its core, about predicting the future itself.

Qu Kai: I recall Elon Musk once said that the probability of our reality being a simulation is actually very high.

Albert: Right. That’s ultimately the conclusion you arrive at. The manifestation of the future is, in essence, the endless act of predicting it.