Ref: https://www.bilibili.com/video/BV1darmBcE4A/
I listened to the n+e podcast and wrote down the key points and my takeaways. The conversation ranges from childhood learning habits to building RL infrastructure, open source philosophy, and how to think about research vs. industry.
Impressions
I first read n+e’s blog in my senior year. I already felt he was exceptional and very original in his thinking. The podcast confirmed it: he is genuinely talented and I learned a lot. What also stood out is that even for someone like him, life is not a smooth path. He faced setbacks in informatics competitions and graduate applications. It reminded me to look at my own situation from a higher perspective and not get trapped in local optima.
Childhood and Early Learning
- Started Olympiad math in elementary and middle school and progressed much faster than peers.
- Learned with depth, spending more time on fundamentals. He started high-school math in middle school and learned calculus by the end of ninth grade.
- He liked learning ahead of time as an investment in his future, rather than grinding repetitive problem sets.
- Started programming in seventh grade through an interest class.
In high school, he joined informatics competitions (OI), mainly due to admission pressure. He went through NOIP, provincial selection, and summer camps, then competed in NOI. Although he underperformed at the national contest and ranked last on the Fujian provincial team, he received a reduced-score admission offer from Tsinghua, and he chose it over a regular-admission option at SJTU.
Tsinghua and Research Direction
One thing he felt most proud of was open-sourcing all his homework and collected materials to break information asymmetry and give more students a fair chance.
He entered research in his sophomore year, mentored by Prof. Zhu Jun, and explored three directions: Bayesian methods, GANs, and RL. He chose RL, though he personally liked AI, graphics, and security.
- Security: he enjoyed the feeling of hacking.
- Graphics: inspired by the movie Tron, he wanted to build his own world.
- RL: he worked on a VizDoom project that combined graphics with gameplay, but found it tedious, overfit-prone, and hard to tune.
To make RL research more productive, he decided to build RL infrastructure, which later became Tianshou.
Mila, NLP, and Applications
In his junior year, he went to Mila in Canada to work on something like a mixture-of-experts (MoE) model. It was not RL but an NLP language model. Looking back, he felt the scale was too small to make it work. He believed NLP tasks were too scattered, and he also tried to make RL run on Transformers. He later concluded that RL needs a clean environment (e.g., pure text) and simple rewards. Without enough context and resources, the work was extremely hard.
When applying to PhD programs, he lacked a first-author paper and was only admitted to master’s programs.
Reflections on Application Failure
At Tsinghua, there is a strong belief that a PhD beats a master's. He now feels it depends more on the content of your work than on the degree itself. He emphasized breaking away from a single evaluation system; GPA is only a narrow metric. For jobs, experience fit matters more than GPA.
He mentioned a research advisor’s evaluation system with three indicators:
- Papers
- Competitions
- GitHub stars
So he tried to contribute more to open source. He also noted that applying abroad during COVID was especially difficult.
Tianshou and the Value of Consistency
He had run many RL codebases and asked: why not integrate them? Ray's RL library (RLlib) felt overly complex, with too much abstraction and hundreds of thousands of lines of code.
So he restarted from scratch and finished the first version of Tianshou in two weeks. The project emphasized end-to-end consistency. He argued that many projects decay because each contributor codes for their own use, leading to duplicated code and wrong assumptions.
Why Tianshou worked:
- It solved a real need: researchers lacked a usable, hackable RL framework.
- It kept code short and abstraction clean. New features usually required changes in only one place.
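The "changes in only one place" principle can be sketched in a few lines of generic Python. This is only an illustration of the design idea; the names below (`BasePolicy`, `collect`, `GreedyPolicy`) are made up and are not Tianshou's actual API.

```python
# Illustrative sketch of a single-extension-point design, not Tianshou's API.

class BasePolicy:
    """The one extension point: every algorithm overrides choose()."""
    def choose(self, obs):
        raise NotImplementedError

def collect(policy, observations):
    """Shared data-collection loop; it never changes when a new
    algorithm is added."""
    return [policy.choose(o) for o in observations]

class GreedyPolicy(BasePolicy):
    # Adding a new algorithm means one new class, no edits elsewhere.
    def choose(self, obs):
        return max(range(len(obs)), key=obs.__getitem__)

actions = collect(GreedyPolicy(), [[0.1, 0.9], [1.0, 0.0]])
```

Because the shared loop only talks to the base interface, duplicated code and per-contributor assumptions have nowhere to accumulate.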
How to avoid project decay:
- Early stage: one person does most of the work.
- Later stage: transfer maintenance to the community, though some inconsistency is inevitable.
Dropping Out and Side Projects
He once built a website that monitors visa appointment slots, a side project from his period away from school.
Open Source Motivation
He views open source as a non-utilitarian effort for impact, even at a loss. Impact is more meaningful than money. He had this mindset since high school: the score of life is how many people remember your name when you die.
He realized this is genuinely what he wants, not out of fear of being forgotten, but as part of his personal value system. He also noted that later, after joining OpenAI to work on models, he stopped doing open source because the goal changed.
Job Choices
He first applied for jobs and got offers from Google and OctoML. Google felt uninteresting, and he later received offers from High-Flyer (Huanfang) and DeepSeek but did not take them. He chose OpenAI before ChatGPT became mainstream.
Why OpenAI?
- At the time, OpenAI was already a top RL lab.
- He wanted to experience the world’s best lab and learn industrial methodology.
- John Schulman, a co-creator of RLHF, was impressed by n+e in the interview.
He does not see a PhD as necessary for industry. In his view, a bachelor’s plus master’s can build enough capital (e.g., citations) to compete with PhDs without the extra years.
Master’s vs. PhD in Industry
He believes the training is different: engineering ability is more important than pure research today. Teaching a researcher to be a good engineer is harder than teaching an engineer to do research. Modern research labs compete on infrastructure correctness and iteration speed. Ideas are cheap; validation speed is what matters.
He also noted that infrastructure quality often determines model quality: fewer bugs, better results.
He prefers “selling shovels,” so at OpenAI he worked on post-training RL infrastructure. He also joked about adding another life metric: maximizing the number of times his name appears on the OpenAI blog.
If you want impact, he argues, industry is often the fastest path. PhD work may not always translate. AI labs mainly need infrastructure talent.
OpenAI: RL, Post-Training, and Culture
He defined RL simply: if you can build an environment and get feedback, that is RL.
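That definition can be made concrete with a toy loop. Everything below (`GuessEnv`, `run_episode`, the random "policy") is made up for illustration of the environment-plus-feedback framing, not any real lab's code.

```python
import random

# Toy illustration of "environment + feedback = RL".

class GuessEnv:
    """Toy environment: the agent must guess a hidden digit 0-9."""
    def __init__(self, seed=0):
        self.target = random.Random(seed).randrange(10)

    def step(self, action):
        # The reward signal is the only feedback the agent receives.
        reward = 1.0 if action == self.target else 0.0
        done = reward > 0
        return reward, done

def run_episode(env, policy, max_steps=100):
    """Generic interaction loop: act, receive feedback, repeat."""
    total = 0.0
    for _ in range(max_steps):
        reward, done = env.step(policy())
        total += reward
        if done:
            break
    return total

env = GuessEnv(seed=42)
# Trivial baseline "policy": uniform random guessing. A learning agent
# would update itself from the reward instead of ignoring it.
ret = run_episode(env, lambda: random.randrange(10))
```

Anything that fits this step-and-feedback shape, from Atari to grading a model's text answer, counts as an RL environment under his definition.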
Post-training did not have a clear name early on. He did more SFT first; then, after building the infrastructure, the team moved to RLHF. When GPT-3.5 first appeared internally, it did not seem game-changing; it solved some problems but not many. The original expectation for the release was simply to collect user data, and then "it exploded."
OpenAI felt like a large lab with strong research intuition. After engineers joined from Google, infrastructure iteration accelerated. The philosophy was to build the infra well first.
Why OpenAI succeeded: high talent density. He wondered whether OpenAI could remain small and excellent, with simple org structure, smooth information flow, and low-loss decision-making up and down the hierarchy.
RL Infrastructure Challenges
He said the first big milestone was getting PPO for GPT-4 to work.
Key challenges in RL infra:
- How to measure whether performance is truly good
- How to avoid reward hacking
- How to win on benchmarks
In practice, the team still often has to inspect checkpoints manually and vote.
He compared traditional RL infra and large-language-model infra:
- Traditional RL infra: complex environments, small models
- LLM-scale RL: simple environments, huge models, slow inference and training
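The cost inversion between the two regimes can be sketched with made-up relative costs. The numbers below are arbitrary placeholders to show where wall-clock time goes, not measurements from any real system.

```python
# Made-up relative costs (arbitrary units) for one rollout step.
CLASSIC_RL = {"env_step": 20.0, "model_forward": 0.5}
LLM_RL = {"env_step": 0.5, "model_generate": 20.0, "train_update": 5.0}

def bottleneck(stage_costs):
    """Return the name of the most expensive stage in one step."""
    return max(stage_costs, key=stage_costs.get)
```

In the classic regime the simulator dominates; at LLM scale, generation and training dominate, which is why an RL infra engineer has to understand inference and training, not just RL.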
Future challenges remain around speed. A good RL infra engineer must understand RL, inference, and training. The workload is intense, six days a week, so health matters (he joked about “working out hard”).
He also mentioned that day-to-day work can be trivial; doing things correctly matters more than raw intelligence.
Looking 5–10 years ahead, he still sees room to scale up. Algorithms, compute, and infra all have room to improve, and infra likely still has bugs.
Main bottleneck: throughput. Fast bug fixing and fast iteration are critical. Agent RL and “normal” RL are not fundamentally different; the environment just becomes more complex.
On AGI, he said people at OpenAI joke that 15 people have 20 definitions of it. His own definition: an AGI can complete 80% of the tasks he considers meaningful, which has not been achieved yet.
OpenAI, Open Source, and Trade-offs
The host noted that OpenAI has become more closed, which seems to conflict with his goal of reducing information asymmetry. n+e viewed open source as a trade-off: companies must survive, so they cannot open-source their best models.
OpenAI’s mission is to make AGI benefit all of humanity. First you must build AGI, then you can deliver benefits through products (e.g., free model access). He argued this could help ordinary people more than open-sourcing model weights.
But since AGI is still far away, the host raised a further counterpoint: why not open-source the technical details and improve them with community feedback? n+e responded that the risk is that others could use OpenAI's models, close-source their own, and leave OpenAI unable to raise money.
If OpenAI had unlimited funding, he would open-source RL infra.
He said the biggest challenge for AGI is execution: consistent, stable execution and a stable organization. He hoped events like Sam’s firing would never happen again.
He believes healthy organizations should be resilient to talent outflow by maintaining a strong pipeline of new talent. External pressure matters; he only named DeepSeek because their infra cycle is fast. AI infra cycle time is a life-or-death metric for model companies. OpenAI has slowed down because the org is larger, handles more use cases, and consistency across teams has decreased.
Worldview
He described the world as a deterministic Markov process, a kind of fatalism: from the Big Bang, everything is predictable. Micro-level randomness, macro-level determinism. He concluded that the best choice is to forget all this and believe that Sisyphus is happy.
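The "deterministic Markov process" worldview can be illustrated with a toy transition rule: the next state depends only on the current state, so replaying from the same initial condition reproduces the entire history. The linear-congruential rule below is an arbitrary illustrative choice, and its individual states look random even though the whole sequence is fixed.

```python
def transition(state):
    """Deterministic Markov transition: the next state depends only on
    the current state (an arbitrary linear-congruential rule)."""
    return (1103515245 * state + 12345) % (2 ** 31)

def trajectory(initial_state, n_steps):
    """Unroll the process from a fixed starting point."""
    states = [initial_state]
    for _ in range(n_steps):
        states.append(transition(states[-1]))
    return states

# Same "initial condition of the universe" -> the same entire history.
a = trajectory(42, 10)
b = trajectory(42, 10)
```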