About Me
I am an engineer at Alibaba Group working on reinforcement learning infrastructure for large language models [RL2] [GEM, ICLR’26]. Previously, I worked on reinforcement learning algorithms, specifically reward modeling [CHARM] [LR4GPM, AAMAS’23] and multi-armed bandits [ACML’22]. I also worked on knowledge editing [MALMEN, ICLR’24]. See my projects and publications.
