Kenneth Li

I am a third-year PhD student at Harvard, advised by Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. I am supported by the Kempner Institute Graduate Fellowship. I have interned at MSR Asia and Meta AI.

I aim to understand the inner workings of large language models and, based on these findings, to improve the controllability of model behavior for human benefit. Feel free to reach out if you'd like to discuss!

Contact: ke_li [at] g.harvard.edu
Google Scholar  |  Twitter  |  Github  |  Anonymous Feedback Form

Research
Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
preprint
Arxiv | Code
As a dialog grows longer, a personalized chatbot quickly ceases to follow its system prompt, often within eight rounds.

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener
preprint
Arxiv | Code
Through rejection sampling, we leverage a language model's own discriminative capability to boost its generative capability.

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li*, Oam Patel*, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
NeurIPS, 2023 (Spotlight)
Arxiv | Code | Stand-alone Model
By intervening on the activations of a language model at inference time, we can compel it to tell truths it knows but would otherwise withhold.

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
Kenneth Li, Aspen Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
ICLR, 2023 (Oral)
Arxiv | Code | Demo | The Gradient | Scientific American | The Atlantic | Nature News | Andrew Ng
In a transformer trained on Othello transcripts, we uncover an interpretable and controllable world model of the game board.



Last updated: 03/2024