Approximate dynamic programming and reasoning language models

June 3, 2025 — Brad Venner

Nic has been terrifying me with tales of the new “reasoning” language models running amok during safety training conducted by reputable researchers. This has led me back to thinking about the role of these models in democratizing the economy. So I read an “opinionated” recent* review of reasoning language models (RLMs) that included a framework for building these types of models. This framework reminded me of Powell’s approximate dynamic programming (ADP) approach, which I had promised myself would be the focus of my recreational research a few years ago. Both frameworks use approximate “value” and “policy” models, though I’m not sure they use the terminology the same way.

One hypothesis is that there is a strong similarity between the two approaches. The RLM paper explicitly models the reasoning step with a Markov decision process, in other words as a stochastic dynamic program. If so, would it be possible to combine the two approaches, explicitly treating reasoning processes as approximate dynamic programs?
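As a toy illustration of what “reasoning as a Markov decision process” could mean, here is a minimal value-iteration sketch in the ADP spirit. Everything in it is invented for illustration: the states (number of sound reasoning steps completed so far), the actions (“careful” vs. “hasty” next steps), and the transition probabilities are not taken from either the RLM review or Powell.

```python
# Toy MDP: a reasoning trace with states 0..3, where 3 is a finished,
# correct chain of reasoning. All numbers here are illustrative only.

ACTIONS = ("careful", "hasty")
TERMINAL = 3

def transitions(state, action):
    """Return a list of (probability, next_state, reward) triples."""
    if state == TERMINAL:
        return [(1.0, state, 0.0)]
    reward_if_done = 1.0 if state + 1 == TERMINAL else 0.0
    if action == "careful":
        # Slow but reliable: usually advances the trace, never ruins it.
        return [(0.9, state + 1, reward_if_done),
                (0.1, state, 0.0)]
    # Hasty: advances half the time, otherwise the trace must restart.
    return [(0.5, state + 1, reward_if_done),
            (0.5, 0, 0.0)]

def value_iteration(gamma=0.95, tol=1e-8):
    """Compute the optimal value of each state by repeated Bellman backups."""
    V = {s: 0.0 for s in range(TERMINAL + 1)}
    while True:
        delta = 0.0
        for s in range(TERMINAL):
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in transitions(s, a))
                for a in ACTIONS
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_policy(V, gamma=0.95):
    """Extract the policy that is greedy with respect to the value function."""
    return {
        s: max(
            ACTIONS,
            key=lambda a: sum(p * (r + gamma * V[s2])
                              for p, s2, r in transitions(s, a)),
        )
        for s in range(TERMINAL)
    }
```

The point of the sketch is the division of labor the hypothesis turns on: a value model scores partial reasoning traces, and a policy model picks the next step greedily against those scores. With these particular numbers, being careful at every step turns out to be optimal.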

If the core of an RLM is an ADP, could RLMs be used to create special-purpose ADPs?

Reasoning language models

Has dynamic programming improved decision-making?

A relatively old paper by economist John Rust from 2018, written on the eve of the Transformers paper. Rust studied at MIT and has some sympathy for AI. The paper compares “human learning” and “machine learning” in dynamic programming.

Dependent types and sequential decision problems

CatColab and participatory action research