In a sense… yes! Although of course it’s thought to operate across many modalities and time-scales, not just text. A crucial piece of the picture is also the Bayesian aspect, which involves estimating one’s uncertainty over predictions. Further info: https://en.wikipedia.org/wiki/Predictive_coding
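To make the Bayesian part a bit more concrete, here is a toy sketch of a precision-weighted update on a single Gaussian belief. It is only an illustration under simplifying assumptions (one scalar variable, Gaussian noise, a known observation variance), not the full hierarchical scheme described in the predictive coding literature. The names `GaussianBelief` and `update` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class GaussianBelief:
    """A belief about one scalar quantity: a best guess plus its uncertainty."""
    mean: float
    variance: float


def update(belief: GaussianBelief, observation: float, obs_variance: float) -> GaussianBelief:
    """Standard conjugate Gaussian update, written in prediction-error form."""
    # How far the observation is from what the belief predicted.
    prediction_error = observation - belief.mean
    # Weight on the error: large when the belief is uncertain relative to the data.
    gain = belief.variance / (belief.variance + obs_variance)
    new_mean = belief.mean + gain * prediction_error
    # Uncertainty shrinks after incorporating the observation.
    new_variance = (1.0 - gain) * belief.variance
    return GaussianBelief(new_mean, new_variance)


if __name__ == "__main__":
    belief = GaussianBelief(mean=0.0, variance=1.0)
    for obs in [2.0, 2.0, 2.0]:
        belief = update(belief, obs, obs_variance=1.0)
        print(belief)
```

The point is just that the same prediction error moves the estimate a lot when the belief is uncertain and very little when it is confident, which is the sense in which uncertainty estimation matters alongside prediction.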
It’s also important to note the recent trend towards so-called “embodied” and “4E” cognition, which emphasizes being situated in a body, in an environment, with control over one’s actions, as essential to explaining the nature of mental phenomena.
But yeah, it’s very exciting how in recent years we’ve begun to tap into the power of these kinds of self-supervised learning objectives in practical systems like Word2Vec and large language/multimodal models.
I have to disagree about that last sentence. Augmenting LLMs to have any remotely person-like attributes is far from trivial.
Much of the current thinking in the field on this centers on so-called “Objective-Driven AI”:
https://openreview.net/pdf?id=BZ5a1r-kVsf
https://arxiv.org/abs/2308.10135
in which strategies are proposed to decouple the AI’s internal “world model” from its language capabilities, to facilitate hierarchical planning and mitigate hallucination.
The latter half of this talk by Yann LeCun addresses this topic too: https://www.youtube.com/watch?v=pd0JmT6rYcI
It’s very much an emerging and open-ended field with more questions than answers.