LW - instruction tuning and autoregressive distribution shift by nostalgebraist

Content is provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://el.player.fm/legal.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: instruction tuning and autoregressive distribution shift, published by nostalgebraist on September 5, 2024 on LessWrong.
[Note: this began life as a "Quick Takes" comment, but it got pretty long, so I figured I might as well convert it to a regular post.]
In LM training, every token provides new information about "the world beyond the LM" that can be used/"learned" in-context to better predict future tokens in the same window.
But when text is produced by autoregressive sampling from the same LM, it is not informative in the same way, at least not to the same extent[1]. Thus, sampling inevitably produces a distribution shift.
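To put the shift in symbols (a generic formalization for this edition, not notation from the post itself): write q_\theta for the model and p_{\text{data}} for the distribution of training text. Training fits q_\theta by teacher forcing,

    \mathcal{L}(\theta) = -\mathbb{E}_{x \sim p_{\text{data}}} \sum_t \log q_\theta(x_t \mid x_{<t}),

so every prefix x_{<t} the model conditions on is drawn from p_{\text{data}}. At sampling time, each token is instead drawn as x_t \sim q_\theta(\cdot \mid x_{<t}), where the prefix was generated by q_\theta itself. The contexts the model conditions on migrate from p_{\text{data}} to its own output distribution, and that mismatch is the distribution shift in question.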
I think this is one of the reasons why it's (apparently) difficult to get instruction-tuned / HH-tuned models to report their uncertainty and level of competence accurately, rather than being overconfident.
(I doubt this is a novel point, I just haven't seen it spelled out explicitly before, and felt like doing so.)
Imagine that you read the following (as the beginning of some longer document), and you trust that the author is accurately describing themselves:
    I'm a Princeton physics professor, with a track record of highly cited and impactful research in the emerging field of Ultra-High-Density Quasiclassical Exotic Pseudoplasmas (UHD-QC-EPPs).
    The state of the art in numerical simulation of UHD-QC-EPPs is the so-called Neural Pseudospectral Method (NPsM).
I made up all those buzzwords, but imagine that this is a real field, albeit one you know virtually nothing about. So you've never heard of "NPsM" or any other competing method.
Nonetheless, you can confidently draw some conclusions just from reading this snippet and trusting the author's self-description:
- Later in this document, the author will continue to write as though they believe that NPsM is "the gold standard" in this area. They're not going to suddenly turn around and say something like "wait, whoops, I just checked Wikipedia and it turns out NPsM has been superseded by [some other thing]." They're a leading expert in the field! If that had happened, they'd already know by the time they sat down to write any of this.
- Also, apart from this particular writer's beliefs, it's probably actually true that NPsM is the gold standard in this area. Again, they're an expert in the field -- and this is the sort of claim that would be fairly easy to check even if you're not an expert yourself, just by Googling around and skimming recent papers. It's also not the sort of claim where there's any obvious incentive for deception. It's hard to think of a plausible scenario in which this person writes this sentence, and yet the sentence is false or even controversial.
During training, LLMs are constantly presented with experiences resembling this one.
The LLM is shown texts about topics of which it has incomplete knowledge. It has to predict each token from the preceding ones.
Whatever new information the text conveys about the topic may make it into the LLM's weights, through gradient updates on this example. But even before that happens, the LLM can also use the kind of reasoning shown in the bulleted list above to improve its predictions on the text right now (before any gradient updates).
That is, the LLM can do in-context learning, under the assumption that the text was produced by an entity outside itself -- so that each part of the text (potentially) provides new information about the real world, not yet present in the LLM's weights, that has useful implications for the later parts of the same text.
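As a toy sketch of this asymmetry (in Python; the model object and its update/draw methods are hypothetical stand-ins for a real LM's training step and sampler, not any particular library's API):

    # Toy illustration of the training/sampling asymmetry.
    # `model.update` and `model.draw` are hypothetical stand-ins.

    def train_step(model, human_tokens):
        # Teacher forcing: every prefix comes from a writer outside the
        # model, so each observed token can carry genuinely new information
        # that the model can exploit in-context on the rest of the text.
        for t in range(1, len(human_tokens)):
            prefix, target = human_tokens[:t], human_tokens[t]
            model.update(prefix, target)  # gradient step on -log q(target | prefix)

    def sample(model, prompt_tokens, n_new):
        # Autoregressive sampling: past the prompt, every token in the
        # prefix was produced by the model itself, so it reflects only
        # what the model already "knew" -- no outside information arrives.
        context = list(prompt_tokens)
        for _ in range(n_new):
            next_token = model.draw(context)  # sample from q(. | context)
            context.append(next_token)        # model conditions on its own output
        return context

The only difference between the two loops is where the prefix comes from; the model itself is identical in both, which is why the in-context reasoning it learned during training carries over, appropriately or not, to its own samples.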
So, all else being equal, LLMs will learn to apply this kind of reasoning to all text, always, ubiquitously.
But autoregressive sampling produces text that is not informative about "the world outside" in the same way that all the training texts were.
During training, when an LLM sees information it d...