Discussion about this post

Matt Quinn

You allude to it in the narrative, but this is a question that has bugged me for a while: if we are using data generated by an LLM to train an LLM… we just risk amplifying and reinforcing the errors (and the gaps) that already exist in it.

Synthetic data can be a great way to distill a model, expanding it in one dimension or domain with guided generation based on the LLM and some resource you augment it with in the prompting. But the idea that it can be used to train whole new, bigger, better models feels a bit “perpetual motion” to me 🧐
