Artwork

Το περιεχόμενο παρέχεται από το Real Python. Όλο το περιεχόμενο podcast, συμπεριλαμβανομένων των επεισοδίων, των γραφικών και των περιγραφών podcast, μεταφορτώνεται και παρέχεται απευθείας από τον Real Python ή τον συνεργάτη της πλατφόρμας podcast. Εάν πιστεύετε ότι κάποιος χρησιμοποιεί το έργο σας που προστατεύεται από πνευματικά δικαιώματα χωρίς την άδειά σας, μπορείτε να ακολουθήσετε τη διαδικασία που περιγράφεται εδώ https://el.player.fm/legal.
Player FM - Εφαρμογή podcast
Πηγαίνετε εκτός σύνδεσης με την εφαρμογή Player FM !

Measuring Bias, Toxicity, and Truthfulness in LLMs With Python

1:15:53
 
Μοίρασέ το
 

Manage episode 396312145 series 2637014
Το περιεχόμενο παρέχεται από το Real Python. Όλο το περιεχόμενο podcast, συμπεριλαμβανομένων των επεισοδίων, των γραφικών και των περιγραφών podcast, μεταφορτώνεται και παρέχεται απευθείας από τον Real Python ή τον συνεργάτη της πλατφόρμας podcast. Εάν πιστεύετε ότι κάποιος χρησιμοποιεί το έργο σας που προστατεύεται από πνευματικά δικαιώματα χωρίς την άδειά σας, μπορείτε να ακολουθήσετε τη διαδικασία που περιγράφεται εδώ https://el.player.fm/legal.

How can you measure the quality of a large language model? What tools can measure bias, toxicity, and truthfulness levels in a model using Python? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, returns to discuss techniques and tools for evaluating LLMs With Python.

Jodie provides some background on large language models and how they can absorb vast amounts of information about the relationship between words using a type of neural network called a transformer. We discuss training datasets and the potential quality issues with crawling uncurated sources.

We dig into ways to measure levels of bias, toxicity, and hallucinations using Python. Jodie shares three benchmarking datasets and links to resources to get you started. We also discuss ways to augment models using agents or plugins, which can access search engine results or other authoritative sources.

This week’s episode is brought to you by Intel.

Course Spotlight: Learn Text Classification With Python and Keras

In this course, you’ll learn about Python text classification with Keras, working your way from a bag-of-words model with logistic regression to more advanced methods, such as convolutional neural networks. You’ll see how you can use pretrained word embeddings, and you’ll squeeze more performance out of your model through hyperparameter optimization.

Topics:

  • 00:00:00 – Introduction
  • 00:02:19 – Testing characteristics of LLMs with Python
  • 00:04:18 – Background on LLMs
  • 00:08:35 – Training of models
  • 00:14:23 – Uncurated sources of training
  • 00:16:12 – Safeguards and prompt engineering
  • 00:21:19 – TruthfulQA and creating a more strict prompt
  • 00:23:20 – Information that is out of date
  • 00:26:07 – WinoBias for evaluating gender stereotypes
  • 00:28:30 – BOLD dataset for evaluating bias
  • 00:30:28 – Sponsor: Intel
  • 00:31:18 – Using Hugging Face to start testing with Python
  • 00:35:25 – Using the transformers package
  • 00:37:34 – Using langchain for proprietary models
  • 00:43:04 – Putting the tools together and evaluating
  • 00:47:19 – Video Course Spotlight
  • 00:48:29 – Assessing toxicity
  • 00:50:21 – Measuring bias
  • 00:54:40 – Checking the hallucination rate
  • 00:56:22 – LLM leaderboards
  • 00:58:17 – What helped ChatGPT leap forward?
  • 01:06:01 – Improvements of what is being crawled
  • 01:07:32 – Revisiting agents and RAG
  • 01:11:03 – ChatGPT plugins and Wolfram-Alpha
  • 01:13:06 – How can people follow your work online?
  • 01:14:33 – Thanks and goodbye

Background Links:

Dataset Links:

Tutorials and Documentation for Python Packages:

Measurement Links:

Training Data for LLMs:

Agents and Plugin Links:

Additional Links:

Level up your Python skills with our expert-led courses:

Support the podcast & join our community of Pythonistas

  continue reading

201 επεισόδια

Artwork
iconΜοίρασέ το
 
Manage episode 396312145 series 2637014
Το περιεχόμενο παρέχεται από το Real Python. Όλο το περιεχόμενο podcast, συμπεριλαμβανομένων των επεισοδίων, των γραφικών και των περιγραφών podcast, μεταφορτώνεται και παρέχεται απευθείας από τον Real Python ή τον συνεργάτη της πλατφόρμας podcast. Εάν πιστεύετε ότι κάποιος χρησιμοποιεί το έργο σας που προστατεύεται από πνευματικά δικαιώματα χωρίς την άδειά σας, μπορείτε να ακολουθήσετε τη διαδικασία που περιγράφεται εδώ https://el.player.fm/legal.

How can you measure the quality of a large language model? What tools can measure bias, toxicity, and truthfulness levels in a model using Python? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, returns to discuss techniques and tools for evaluating LLMs With Python.

Jodie provides some background on large language models and how they can absorb vast amounts of information about the relationship between words using a type of neural network called a transformer. We discuss training datasets and the potential quality issues with crawling uncurated sources.

We dig into ways to measure levels of bias, toxicity, and hallucinations using Python. Jodie shares three benchmarking datasets and links to resources to get you started. We also discuss ways to augment models using agents or plugins, which can access search engine results or other authoritative sources.

This week’s episode is brought to you by Intel.

Course Spotlight: Learn Text Classification With Python and Keras

In this course, you’ll learn about Python text classification with Keras, working your way from a bag-of-words model with logistic regression to more advanced methods, such as convolutional neural networks. You’ll see how you can use pretrained word embeddings, and you’ll squeeze more performance out of your model through hyperparameter optimization.

Topics:

  • 00:00:00 – Introduction
  • 00:02:19 – Testing characteristics of LLMs with Python
  • 00:04:18 – Background on LLMs
  • 00:08:35 – Training of models
  • 00:14:23 – Uncurated sources of training
  • 00:16:12 – Safeguards and prompt engineering
  • 00:21:19 – TruthfulQA and creating a more strict prompt
  • 00:23:20 – Information that is out of date
  • 00:26:07 – WinoBias for evaluating gender stereotypes
  • 00:28:30 – BOLD dataset for evaluating bias
  • 00:30:28 – Sponsor: Intel
  • 00:31:18 – Using Hugging Face to start testing with Python
  • 00:35:25 – Using the transformers package
  • 00:37:34 – Using langchain for proprietary models
  • 00:43:04 – Putting the tools together and evaluating
  • 00:47:19 – Video Course Spotlight
  • 00:48:29 – Assessing toxicity
  • 00:50:21 – Measuring bias
  • 00:54:40 – Checking the hallucination rate
  • 00:56:22 – LLM leaderboards
  • 00:58:17 – What helped ChatGPT leap forward?
  • 01:06:01 – Improvements of what is being crawled
  • 01:07:32 – Revisiting agents and RAG
  • 01:11:03 – ChatGPT plugins and Wolfram-Alpha
  • 01:13:06 – How can people follow your work online?
  • 01:14:33 – Thanks and goodbye

Background Links:

Dataset Links:

Tutorials and Documentation for Python Packages:

Measurement Links:

Training Data for LLMs:

Agents and Plugin Links:

Additional Links:

Level up your Python skills with our expert-led courses:

Support the podcast & join our community of Pythonistas

  continue reading

201 επεισόδια

모든 에피소드

×
 
Loading …

Καλώς ήλθατε στο Player FM!

Το FM Player σαρώνει τον ιστό για podcasts υψηλής ποιότητας για να απολαύσετε αυτή τη στιγμή. Είναι η καλύτερη εφαρμογή podcast και λειτουργεί σε Android, iPhone και στον ιστό. Εγγραφή για συγχρονισμό συνδρομών σε όλες τις συσκευές.

 

Οδηγός γρήγορης αναφοράς