Day 3 - Lesson 1 recap & intro to deployment

March 3, 2024

Today the goal was to solidify the concepts learned in the first lesson of the fastai course. Here are my responses from the first chapter of the book.


1. Do you need these for deep learning?

  • Lots of math: False
  • Lots of data: False
  • Lots of expensive computers: False
  • A PhD: False

2. Name five areas where deep learning is now the best in the world.

NLP, computer vision, text & image generation, recommendation systems, and forecasting.

3. What was the name of the first device that was based on the principle of the artificial neuron?

Mark 1 Perceptron

4. Based on the book of the same name, what are the requirements for parallel distributed processing (PDP)?

  • a set of processing units
  • a state of activation
  • an output function for each unit
  • a pattern of connectivity among units
  • a propagation rule for propagating patterns of activities through the network of connectivities
  • an activation rule for combining the inputs impinging on a unit with the current state of that unit to produce an output for the unit
  • a learning rule whereby patterns of connectivity are modified by experience
  • an environment within which the system must operate

5. What were the two theoretical misunderstandings that held back the field of neural networks?

  1. The authors of "Perceptrons" showed that a single layer could not perform a function as simple as XOR, but also that multiple layers could. Unfortunately, only the first insight was widely recognized, which led to the belief that neural networks were limited to linear operations.
  2. Even after multi-layer networks were adopted, they were often too small and too slow to be practical; the field was held back for decades by the lack of computational power needed to train bigger, deeper models.

6. What is a GPU?

Graphics Processing Unit, nowadays used not only for rendering graphics but also for training deep learning models.

7. Open a notebook and execute a cell containing: 1+1. What happens?

Kernel executes the cell and returns 2.

8. Follow through each cell of the stripped version of the notebook for this chapter. Before executing each cell, guess what will happen.

9. Complete the Jupyter Notebook online appendix.

10. Why is it hard to use a traditional computer program to recognize images in a photo?

Because it's hard to explain the exact steps that need to be taken to recognize and classify an image.

11. What did Samuel mean by "weight assignment"?

Weight assignment refers to the process of giving values to different factors (weights) that are used to make predictions or decisions.

12. What term do we normally use in deep learning for what Samuel called "weights"?

Parameters

13. Draw a picture that summarizes Samuel's view of a machine-learning model.

14. Why is it hard to understand why a deep learning model makes a particular prediction?

Because deep neural networks have a lot of layers, it's hard to determine the exact factors that are responsible for the final prediction.

15. What is the name of the theorem that shows that a neural network can solve any mathematical problem to any level of accuracy?

Universal approximation theorem

16. What do you need to train a model?

A dataset, an architecture suited to the problem (CNN, RNN, etc.), a loss function (to measure performance), and an optimizer (to update the parameters and improve the model).
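Those four ingredients can be sketched in plain Python. This is a toy illustration, not fastai code: the "architecture" is just y = w * x, the loss is mean squared error, and the "optimizer" is plain gradient descent.

```python
# Dataset: (input, target) pairs drawn from the true relationship y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def predict(w, x):   # architecture: a one-parameter linear model
    return w * x

def mse(w):          # loss function: mean squared error over the dataset
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def grad(w):         # gradient of the loss with respect to w
    return sum(2 * (predict(w, x) - y) * x for x, y in data) / len(data)

w, lr = 0.0, 0.05    # initial weight and learning rate
for _ in range(100): # optimizer: repeatedly step against the gradient
    w -= lr * grad(w)

print(round(w, 3))   # converges to 2.0, the true slope
```

Every real training loop is this pattern scaled up: millions of parameters, a more expressive architecture, and a smarter optimizer.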

17. How could a feedback loop impact the rollout of a predictive policing model?

Feedback loops in the rollout of a predictive policing model can reinforce biases and create distrust in the community.

18. Do we always have to use 224×224-pixel images with the cat recognition model?

No, it's possible to use different sizes. 224×224 is a standard for historical reasons - older pre-trained models were trained on images of that size.

19. What is the difference between classification and regression?

Classification is about predicting a category, while regression is about predicting a number.

20. What is a validation set? What is a test set? Why do we need them?

A validation set is used to measure the model's performance during training and to guide choices like hyperparameters. The test set is held back entirely and used only to measure performance after training. We do not always need a test set, but it's good practice to have one: because we make modeling decisions based on the validation set, we risk overfitting to it, and the test set gives an unbiased measure on truly unseen data.

21. What will fastai do if you don't provide a validation set?

It will automatically create a validation set by randomly taking 20% of the data.
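That default behaviour can be sketched in a few lines of plain Python (the function name here is my own, not fastai's):

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle the items and hold out valid_pct of them as a validation set."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    cut = int(len(items) * valid_pct)
    return items[cut:], items[:cut]     # (train, valid)

train, valid = random_split(range(100))
print(len(train), len(valid))  # 80 20
```

Seeding the shuffle matters in practice: without it, every run would produce a different validation set and results would not be comparable.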

22. Can we always use a random sample for a validation set? Why or why not?

The validation set should be representative of the data we expect to see in production. In many cases a random sample works, but at other times (e.g. with time-series data) it's better to hold out a specific later period, because we want to measure how the model performs on genuinely unseen, future data.
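The time-series case can be sketched like this: split on a cutoff date instead of randomly, so the validation set contains only records that come after everything in the training set (the field names and values here are made up for illustration):

```python
records = [
    {"date": "2023-01", "sales": 10},
    {"date": "2023-02", "sales": 12},
    {"date": "2023-03", "sales": 11},
    {"date": "2023-04", "sales": 15},
]

# Everything before the cutoff is training data; everything from the
# cutoff onwards is validation data. A random split would leak future
# information into training.
cutoff = "2023-03"
train = [r for r in records if r["date"] < cutoff]
valid = [r for r in records if r["date"] >= cutoff]

print(len(train), len(valid))  # 2 2
```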

23. What is overfitting? Provide an example.

Overfitting happens when the model gets good at predicting outcomes on the data it was trained on, but hasn't learned the underlying patterns that would help it make accurate predictions on new, unseen data. For example, a cat classifier that effectively memorizes its training photos will score perfectly on them yet fail on new photos of cats.

24. What is a metric? How does it differ from "loss"?

Both a metric and a loss measure the performance of the model. However, the loss is what the optimizer uses to update the model's parameters, so it needs to be smooth and differentiable, while a metric is chosen to be a human-readable measure of performance.
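A small sketch of the difference, using binary cross-entropy as the loss and accuracy as the metric (the numbers are made up):

```python
import math

probs  = [0.9, 0.8, 0.3, 0.6]  # predicted probability of the positive class
labels = [1,   1,   0,   1]    # ground-truth labels

# Loss: binary cross-entropy -- smooth, so gradients can flow from it.
loss = -sum(math.log(p) if y == 1 else math.log(1 - p)
            for p, y in zip(probs, labels)) / len(labels)

# Metric: accuracy -- easy to interpret, but flat almost everywhere,
# which makes it useless for gradient-based updates.
accuracy = sum((p > 0.5) == (y == 1)
               for p, y in zip(probs, labels)) / len(labels)

print(round(loss, 3), accuracy)  # 0.299 1.0
```

Note that accuracy is already perfect here while the loss is still positive: the loss keeps rewarding more confident correct predictions, which is exactly why training optimizes the loss rather than the metric.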

25. How can pre-trained models help?

A pre-trained model gives us a good starting point for training a model on a new dataset, since its early layers have already learned generally useful features.

26. What is the "head" of a model?

The head of a model is the final layer (or layers) that we add on top of the pre-trained model and train on the new dataset for a specific task.

27. What kinds of features do the early layers of a CNN find? How about the later layers?

Early layers of a CNN find simple features, like edges, colours, and textures. Later layers find more complex features, like shapes, objects, etc.

28. Are image models only useful for photos?

No, they have a wide range of applications, like medical imaging, spectrograms, time series data, etc.

29. What is an "architecture"?

An architecture is the template of the model - the general form of the mathematical function whose parameters we then fit to the data.

30. What is segmentation?

In image recognition, segmentation is the task of classifying every pixel of an image according to the object it belongs to, effectively dividing the image into labelled regions.

31. What is y_range used for? When do we need it?

y_range is used to limit the range of the output of the model. We need it when the model predicts a continuous value within a known range, e.g. movie ratings between 0.5 and 5.5.
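Under the hood, fastai implements y_range with a scaled sigmoid (`sigmoid_range`). A plain-Python sketch of the same idea:

```python
import math

def sigmoid_range(x, lo, hi):
    """Squash a raw model output x into the open interval (lo, hi).
    This mirrors how fastai applies y_range, e.g. for movie ratings."""
    return lo + (hi - lo) / (1 + math.exp(-x))

# Whatever the raw activation is, the output stays between 0.5 and 5.5:
print(round(sigmoid_range(-10, 0.5, 5.5), 2))  # close to 0.5
print(round(sigmoid_range(0,   0.5, 5.5), 2))  # 3.0 (the midpoint)
print(round(sigmoid_range(10,  0.5, 5.5), 2))  # close to 5.5
```

Because the sigmoid saturates, the model can never predict outside the target range, no matter how extreme its raw activations get.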

32. What are "hyperparameters"?

Hyperparameters are not learned from the data but are instead specified by the practitioner before training begins. Eg. learning rate, number of layers, batch size, etc.

33. What's the best way to avoid failures when using AI in an organization?

Hold back a test set that the developers or vendor never see, validate the model against it, and define a baseline that the model must beat before relying on it.

34. Why is a GPU useful for deep learning? How is a CPU different, and why is it less effective for deep learning?

A GPU is useful for deep learning because it can perform thousands of computations simultaneously. CPUs are designed to execute sequential tasks quickly, while GPUs use parallel processing, which makes them far faster and more efficient for the matrix operations at the heart of deep learning.

35. Try to think of three areas where feedback loops might impact the use of machine learning. See if you can find documented examples of that happening in practice.

  1. Generative AI - feedback loops can lead to biased and harmful content being generated; a recent example is Google's Gemini image generator overcorrecting for diversity in its outputs
  2. Social media algorithms - feedback loops can lead to radicalization and polarization
  3. Predictive policing - feedback loops can lead to over-policing of certain communities

That was a solid revision. Besides that, I also watched the second lesson's material about deploying models using Hugging Face Spaces. Mr Howard also presented how to build simple interfaces with Gradio and pure JavaScript. This part got me especially excited since the web is my area of expertise. Can't wait to marry the two worlds.


What I learned today:

  • solidified lesson 1 concepts: the importance of validation and test sets, overfitting, metrics vs. loss, pre-trained models, etc.
  • intro to Hugging Face Spaces and Gradio

Time commitment:

  • 1 hour of lesson 1 revision
  • 1 hour of experimenting with Kaggle
  • 1 hour of lesson 2

Total: 3 hours