
At the beginning of August, I got the chance to attend the Deep Learning Summer School in Montreal. It consisted of 10 days of talks from some of the most well-known neural network researchers. During this time I learned a lot, far more than I could ever fit into a blog post. Instead of trying to pass on 60 hours' worth of neural network knowledge, I have made a list of small, interesting nuggets of information that I was able to summarise in a paragraph each.

At the time of writing, the summer school website is still online, along with all the presentation slides. All of the information and most of the illustrations come from these slides and are the work of their original authors. The talks were filmed as well; hopefully the recordings will also find their way to the web.

Update: The Deep Learning Summer School videos are now online

Alright, let’s get started.

26 Precious Things I Learnt

1. The need for distributed representations

During his first talk, Yoshua Bengio said “This is my most important slide”. You can see that slide below:

Let’s say you have a classifier that needs to detect people that are male/female, have glasses or don’t have glasses, and are tall/short. With non-distributed representations, you are dealing with 2*2*2=8 different classes of people. In order to train an accurate classifier, you need to have enough training data for each of these 8 classes. However, with distributed representations, each of these properties could be captured by a different dimension. This means that even if your classifier has never encountered tall men with glasses, it would be able to detect them, because it has learned to detect gender, glasses and height independently from all the other examples.
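The counting argument above can be made concrete with a small sketch. It assumes a hypothetical training set that contains every combination of the three attributes except tall men with glasses. It then counts how many examples each representation has to learn from. It is not a classifier, only the bookkeeping behind the argument.

```python
from itertools import product

# Attribute order: (male, glasses, tall). Hypothetical training set containing
# every combination except "tall man with glasses" = (1, 1, 1).
all_combos = list(product([0, 1], repeat=3))
train = [combo for combo in all_combos if combo != (1, 1, 1)]

# Non-distributed (one-hot) view: each of the 2*2*2 = 8 combinations is its own
# class, so the unseen class has no training examples at all.
one_hot_counts = {combo: train.count(combo) for combo in all_combos}

# Distributed view: one dimension per attribute. Each dimension only needs
# examples of both values, regardless of the other attributes.
per_attribute_counts = [
    {value: sum(1 for x in train if x[dim] == value) for value in (0, 1)}
    for dim in range(3)
]

print(one_hot_counts[(1, 1, 1)])   # 0
print(per_attribute_counts)        # [{0: 4, 1: 3}, {0: 4, 1: 3}, {0: 4, 1: 3}]
```

Every attribute value is still covered by 3 or 4 examples, which is why a distributed model has something to learn from for the unseen combination.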

2. Local minima are not a problem in high dimensions

Yoshua Bengio's team has found experimentally that when optimising the parameters of high-dimensional neural nets, there are effectively no local minima. Instead, there are saddle points, which are local minima in some dimensions but not in others. This means that training can slow down considerably at these points until the network figures out how to escape, but as long as we're willing to wait long enough, it will find a way.

Below is a graph demonstrating a network during training, oscillating between two states: approaching a saddle point and then escaping it.

Given one specific dimension, there is some small probability p with which a point is a local minimum, but not a global minimum, in that dimension. Now, the probability of a point in a 1000-dimensional space being an incorrect local minimum in all of these would be p^1000, which is just astronomically small. However, the probability of it being a local minimum in some of these dimensions is actually quite high. And when we get these minima in many dimensions at once, then training can appear to be stuck until it finds the right direction.

In addition, this probability p will increase as the loss function gets closer to the global minimum. This means that if we do ever end up at a genuine local minimum, then for all intents and purposes it will be close enough to the global minimum that it will not matter.
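Here is a rough numerical sketch of the argument above. The per-dimension probability p = 0.1 is a made-up value. Treating the dimensions as independent is a simplifying assumption. The point is only the contrast between p^1000 and the chance of looking like a minimum in at least one dimension.

```python
import math

p = 0.1   # hypothetical probability of a "bad" local minimum along one dimension
d = 1000  # number of dimensions

# Bad local minimum along *every* dimension: p**d. Since 0.1**1000 underflows
# to 0.0 in floating point, work with its base-10 logarithm instead.
log10_p_all = d * math.log10(p)

# Local minimum along *at least one* dimension (a saddle point that can slow training).
p_some = 1 - (1 - p) ** d

# Expected number of dimensions that look like a minimum at a given point.
expected_dims = d * p

print(f"p_all = 10^{log10_p_all:.0f}, p_some = {p_some:.6f}, expected dims = {expected_dims:.0f}")
```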

3. Derivatives derivatives derivatives

Léon Bottou had some useful tables with activation functions, loss functions, and their corresponding derivatives. I'll keep these here for later.

Update: The min and max functions in the ramp formula should be switched.
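The tables themselves are not reproduced here. As a stand-in, the sketch below lists a few common activation functions with their derivatives. This is our own selection, not necessarily what is on Bottou's slides. The sketch checks each derivative numerically, which is a cheap way to catch a wrong entry in such a table.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    # Undefined at exactly 0; using 0 there is a common convention.
    return (x > 0).astype(float)

# Compare each analytic derivative with a central finite difference.
x = np.linspace(-3.0, 3.0, 13) + 0.05   # the offset keeps x away from ReLU's kink at 0
eps = 1e-6
max_errors = {}
for f, df in [(sigmoid, d_sigmoid), (tanh, d_tanh), (relu, d_relu)]:
    numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
    max_errors[f.__name__] = float(np.max(np.abs(numeric - df(x))))

print(max_errors)  # all entries should be tiny
```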

4. Weight initialisation strategy

The current recommended strategy for initialising weights in a neural network is to sample values uniformly from [−b, b], where

$$b = \frac{\sqrt{6}}{\sqrt{H_k + H_{k+1}}}$$

and H_k and H_{k+1} are the sizes of the hidden layers before and after the weight matrix. Recommended by Hugo Larochelle, published by Glorot & Bengio (2010).
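A minimal NumPy sketch of this initialisation follows. The function name `glorot_uniform` and the layer sizes 784 and 256 are illustrative choices, not taken from the slides. Drawing from U[−b, b] gives each weight a variance of b²/3 = 2/(H_k + H_{k+1}).

```python
import numpy as np

def glorot_uniform(h_k, h_k1, rng=None):
    """Sample an (h_k, h_k1) weight matrix from U[-b, b], b = sqrt(6) / sqrt(h_k + h_k1)."""
    rng = np.random.default_rng() if rng is None else rng
    b = np.sqrt(6.0) / np.sqrt(h_k + h_k1)
    return rng.uniform(-b, b, size=(h_k, h_k1))

W = glorot_uniform(784, 256, rng=np.random.default_rng(0))
print(W.shape)  # (784, 256)
print(W.var(), 2.0 / (784 + 256))  # empirical vs. expected variance
```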

5. Neural net training tricks

A few practical suggestions from Hugo Larochelle:

Normalise real-valued data. Subtract the mean and divide by standard deviation.

Decrease the learning rate during training.

Can update using mini-batches – the gradient is more stable.

Can use momentum, to get through plateaus.
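Here is a sketch of all four tricks together on a toy linear-regression problem. The synthetic data, the 1/(1 + epoch) decay schedule, the batch size of 32 and the momentum of 0.9 are all assumptions chosen for illustration. They are not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw data: three real-valued features on very different scales.
X_raw = rng.normal(loc=[5.0, -200.0, 0.01], scale=[2.0, 50.0, 0.005], size=(1000, 3))

# Trick 1: normalise each feature (subtract the mean, divide by the standard deviation).
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=len(X))

w = np.zeros(3)
velocity = np.zeros(3)
lr0, momentum, batch_size = 0.1, 0.9, 32

for epoch in range(20):
    lr = lr0 / (1 + epoch)                       # Trick 2: decrease the learning rate
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]    # Trick 3: mini-batch of examples
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)  # gradient of 0.5 * MSE
        velocity = momentum * velocity - lr * grad          # Trick 4: momentum
        w = w + velocity

print(np.round(w, 2))  # should be close to true_w
```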

6. Gradient checking

If you implemented your backprop by hand and it's not working, then there's roughly a 99% chance that the gradient calculation has a bug. Use gradient checking to identify the issue. The idea is to use the definition of a gradient: how much will the model error change if we increase a specific weight by a small amount?
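A minimal sketch of gradient checking with central differences, (J(w + ε) − J(w − ε)) / 2ε per weight, follows. The logistic-regression loss stands in for whatever model you are debugging. Comparing the two gradients via a relative error is a common convention, not something prescribed in the talk.

```python
import numpy as np

def loss(w, X, y):
    # Logistic regression negative log-likelihood, labels y in {-1, +1}.
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def analytic_grad(w, X, y):
    # Hand-derived gradient: d/dz log(1 + exp(-y z)) = -y / (1 + exp(y z)).
    z = X @ w
    return X.T @ (-y / (1.0 + np.exp(y * z))) / len(y)

def numerical_grad(f, w, eps=1e-5):
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2 * eps)  # central difference
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.choice([-1.0, 1.0], size=50)
w = rng.normal(size=4)

g_a = analytic_grad(w, X, y)
g_n = numerical_grad(lambda v: loss(v, X, y), w)
rel_err = np.linalg.norm(g_a - g_n) / (np.linalg.norm(g_a) + np.linalg.norm(g_n))
print(rel_err)  # should be tiny when the hand-written gradient is correct
```

If `analytic_grad` had a bug (say, a missing minus sign), `rel_err` would jump to order 1.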

7. Motion tracking

Human motion tracking can be done with impressive accuracy. Below are examples from the paper Dynamical Binary Latent Variable Models for 3D Human Pose Tracking by Graham Taylor et al. (2010). The method uses conditional restricted Boltzmann machines.

8. Syntax or no syntax? (aka, “is syntax a thing?”)

Chris Manning and Richard Socher have put a lot of effort into developing compositional models that combine neural embeddings with more traditional parsing approaches. This culminated in the Recursive Neural Tensor Network (Socher et al., 2013), which uses both additive and multiplicative interactions to combine word meanings along a parse tree.

And then, the model was beaten (by quite a margin) by the Paragraph Vector (Le & Mikolov, 2014), which knows absolutely nothing about the sentence structure or syntax. Chris Manning referred to this result as “a defeat for creating ‘good’ compositional vectors”.

However, more recent work using parse trees has again surpassed this result. Irsoy & Cardie (NIPS, 2014) managed to beat paragraph vectors by going “deep” with their networks in multiple dimensions. Finally, Tai et al. (ACL, 2015) have improved the results again by combining LSTMs with parse trees.

The accuracies of these models on the Stanford 5-class sentiment dataset are as follows:

9. Distributed vs Distributional

Chris Manning himself cleared up the confusion between the two words.

Distributed: A concept is represented as continuous activation levels in a number of elements. Like a dense word embedding, as opposed to 1-hot vectors.

Distributional: Meaning is represented by contexts of use. Word2vec is distributional, but so are count-based word vectors, as we use the contexts of the word to model the meaning.

10. The state of dependency parsing

Comparison of dependency parsers on the Penn Treebank:

The last result is from Google “pulling out all the stops”, by putting massive amounts of resources into training the Stanford neural parser.

11. Theano

Well, I knew a bit about Theano before, but I learned a whole lot more during the summer school. And it is pretty awesome.

Since Theano originates from Montreal, it was especially helpful to be able to ask questions directly to the people who are developing it.

Most of the information that was presented is available online, in the form of interactive Python tutorials.

12. Nvidia Digits

Nvidia has a toolkit called Digits that trains and visualizes complex neural network models without needing to write any code. They are also selling the DevBox, a machine customized for running Digits and other deep learning software (Theano, Caffe, etc.). It comes with 4 Titan X GPUs and currently costs $15,000.

13. Fuel

Fuel is a toolkit that manages iteration over your data sets: it can split them into minibatches, manage shuffling, apply various pre-processing steps, etc. There are prebuilt functions for some established data sets, such as MNIST, CIFAR-10, and Google's 1B Word corpus. It is mainly designed for use with Blocks, a toolkit that simplifies network construction with Theano.

14. Multimodal linguistic regularities

Remember “king – man + woman = queen”? Turns out that works with images as well (Kiros et al., 2014).

15. Taylor series approximation

When we are at point x0 and take a step to x, we can estimate the function value at the new location from the derivatives, using the Taylor series approximation.

$$f(x) = f(x_0) + (x - x_0)\,f'(x_0) + \frac{1}{2}(x - x_0)^2 f''(x_0) + \dots$$
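This is a quick numerical check of the first- and second-order approximations. It uses sin(x) as an arbitrary example function, whose first and second derivatives are cos(x) and −sin(x). The helper names are our own.

```python
import math

def taylor1(f, df, x0, x):
    """First-order Taylor estimate of f(x) around x0."""
    return f(x0) + (x - x0) * df(x0)

def taylor2(f, df, d2f, x0, x):
    """Second-order Taylor estimate: adds the curvature term 0.5 * (x - x0)**2 * f''(x0)."""
    return taylor1(f, df, x0, x) + 0.5 * (x - x0) ** 2 * d2f(x0)

x0, x = 0.5, 0.6
exact = math.sin(x)
first = taylor1(math.sin, math.cos, x0, x)
second = taylor2(math.sin, math.cos, lambda t: -math.sin(t), x0, x)
print(exact, first, second)  # the second-order estimate is much closer to the exact value
```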

Similarly, we can estimate the loss when we update the parameters from θ0 to θ:

$$J(\theta) = J(\theta_0) + (\theta - \theta_0)^T g + \frac{1}{2}(\theta - \theta_0)^T H (\theta - \theta_0) + \dots$$

where g contains the derivatives with respect to θ, evaluated at θ0, and H is the Hessian, the matrix of second-order derivatives with respect to θ, also evaluated at θ0.

This is the second-order Taylor approximation, but we could increase the accuracy by adding even higher-order derivatives.
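Here is the same idea in several dimensions, as a sketch. The test function J below is an arbitrary smooth example with a hand-derived gradient g and Hessian H; it is not a neural-network loss. For small steps, the quadratic model should track J much more closely than a first-order (gradient-only) estimate.

```python
import numpy as np

def J(theta):
    return np.sum(np.exp(theta)) + theta[0] * theta[1]

def grad_J(theta):
    # g: first derivatives of J
    return np.exp(theta) + np.array([theta[1], theta[0]])

def hess_J(theta):
    # H: second derivatives of J
    return np.diag(np.exp(theta)) + np.array([[0.0, 1.0], [1.0, 0.0]])

def first_order(theta0, theta):
    return J(theta0) + (theta - theta0) @ grad_J(theta0)

def second_order(theta0, theta):
    step = theta - theta0
    return first_order(theta0, theta) + 0.5 * step @ hess_J(theta0) @ step

theta0 = np.array([0.1, -0.2])
theta = theta0 + np.array([0.05, 0.03])
print(J(theta), first_order(theta0, theta), second_order(theta0, theta))
```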

16. Computational intensity

Adam Coates presented a strategy for analysing the speed of matrix operations on a GPU. It's a simplified model that says your time is spent either reading/writing memory or doing calculations. It assumes you can do both in parallel, so we are interested in which of the two takes more time.

Let’s say we are multiplying a matrix with a vector:

If M=1024 and N=512, then the number of bytes we need to read and store is:

$$4\ \text{bytes} \times (1024 \times 512 + 512 + 1024) \approx 2.1 \times 10^{6}\ \text{bytes}$$

And the number of calculations we need to do is:

$$2 \times 1024 \times 512 \approx 1 \times 10^{6}\ \text{FLOPs}$$

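If we have a GPU that can do 6 TFLOP/s and has a memory bandwidth of 300 GB/s, then the total running time will be:

$$\max\left(\frac{2.1 \times 10^{6}\ \text{bytes}}{300 \times 10^{9}\ \text{bytes/s}},\ \frac{1 \times 10^{6}\ \text{FLOPs}}{6 \times 10^{12}\ \text{FLOP/s}}\right) = \max(7\ \mu\text{s},\ 0.17\ \mu\text{s}) = 7\ \mu\text{s}$$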

This means the process is bounded by the 7μs spent on copying to/from the memory, and getting a faster GPU would not make any difference. As you can probably guess, this situation gets better with bigger matrices/vectors, and when doing matrix-matrix operations.

Adam also described the idea of calculating the intensity of an operation:

Intensity = (# arithmetic ops) / (# bytes to load or store)

In the previous scenario, this would be

Intensity = (1E6 FLOPs) / (2.1E6 bytes) ≈ 0.5 FLOPs/byte

Low intensity means the system is bottlenecked on memory, and high intensity means it's bottlenecked by the GPU speed. This can be visualized to find which of the two needs to improve in order to speed up the whole system, and where the sweet spot lies.
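The back-of-the-envelope calculation above can be scripted. This is a sketch of the simplified model, in which memory traffic and arithmetic overlap so the slower of the two dominates. The 4-byte floats and the GPU figures are the ones used in the example; the function names are our own.

```python
def matvec_cost(M, N, bytes_per_value=4):
    """Bytes moved and FLOPs for y = A @ x with A of shape (M, N)."""
    n_bytes = bytes_per_value * (M * N + N + M)   # read A and x, write y
    n_flops = 2 * M * N                           # one multiply and one add per entry of A
    return n_bytes, n_flops

def runtime(n_bytes, n_flops, flops_per_s=6e12, bytes_per_s=300e9):
    t_mem = n_bytes / bytes_per_s
    t_compute = n_flops / flops_per_s
    # Simplified model: both happen in parallel, so the slower one sets the runtime.
    return max(t_mem, t_compute), ("memory" if t_mem > t_compute else "compute")

n_bytes, n_flops = matvec_cost(1024, 512)
t, bound = runtime(n_bytes, n_flops)
print(n_bytes, n_flops)  # 2103296 1048576
print(f"{t * 1e6:.1f} us, {bound}-bound, intensity {n_flops / n_bytes:.2f} FLOPs/byte")
```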

17. Minibatches

Continuing from the intensity calculations, one way of increasing the intensity of your network (so that it is limited by computation instead of memory) is to process data in minibatches. This avoids some memory operations, and GPUs are great at processing large matrices in parallel.

However, increasing the batch size too much will probably start to hurt the training algorithm, and convergence can take longer. It's important to find a good balance in order to get the best results in the least amount of time.
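Extending the same model to a batch of B input vectors (an M×N matrix times an N×B matrix) shows why minibatches raise intensity: the matrix is loaded once but reused B times. With the GPU figures from the previous section, the crossover point is 6e12 / 300e9 = 20 FLOPs/byte. In this simplified model, the operation therefore becomes compute-bound somewhere between B = 16 and B = 128. This is a sketch of the arithmetic only; real kernels have caches and other overheads.

```python
def matmul_intensity(M, N, batch, bytes_per_value=4):
    """FLOPs per byte for Y = A @ X with A of shape (M, N) and X of shape (N, batch)."""
    n_bytes = bytes_per_value * (M * N + N * batch + M * batch)  # read A and X, write Y
    n_flops = 2 * M * N * batch
    return n_flops / n_bytes

for batch in (1, 16, 128, 1024):
    print(batch, round(matmul_intensity(1024, 512, batch), 1))
```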

18. Adversarial examples

The noise pattern isn't random, though: the noise is carefully calculated in order to trick the network. But the point remains: the image on the right is clearly still a goldfish and not a daisy.

Apparently strategies like ensemble models, voting after multiple saccades, and unsupervised pretraining have all failed against this vulnerability. Applying heavy regularisation helps, but not before ruining the accuracy on the clean data.
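One well-known way to compute such noise is the fast gradient sign method. You take the gradient of the loss with respect to the input and move every input feature a small step ε in the direction of its sign. The sketch below applies it to a toy linear classifier rather than an image network. The weights, the input and ε = 0.05 are made-up values. Each feature changes by only about 5%, but the changes all line up with the weights, so they add up across 100 dimensions and flip the prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear classifier over d = 100 input features.
d = 100
w = np.where(np.arange(d) % 2 == 0, 1.0, -1.0)
b = 0.0

# A clean input that the model assigns to class 1 with high confidence.
x = 1.0 + 0.02 * w
y = 1.0

# Gradient of the cross-entropy loss with respect to the *input* (not the weights).
grad_x = (sigmoid(w @ x + b) - y) * w

# Fast gradient sign method: move each feature by eps in the loss-increasing direction.
eps = 0.05
x_adv = x + eps * np.sign(grad_x)

print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))  # confident class 1, then below 0.5
```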

19. Everything is language modelling

Phil Blunsom presented the idea that almost all NLP can be structured as a language model. We can do this by concatenating the output to the input and trying to predict the probability of the whole sequence.

Translation:

Question answering:

Dialogue:

The latter two need to be additionally conditioned on some world knowledge. The second part doesn’t even need to be words, but could be labels or some structured output like dependency relations.
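Here is a toy sketch of the idea for translation. Input and output are joined with a separator token, and a language model scores the whole sequence with the chain rule. "Translating" then means picking the output that makes the joint sequence most probable. The three-pair corpus, the `|||` separator and the smoothed trigram counts are all assumptions made for illustration. The trigram counts stand in for a neural language model.

```python
import math
from collections import Counter, defaultdict

SEP = "|||"  # hypothetical separator between input and output

# Hypothetical toy "parallel corpus": input and output concatenated into one sequence.
corpus = ["chat ||| cat", "chien ||| dog", "oiseau ||| bird"]

def pad(sentence):
    return ["<s>", "<s>"] + sentence.split() + ["</s>"]

counts = defaultdict(Counter)
vocab = set()
for sentence in corpus:
    toks = pad(sentence)
    vocab.update(toks)
    for a, b, c in zip(toks, toks[1:], toks[2:]):
        counts[(a, b)][c] += 1

def log_prob(sentence, alpha=0.1):
    """Chain rule: log p(w_1..w_T) = sum_t log p(w_t | w_{t-2}, w_{t-1}), add-alpha smoothed."""
    toks = pad(sentence)
    total = 0.0
    for a, b, c in zip(toks, toks[1:], toks[2:]):
        ctx = counts[(a, b)]
        total += math.log((ctx[c] + alpha) / (sum(ctx.values()) + alpha * len(vocab)))
    return total

def translate(source, candidates=("cat", "dog", "bird")):
    # Pick the output whose concatenation with the input is most probable.
    return max(candidates, key=lambda t: log_prob(f"{source} {SEP} {t}"))

print(translate("chat"), translate("chien"))  # cat dog
```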

20. SMT had a rough start

When Frederick Jelinek and his team at IBM submitted one of the first papers on statistical machine translation to COLING in 1988, they got the following anonymous review:

The validity of a statistical (information theoretic) approach to MT has indeed been recognized, as the authors mention, by Weaver as early as 1949. And was universally recognized as mistaken by 1950 (cf. Hutchins, MT – Past, Present, Future, Ellis Horwood, 1986, p. 30ff and references therein). The crude force of computers is not science. The paper is simply beyond the scope of COLING.

21. The State of Neural Machine Translation

Apparently a very simple neural model can produce surprisingly good results. An example of translating from Chinese to English, from Phil Blunsom's slides:

In this model, the vectors for the Chinese words are simply added together to form a sentence vector. The decoder consists of a conditional language model, which takes the sentence vector together with the vectors for the two most recently generated English words, and generates the next word in the translation.

However, neural models are still not outperforming the very best traditional MT systems. They do come very close though. Results from “Sequence to Sequence Learning with Neural Networks” by Sutskever et al. (2014):

Update: @stanfordnlp pointed out that there are some recent results where the neural model does indeed outperform the state-of-the-art traditional MT system. Check out “Effective Approaches to Attention-based Neural Machine Translation” (Luong et al., 2015).

22. MetaMind classifier demo

Richard Socher demonstrated the MetaMind image classification demo, which you can train yourself by uploading images. I trained a classifier to detect Edison and Einstein (I couldn't find enough unique images of Tesla), with 5 example images for each class and one held-out test image each. It seemed to work pretty well.

23. Optimising gradient updates

Mark Schmidt gave two presentations about numerical optimisation in different scenarios.

In a deterministic gradient method, we calculate the gradient over the whole data set and then apply the update. The iteration cost is linear in the data set size.

In stochastic gradient methods we calculate the gradient on one data point and then apply the update. The iteration cost is independent of the data set size.

Each iteration of the stochastic gradient descent is much faster, but it usually takes many more iterations to train the network, as this graph illustrates:

In order to get the best of both worlds, we can use batching. More specifically, we could do one pass of the dataset with stochastic gradient descent, in order to quickly get on the right track, and then start increasing the batch size. The gradient error decreases as the batch size increases, although eventually the iteration cost will become dependent on the dataset size again.
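Here is a sketch of that schedule on a synthetic least-squares problem: one pass of batch-size-1 SGD, then repeatedly doubling the batch size. The data, the step size of 0.05 and the 20 steps per batch size are made-up choices, not Mark Schmidt's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, idx):
    # Gradient of 0.5 * mean squared error on the examples in idx.
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

w = np.zeros(d)
lr = 0.05

# Phase 1: one pass of plain SGD (batch size 1) to get on the right track quickly.
for i in rng.permutation(n):
    w -= lr * grad(w, [i])

# Phase 2: keep doubling the batch size. Gradient noise shrinks as the batch grows,
# while each step costs more (approaching a full pass as batch_size approaches n).
batch_size = 2
while batch_size <= n:
    for _ in range(20):
        idx = rng.choice(n, size=batch_size, replace=False)
        w -= lr * grad(w, idx)
    batch_size *= 2

w_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # exact least-squares solution for comparison
print(np.linalg.norm(w - w_ls))
```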

Stochastic Average Gradient (SAG) is a method that gets around this, providing a linear convergence rate with only 1 gradient per iteration. Unfortunately, it is not feasible for large neural networks, as it needs to remember the gradient updates for every datapoint, leading to large memory requirements. Stochastic Variance-Reduced Gradient (SVRG) is another method that reduces this memory cost, and only needs 2 gradient calculations per iteration (plus occasional full passes).
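A minimal SAG sketch on a noise-free least-squares problem makes the memory trade-off visible. The `memory` array holds one gradient per data point, which is exactly what becomes infeasible for large datasets and large networks. The step size 1/(16L) follows the conservative choice from the SAG convergence analysis as we understand it; in practice larger steps are often used. The data and iteration count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true  # noise-free, so the least-squares solution is exactly w_true

def grad_i(w, i):
    # Gradient of f_i(w) = 0.5 * (x_i . w - y_i)**2
    return (X[i] @ w - y[i]) * X[i]

L = np.max(np.sum(X ** 2, axis=1))   # Lipschitz constant of the individual gradients
lr = 1.0 / (16 * L)                  # conservative step size

w = np.zeros(d)
memory = np.zeros((n, d))   # last gradient seen for every data point: O(n * d) memory
grad_sum = np.zeros(d)

for _ in range(50_000):
    i = rng.integers(n)
    g = grad_i(w, i)           # only ONE new gradient per iteration
    grad_sum += g - memory[i]  # swap the stale stored gradient for the fresh one
    memory[i] = g
    w -= lr * grad_sum / n     # step along the average of the stored gradients

print(np.round(w, 4))  # should match w_true
```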

Mark said a student of his implemented a variety of optimisation methods (AdaGrad, momentum, SAG, etc.). When asked what he would use in a black-box neural network system, the student named two methods: Streaming SVRG (Frostig et al., 2015), and a method they haven't published yet.

24. Theano profiling

If you put “profile=True” into THEANO_FLAGS, Theano will analyse your program and show a breakdown of how much time is spent on each operation. Very handy for finding bottlenecks.

25. Adversarial nets framework

System D is a discriminative system that aims to classify between real data and artificially generated data.

System G is a generative system that tries to generate artificial data that D would incorrectly classify as real.
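The two systems are trained against each other on opposite objectives. Below is a sketch of the two losses in the standard minimax formulation. They are evaluated on hypothetical discriminator outputs (D's estimated probability that an input is real) rather than on real networks.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """D maximises log D(x) + log(1 - D(G(z))); as a loss we minimise the negative."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Minimax form: G minimises log(1 - D(G(z))), i.e. wants D to output 1 on fakes."""
    return np.mean(np.log(1.0 - d_fake))

# Hypothetical discriminator outputs on two real and two generated samples.
sharp_D = discriminator_loss(d_real=np.array([0.9, 0.95]), d_fake=np.array([0.1, 0.05]))
fooled_D = discriminator_loss(d_real=np.array([0.5, 0.5]), d_fake=np.array([0.5, 0.5]))
print(sharp_D < fooled_D)  # True: D's loss is lower when it separates real from fake

# G's loss is lower when D is fooled (assigns high "real" probability to fakes).
print(generator_loss(np.array([0.9])) < generator_loss(np.array([0.1])))  # True
```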

26. arXiv.org numbering

The arXiv number contains the year and month of the submission, followed by the sequence number. So paper 1508.03854 was number 3854 in August 2015. Good to know.
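The rule is easy to apply programmatically. This sketch assumes the modern `YYMM.NNNN` / `YYMM.NNNNN` identifier format, optionally with a version suffix such as `v2`. Older identifiers like `hep-th/9901001` are deliberately not handled, and the function name is our own.

```python
import re

def parse_arxiv_id(arxiv_id):
    """Split a new-style arXiv identifier YYMM.NNNN(N)[vK] into (year, month, sequence number)."""
    match = re.fullmatch(r"(\d{2})(\d{2})\.(\d{4,5})(v\d+)?", arxiv_id)
    if match is None:
        raise ValueError(f"not a new-style arXiv id: {arxiv_id!r}")
    yy, mm, seq = int(match.group(1)), int(match.group(2)), int(match.group(3))
    return 2000 + yy, mm, seq

print(parse_arxiv_id("1508.03854"))  # (2015, 8, 3854)
```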


## Understanding Loss Function In Deep Learning

The loss function is very important in machine learning and deep learning. Let's say you are working on a problem, you have trained a machine learning model on the dataset, and you are ready to put it in front of your client. How can you be sure that this model will give the optimum result? Is there a metric or a technique that will help you quickly evaluate your model on the dataset? Yes: this is where loss functions come into play. In this article, we will explain everything about loss functions in deep learning.

This article was published as a part of the Data Science Blogathon.

What is Loss Function in Deep Learning?

In mathematical optimization and decision theory, a loss or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some “cost” associated with the event.

In simple terms, the loss function is a method of evaluating how well your algorithm models your dataset. It is a mathematical function of the parameters of the machine learning algorithm.

In simple linear regression, the prediction is calculated using the slope (m) and the intercept (b): ŷi = m·xi + b. The loss for one data point is (yi − ŷi)², so the loss is a function of the slope and the intercept.

Why Loss Function in Deep Learning is Important?

Famous author Peter Drucker says, “You can't improve what you can't measure.” That's why the loss function comes into the picture: it evaluates how well your algorithm models your dataset.

If the value of the loss function is low, the model is good; otherwise, we have to change the parameters of the model to minimize the loss.

Cost Function vs Loss Function in Deep Learning

Many people confuse the loss function and the cost function. The two terms are often used interchangeably, but strictly speaking they are different:

| Loss Function | Cost Function |
|---|---|
| Measures the error between predicted and actual values in a machine learning model. | Quantifies the overall cost or error of the model on the entire training set. |
| Used to optimize the model during training. | Used to guide the optimization process by minimizing the cost or error. |
| Can be specific to individual samples. | Aggregates the loss values over the entire training set. |
| Examples include mean squared error (MSE), mean absolute error (MAE), and binary cross-entropy. | Often the average or sum of individual loss values in the training set. |
| Used to evaluate model performance. | Used to determine the direction and magnitude of parameter updates during optimization. |
| Different loss functions can be used for different tasks or problem domains. | Typically derived from the loss function, but can include additional regularization terms or other considerations. |

Loss Functions in Deep Learning

Regression: MSE (Mean Squared Error), MAE (Mean Absolute Error), Huber loss

Classification: Binary cross-entropy, Categorical cross-entropy

AutoEncoder: KL Divergence

GAN: Discriminator loss, Minimax GAN loss

Object detection: Focal loss

Word embeddings: Triplet loss

In this article, we will cover regression losses and classification losses.

A. Regression Loss

1. Mean Squared Error / Squared loss / L2 loss

The Mean Squared Error (MSE) is the simplest and most common loss function. To calculate the MSE, you take the difference between the actual value and the model prediction, square it, and average it across the whole dataset.

Advantage

1. Easy to interpret.

2. Always differentiable, because of the square.

3. Only one minimum.

Disadvantage

1. The error is in squared units, which makes it harder to interpret.

2. Not robust to outliers.

Note – In regression, use a linear activation function at the last neuron.
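A minimal NumPy version of the computation described above, on a few made-up values:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared differences; the result is in squared units of y.
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])
print(mse(y_true, y_pred))  # 0.375
```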

2. Mean Absolute Error / L1 loss

The Mean Absolute Error (MAE) is also a simple loss function. To calculate the MAE, you take the absolute difference between the actual value and the model prediction and average it across the whole dataset.

Advantage

1. Intuitive and easy.

2. The error has the same unit as the output column.

3. Robust to outliers.

Disadvantage

1. The function is not differentiable at zero, so we cannot use gradient descent directly; instead, we can use a subgradient.

Note – In regression, use a linear activation function at the last neuron.
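The same made-up values are used here with MAE, plus the subgradient that replaces the derivative at the kink. Using `np.sign`, which returns 0 where the prediction equals the target, is one valid subgradient choice, not the only one.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mae_subgradient(y_true, y_pred):
    # d/d(y_pred) of |y_pred - y_true| is sign(y_pred - y_true) away from the kink;
    # np.sign gives 0 at the kink, which is a valid subgradient there.
    return np.sign(np.asarray(y_pred) - np.asarray(y_true)) / len(y_true)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])
print(mae(y_true, y_pred))  # 0.5
print(mae_subgradient(y_true, y_pred))
```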

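3. Huber Loss

In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in data than the squared error loss:

$$L_\delta = \frac{1}{n}\sum_{i=1}^{n} \begin{cases} \frac{1}{2}(y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \le \delta \\ \delta\left(|y_i - \hat{y}_i| - \frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$$

where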

n – the number of data points.

y – the actual value of the data point. Also known as true value.

ŷ – the predicted value of the data point. This value is returned by the model.

δ – defines the point where the Huber loss function transitions from a quadratic to linear.

Advantage

Robust to outliers.

It lies between MAE and MSE.
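This sketch compares the three regression losses on made-up predictions, with and without a single outlier. The choice δ = 1.0 is arbitrary. The outlier inflates MSE far more than MAE or Huber, because only MSE keeps squaring large residuals.

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    r = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    quadratic = 0.5 * r ** 2                 # used where |residual| <= delta
    linear = delta * (r - 0.5 * delta)       # used where |residual| > delta
    return np.mean(np.where(r <= delta, quadratic, linear))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
clean = np.array([1.1, 1.9, 3.2, 4.0])
with_outlier = np.array([1.1, 1.9, 3.2, 14.0])  # last prediction is off by 10

for name, pred in [("clean", clean), ("outlier", with_outlier)]:
    r = y_true - pred
    print(name, "MSE", np.mean(r ** 2), "MAE", np.mean(np.abs(r)), "Huber", huber(y_true, pred))
```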

B. Classification Loss

1. Binary Cross Entropy / log loss

It is used in binary classification problems, i.e. problems with two classes: for example, whether a person has covid or not, or whether my article gets popular or not.

Binary cross entropy compares each of the predicted probabilities to the actual class output, which can be either 0 or 1. It then calculates a score that penalizes the probabilities based on their distance from the expected value, that is, how close to or far from the actual value they are:

$$\text{BCE} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

where

yi – actual values

yihat – Neural Network prediction

Advantage

The cost function is differentiable.

Disadvantage

Multiple local minima.

Not intuitive.

Note – In binary classification, use a sigmoid activation function at the last neuron.
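A minimal sketch of the formula above follows. The clipping constant `eps` is an assumption added so that a prediction of exactly 0 or 1 does not produce log(0). The labels and probabilities are made up.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never happens for fully confident predictions.
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y = [1, 0, 1, 0]
right = binary_cross_entropy(y, [0.9, 0.1, 0.8, 0.2])   # confident and right: small loss
wrong = binary_cross_entropy(y, [0.1, 0.9, 0.2, 0.8])   # confident and wrong: large loss
print(right, wrong)
```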

2. Categorical Cross Entropy

Categorical cross entropy is used for multiclass classification and softmax regression.

Loss function:

$$L = -\sum_{j=1}^{k} y_j \log \hat{y}_j$$

Cost function:

$$J = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{k} y_{ij} \log \hat{y}_{ij}$$

where

k is the number of classes,

y = actual value

ŷ – Neural Network prediction

Note – In multi-class classification, use the softmax activation function at the last neuron.

If the problem has 3 classes, the softmax activation for the first output is

$$f(z)_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2} + e^{z_3}}$$

When to use categorical cross-entropy and sparse categorical cross-entropy?

If the target column is one-hot encoded (e.g. 0 0 1, 0 1 0, 1 0 0), use categorical cross-entropy. If the target column uses integer class indices (e.g. 0, 1, 2, …), use sparse categorical cross-entropy.

Which is faster?

Sparse categorical cross-entropy is faster than categorical cross-entropy, because it works with the integer labels directly instead of one-hot vectors.
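The sketch below shows that the two variants compute the same cost and differ only in how the labels are stored. The logits are made-up values, and the functions are our own NumPy versions, not a specific framework's API.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_onehot, probs):
    # Cost: -(1/n) * sum_i sum_j y_ij * log(yhat_ij)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

def sparse_categorical_cross_entropy(labels, probs):
    # Same quantity, but picks log(yhat) at the true class index directly:
    # no one-hot matrix is built and no multiplications by zero are performed.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
probs = softmax(logits)
labels = np.array([0, 1])            # integer class indices
y_onehot = np.eye(3)[labels]         # the same labels, one-hot encoded

print(categorical_cross_entropy(y_onehot, probs))
print(sparse_categorical_cross_entropy(labels, probs))  # identical value
```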

Conclusion

In this article, we learned about different types of loss functions. The key takeaways from the article are:

We learned the importance of loss functions in deep learning.

The difference between loss and cost.

The mean absolute error is robust to outliers.

Binary cross-entropy is used for binary classification.

Sparse categorical cross-entropy is faster than categorical cross-entropy.

So, this was all about loss functions in deep learning. Hope you liked the article.

Frequently Asked Questions

Q1. What is a loss function?

A. A loss function is a mathematical function that quantifies the difference between predicted and actual values in a machine learning model. It measures the model’s performance and guides the optimization process by providing feedback on how well it fits the data.

Q2. What is loss and cost function in deep learning?

A. In deep learning, “loss function” and “cost function” are often used interchangeably. They both refer to the same concept of a function that calculates the error or discrepancy between predicted and actual values. The cost or loss function is minimized during the model’s training process to improve accuracy.

Q3. What is L1 loss function in deep learning?

A. L1 loss function, also known as the mean absolute error (MAE), is commonly used in deep learning. It calculates the absolute difference between predicted and actual values. L1 loss is robust to outliers but does not penalize larger errors as strongly as other loss functions like L2 loss.

Q4. What is loss function in deep learning for NLP?

A. In deep learning for natural language processing (NLP), various loss functions are used depending on the specific task. Common loss functions for tasks like sentiment analysis or text classification include categorical cross-entropy and binary cross-entropy, which measure the difference between predicted and true class labels for classification tasks in NLP.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


## Today I Learned: The Smell Of Formaldehyde Makes You Hungry


In the first episode of an all-new BU Today podcast, we sit down with sophomore Suzie Marcus to discuss her favorite class, HS 369: Gross Human Anatomy. You can also find this episode on Apple Podcasts, Spotify, Google Podcasts, and other podcast platforms.

The classes we take can change our perspectives and shape our lives, and we think that’s something worth celebrating. Our new podcast, Today I Learned, is all about the classes at BU that have had a real effect on students in our community; we want to know all about the classroom environment, professor, subject matter, and the cool facts that make a lasting impression.

For our first episode, we’ve invited Suzie Marcus (CAS’26) to tell us all about her favorite class, Sargent HS 369: Gross Human Anatomy. Marcus wants to go into neurology, but she says she loves the broad approach to the human body that Elizabeth Co, a senior lecturer at the College of Arts & Sciences, presents in class. This class is half lecture and half lab (featuring lots of cadavers and human bones) and, according to Marcus, is perfect for anyone thinking about nursing, premed, veterinary studies, biomedical engineering, or “anyone who has a fascination with science and they aren’t squeamish.”

Want to be our next guest? Tell us about your favorite class here. Undergraduates, graduate students, and recent postgraduates are welcome to submit.

Takeaways

Cadaver labs aren’t for the faint of heart—but most premed students know what to expect, and squeamishness isn’t an issue. (It helps if you watch surgery videos on YouTube in your spare time.)

What happens when a muscle tears, a bone fractures, or a tumor appears? Elizabeth Co, who teaches Marcus and her classmates, wants her students to always consider the “downstream effects” when engaging in case studies.

The smell of formaldehyde can trigger a hunger effect in some people, which is why it's good to have a coffee shop near the lab.

Transcript

Sophie Yarin: Hello everyone and welcome to Today I Learned, a BU Today podcast where we explore fun facts and ideas across a multitude of disciplines. We’ll be interviewing students about exciting things they learned in their favorite classes at BU. From changing majors to picking career paths, single classes can have a transformative impact on their future. I’m your host, Sophie Yarin, and I’m investigating how the things we learned in the classroom affect our lives. So, to do that, we’re going to be speaking directly to BU students, which is why we have Suzie Marcus joining us in the studio today. Suzie, thank you so much for being here.

Suzie Marcus: Thanks for having me.

Yarin: So, Suzie, you’re a premed student and a sophomore at the College of Arts & Sciences. Tell us a little bit about yourself. Where are you from?

Marcus: I’m from Bedford, Massachusetts, which is about 45 minutes northwest of here. I’m studying neuroscience, and I definitely want to go into something related to neurology or neurosurgery—just something related to the brain.

Yarin: Got it. So, why did you choose that major?

Marcus: I didn’t always know what I wanted to be. I knew I wanted to do something STEM-related, but I actually used to be really squeamish. Obviously, I’m not any longer. I kind of discovered a fascination with the brain, and how we think, and then from there was kind of just reading things about the brain or watching their neurosurgery videos on YouTube—lots of late nights being interested in aneurysm clippings—and it kind of just turned into, “I want to know more about that.” That’s why I’m so fascinated with the brain. So, I said, “I have to major in neuroscience.”

Yarin: So, the class you’ve chosen to focus on today is Sargent HS 369: Gross Human Anatomy with Dr. Elizabeth Co. And what’s something you learned in that class that stuck with you?

Marcus: I think everybody knows when they’re signing up for the class that the lab is a cadaver lab, and certain people might have certain reactions to the cadaver. But from my experience, I don’t think that anyone really had a, you know, a feeling nauseous or anything. Because, again, I think people going into it know. I’ve also had some experience prior working in a morgue, so I was kind of used to the formaldehyde smell. And I would say that it can be easier or more difficult, depending on how you learn, because, you know, you kind of take for granted the red of the arteries and the yellow of the nerves in all those diagrams. But sometimes the diagrams, like cross sections of the knee, they don’t really make that much sense to me—honestly, they probably never will. But seeing it on a person, that makes sense. And being able to—of course, we have gloves on—being able to, you know, touch something or maybe pull something back to be able to see the more deep structures of, you know, a leg or an arm, that makes more sense. But, at the same time, everything’s the same color. So, it’s very difficult to tell, oh, is this connective tissue? Is this actually a nerve? But I think that being able to see both the diagrams in lecture, and then the physical anatomy and lab, it goes together very well.

Yarin: All right, so we asked you to prepare a fun fact to bring in on the show today. And why don’t you tell us what that fact is?

Marcus: Formaldehyde is an appetite stimulant in like 20 percent of people, which is kind of creepy, but then I thought, you know, after the lab my labmates and I did like to go to Starbucks and, you know, grab a bagel.

Yarin: How would you say that Dr. Co approaches the subject matter? The matter, literally.

Marcus: She brings bones to class, sometimes. They’re very clean—they’re not, like, in formaldehyde or anything. We learned osteology or just, you know, the parts of the bone in lab, but we also go over it in lecture. And so a lot of times she’ll bring different types of bones, like you could have a humerus, which is in your arm, or a femur, or part of a pelvis, and she’ll show us that. And I just remember, sometimes she would have the LAs [lab assistants] hand them out, so every couple of people, you would be like, “Oh, here’s this, this is like a scapula, which is the part in your back.” And she would kind of be like, “Okay, well, if you look at this side of it, you know, you have like a tubercle, like a bump, here,” and then you could feel it. So, that’s one way that she does, you know, more hands-on things because, for the bones, we do a lot of palpations, like feeling stuff on yourself, which is definitely very helpful. But you can’t really feel everything just by, you know, pressing on your arm. So, that definitely really helps. And then she also has case studies, which are huge for understanding things.

Yarin: What’s your favorite part of her teaching style? What do you appreciate the most?

Marcus: I think that she really has an enthusiasm for it, especially when she’s like, “I have my big box of bones,” and she’ll hand them out. And she definitely tells us a lot of fun facts about a lot of things that are not something that you would expect to learn. I also think that something that she really emphasizes a lot in both the gross human anatomy and her physiology class that I took are misconceptions, or just things that people think that they know about human anatomy and physiology that are actually just simply wrong. And she’ll explain, “If you heard that, that’s not completely true, it was actually this.”

Yarin: So, it sounds like this is going to be probably one of the most comprehensive biology, health science classes.

Marcus: We learn everything. Everything. It’s definitely a lot of material, but it’s totally worth it.

Yarin: Do you see anything that resonates with neurology and neuroscience?

Yarin: Has anything outside of neurology and nerves study sort of made you go, I like this, I’m enjoying learning about this.

Marcus: I think that I enjoy learning about kind of everything, you know, it’s just a lot that we learned about that, I found really interesting. I’ve always been really interested in the brain, but I became more interested in nerves, and kind of like your spinal cord. I remember, this was a random fact that she talked about that had nothing to do with the unit that we were about to start, and she talked about shingles really quickly. And it was kind of an unrelated note, but that’s something else that stuck with me about how, you know, the reason it’s called shingles is because I guess the rash appears, and [it’s] kind of like a very straight line that looks like a shingle, like a roofing shingle. The reason that you see it in the shingle in that line is because that’s where the nerves innervate. So, if you have, you know, just a line of that rash on your stomach, you can see the line of the nerve, which is crazy to me, like, I had no clue, I thought the brain was, you know, the end-all-be-all but, no. The nerves are also really cool.

Yarin: So, tell me a little bit about these cadaver labs and sort of walk me through a lab you had recently.

Marcus: So, I had an 8 am lab. And so, you know, they’re bright and early, that’s definitely something that people might have to get used to. You walk in, and usually, you know, you put your stuff down in the corner. And usually, we would start with osteology. And another thing that I thought was really interesting about the way that the class is structured is we didn’t have a professor there with us. It was two LAs, they were both seniors, they’re very nice. First, we would sit down, there’s regular, science table desks, that you would imagine. And sometimes there are bones lying out—clean ones, we don’t have to wear gloves, they’re just [like] if you would imagine a dog chew toy. And they would go over osteology, where we would have diagrams, either on like a computer, or like I brought my iPad, and you’d have to label them and they’ll be part of our post-lab assignment. They would go over the different parts of the bone, like, “This is, like, the inferior part of it. This bump means this,” and then, usually, we would move into the cadaver part of it. And these labs are very small, I think there were seven people, including myself. So, it’s definitely not a big lab, which makes sense.

And there are two cadavers: one is in the supine position, which is lying on your back. And one is in the prone position, which is lying on your stomach facedown, so that you don’t have to move them back and forth to see the structures on both sides. So, we have some sheets that are laminated, of course, we have gloves on and some, like, pointers that are metal and you can kind of put your pointer underneath this nerve and be like, “Look at this nerve, look at this tendon, this is that,” and we look at both the prone and supine to be able to see both the front and back view of the arm or the leg. We kind of just went right into it. And so the LAs, they would show us something, they’d be like, “Okay, look at this, this is like this muscle on your back.” You can pull back muscles because there are obviously more like superficial and deep muscles. And then they would be wearing gloves where we can look and touch and you can just gently pull back the muscle. I think everyone took a pause to be like, “Okay, like we can touch this,” but I think that it felt very lab science.

Yarin: Sounds like it’s safe to say that you’re not the only student in the class who’s watched surgery videos on YouTube.

Marcus: Yeah, I don’t think so. I think that we definitely stood farther back from the cadavers in the first lab. But then, by the time we were at our fifth lab, we were kind of like, “Oh, can I reach over here? Like, what does this tendon connect to? Like, oh, is this a nerve? Or what is this?” Any lab, I think, is something that you get used to, but it’s definitely different. Like, I don’t think I can compare it to running PCRs or doing anything else in the bio lab. It’s very unique.

Yarin: So, if you were to recommend this class, first of all, who would you recommend this class to? And, second of all, how would you describe it to them?

Marcus: I would recommend this class, definitely to anybody who wants to go into healthcare, whether that be nursing or premed, like myself, PA, maybe veterinary, you know, I know, it’s not an animal, but, you know, muscles are muscles. And the way that things connect, are very similar between mammals. I would also say, I was thinking about this, I feel like biomedical engineering students might like this, because I don’t know much about biomedical engineering. My mom’s one, so I probably should know, but we learned so much about muscles and the way that we move, and I think that if somebody is aspiring to help create something that has to do with mobility, that this is such a perfect way to learn, you know? How do we move? How are we able to rotate our arms? Or how are we able to stand on [our] tippy-toes, everything like that? You know, you can learn about that in class, but until you actually see it, and you can think, “If I pull on this bone, how is it moving? And what muscles?” That’s something that I think, you know, if they get a chance—I know they’re very busy—but if biomedical engineering students get a chance, I think they should absolutely take this class.

I think the way I would describe it is, I feel like it’s honestly two different classes that connect very well. Because in one aspect, you have the cadaver lab and so you’re there and you’re like, “Okay, we’re gonna learn about bones and, you know, where is this really in the body?” And you have lecture, where you’ll do case studies, like, “What happens if this goes wrong?” That’s the biggest thing. Like, what happens if this tendon tears? That’s what Dr. Co said—the best way to learn things is to think, “Okay, if this goes wrong, what are the downstream effects?” And that’s so true. That’s how I studied for every exam. So, I think that you get the more hypothetical in lecture, which is kind of like diagnosing, which I think that if somebody’s prehealth, that’s another really fun thing to do, and then you get more of the physical in lab. So, I think they’re two separate classes that connect very well, is what I would say.

Yarin: Suzie, thank you so much for sitting down with us for our inaugural episode. It was a pleasure to talk to you. And we wish you all the best with your upcoming labs, however many there may be.

Marcus: [laughs] Probably a lot.

Yarin: So, thanks for tuning in to Today I Learned, a BU Today podcast. Do you have a favorite class you think we should know about? Tell us all about it by filling out the form linked in our description. Today I Learned is produced and engineered by Andrew Hallock and edited and hosted by me, Sophie Yarin. We’ll see you next time.

## Car Price Prediction – Machine Learning Vs Deep Learning

This article was published as a part of the Data Science Blogathon

1. Objective

In this article, we will predict the prices of used cars. We will build several Machine Learning models and Deep Learning models with different architectures, and at the end we will see how the machine learning models perform in comparison to the deep learning models.

2. Data Used

Here we have used the data from an online hiring competition. Use the link below to access the data and use it for your own analysis.

3. Data Inspection

In this section, we will explore the data. First, let’s see what columns we have in the data and their data types, along with information on missing values.

We can observe that the data has 19,237 rows and 18 columns.

There are 5 numeric columns and 13 categorical columns. At first glance, there appear to be no missing values in the data.

The ‘Price’ column is the target (dependent) feature for this project.

Let’s see the distribution of the data.

4. Data Preparation

Here we will clean the data and prepare it for training the model.

‘ID’ column

We drop the ‘ID’ column since it carries no information that is useful for predicting the car price.

```python
df.drop('ID', axis=1, inplace=True)
```

‘Levy’ column

After analyzing the ‘Levy’ column, we found that it does contain missing values, but they were encoded as ‘-’ in the data, which is why we did not catch them earlier.

Here we will replace ‘-’ in the ‘Levy’ column with 0, assuming no levy was charged. We could also impute it with the mean or the median; that is a choice you have to make.
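To make that choice concrete, here is a minimal sketch on a hypothetical toy `levy` Series (not the real data) where missing values are encoded as ‘-’, as in the article. It contrasts filling with 0, the article’s choice, against filling with the median of the observed values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy 'Levy' values; '-' marks a missing levy, as in the article's data
levy = pd.Series(['1399', '-', '862', '-', '446'])

# Turn the '-' placeholder into NaN so pandas treats it as missing, then parse numbers
levy_num = pd.to_numeric(levy.replace('-', np.nan))

# Option 1 (the article's choice): assume no levy was charged
levy_zero = levy_num.fillna(0)

# Option 2: fill with the median of the observed levies
levy_median = levy_num.fillna(levy_num.median())

print(levy_zero.tolist())    # [1399.0, 0.0, 862.0, 0.0, 446.0]
print(levy_median.tolist())  # [1399.0, 862.0, 862.0, 862.0, 446.0]
```

Filling with 0 encodes a domain assumption (no levy), while the median keeps the column’s typical level but invents a levy that may never have been charged.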

```python
df['Levy'] = df['Levy'].replace('-', np.nan)
df['Levy'] = df['Levy'].astype(float)
# Fill missing levies with 0 (assume no levy was charged)
df['Levy'] = df['Levy'].fillna(0)
df['Levy'] = round(df['Levy'], 2)
```

‘Mileage’ column

The ‘Mileage’ column shows how many kilometres the car has been driven. Each reading is followed by ‘km’, which we will remove.

```python
# Since mileage is in km only, remove 'km' from it and make it numerical
df['Mileage'] = df['Mileage'].apply(lambda x: x.split(' ')[0])
df['Mileage'] = df['Mileage'].astype('int')
```

‘Engine volume’ column

Along with the engine volume, the ‘Engine volume’ column also records whether the engine is turbocharged (‘Turbo’). We will create a new ‘Turbo’ column that flags this and keep only the numeric volume.
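Both parsing steps (stripping ‘km’ from ‘Mileage’ and splitting ‘Turbo’ out of ‘Engine volume’) can also be written with pandas’ vectorized `.str` methods instead of `apply` with a lambda. This is a minimal sketch on hypothetical toy rows, assuming the raw values look like `'200000 km'` and `'2.0 Turbo'` as described in the text.

```python
import pandas as pd

# Hypothetical toy rows in the raw formats described in the article
raw = pd.DataFrame({
    'Mileage': ['200000 km', '0 km', '74000 km'],
    'Engine volume': ['2.0 Turbo', '1.6', '3.5'],
})

# Keep only the number before the space and convert it to an integer
raw['Mileage'] = raw['Mileage'].str.split(' ').str[0].astype(int)

# Flag turbo engines (1/0), then drop the word and parse the remaining volume
raw['Turbo'] = raw['Engine volume'].str.contains('Turbo').astype(int)
raw['Engine volume'] = (
    raw['Engine volume'].str.replace('Turbo', '', regex=False).str.strip().astype(float)
)

print(raw)
```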

```python
df['Turbo'] = df['Engine volume'].apply(lambda x: 1 if 'Turbo' in str(x) else 0)
df['Engine volume'] = df['Engine volume'].apply(lambda x: str(x).replace('Turbo', ''))
df['Engine volume'] = df['Engine volume'].astype(float)
```

‘Doors’ column

```python
df['Doors'].unique()
```

The ‘Doors’ column represents the number of doors in the car, but its values are not clean. In the complete code below, the column is converted to strings so that it is encoded as a categorical feature.

Handling outliers

We will examine outliers across the numerical features.

```python
cols = ['Levy', 'Engine volume', 'Mileage', 'Cylinders', 'Airbags']
# One boxplot per numerical feature
for col in cols:
    plt.figure()
    sns.boxplot(x=df[col])
```

As we can see, there are outliers in the ‘Levy’, ‘Engine volume’, ‘Mileage’ and ‘Cylinders’ columns. We will handle them with the Interquartile Range (IQR) method, capping values that fall outside the IQR limits.

```python
def find_outliers_limit(df, col):
    print(col)
    print('-' * 50)
    # compute the interquartile range
    q25, q75 = np.percentile(df[col], 25), np.percentile(df[col], 75)
    iqr = q75 - q25
    print('Percentiles: 25th=%.3f, 75th=%.3f, IQR=%.3f' % (q25, q75, iqr))
    # calculate the outlier cutoff
    cut_off = iqr * 1.5
    lower, upper = q25 - cut_off, q75 + cut_off
    print('Lower:', lower, ' Upper:', upper)
    return lower, upper

def remove_outlier(df, col, upper, lower):
    # identify outliers
    outliers = [x for x in df[col] if x < lower or x > upper]
    print('Identified outliers: %d' % len(outliers))
    # count the observations that fall inside the limits
    outliers_removed = [x for x in df[col] if lower <= x <= upper]
    print('Non-outlier observations: %d' % len(outliers_removed))
    # cap values outside the limits so the column keeps its original length
    final = df[col].clip(lower=lower, upper=upper)
    return final

outlier_cols = ['Levy', 'Engine volume', 'Mileage', 'Cylinders']
for col in outlier_cols:
    lower, upper = find_outliers_limit(df, col)
    df[col] = remove_outlier(df, col, upper, lower)
```

Let’s examine the features after handling the outliers.

```python
plt.figure(figsize=(20, 10))
df[outlier_cols].boxplot()
```

We can observe that there are no outliers in the features now.
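The IQR rule flags values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. Because the article assigns the result back to the same column, the outliers are effectively capped at those limits rather than dropped. Here is a minimal, self-contained sketch of that capping on a hypothetical toy column; the function name `cap_outliers_iqr` is illustrative and not from the article.

```python
import numpy as np
import pandas as pd

def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest limit."""
    q25, q75 = np.percentile(s, 25), np.percentile(s, 75)
    iqr = q75 - q25
    lower, upper = q25 - k * iqr, q75 + k * iqr
    return s.clip(lower=lower, upper=upper)

# Hypothetical engine volumes with one extreme value
s = pd.Series([1.0, 2.0, 2.0, 3.0, 20.0])
# Q1 = 2.0, Q3 = 3.0, IQR = 1.0, so the limits are 0.5 and 4.5
print(cap_outliers_iqr(s).tolist())  # [1.0, 2.0, 2.0, 3.0, 4.5]
```

Capping keeps every row, which matters here because the article later splits and trains on the full DataFrame; dropping rows instead would require filtering the whole frame, not just one column.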

Creating additional features

We see that ‘Mileage’ and ‘Engine volume’ are continuous variables. While performing regression, I have observed that binning such variables can help increase model performance, so I am creating ‘bin’ features for these columns.
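With an integer number of bins, `pd.cut` splits the observed range into equal-width intervals, and `labels` replaces each interval with an integer code, which is what the article’s binning step relies on. A small sketch on hypothetical mileage values:

```python
import pandas as pd

# Hypothetical mileage readings in km
mileage = pd.Series([0, 10_000, 50_000, 99_000, 100_000])

labels = [0, 1, 2, 3, 4]
# Five equal-width bins spanning min..max; each value is replaced by its bin's label
mileage_bin = pd.cut(mileage, bins=len(labels), labels=labels).astype(float)
print(mileage_bin.tolist())  # [0.0, 0.0, 2.0, 4.0, 4.0]
```

Because the bins are equal-width, a few very large values would stretch the range and push most cars into the first bin, which is one reason to handle outliers before binning.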

```python
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
df['Mileage_bin'] = pd.cut(df['Mileage'], len(labels), labels=labels)
df['Mileage_bin'] = df['Mileage_bin'].astype(float)
labels = [0, 1, 2, 3, 4]
df['EV_bin'] = pd.cut(df['Engine volume'], len(labels), labels=labels)
df['EV_bin'] = df['EV_bin'].astype(float)
```

Handling categorical features

I have used OrdinalEncoder to handle the categorical columns. OrdinalEncoder works like LabelEncoder, but OrdinalEncoder can be applied to multiple features at once, while LabelEncoder is applied to one feature at a time.
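The difference between the two encoders can be seen on a toy frame. This is a minimal sketch with hypothetical column values; it shows OrdinalEncoder encoding two columns in one call, while LabelEncoder (which scikit-learn intends for 1-D target labels) handles a single column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical categorical columns
cat_df = pd.DataFrame({
    'Fuel type': ['Petrol', 'Diesel', 'Hybrid', 'Petrol'],
    'Gear box type': ['Automatic', 'Manual', 'Automatic', 'Tiptronic'],
})

# OrdinalEncoder: one fit over several columns; categories are sorted per column
oe = OrdinalEncoder()
encoded = oe.fit_transform(cat_df)
print(encoded)          # float codes, one column per input column
print(oe.categories_)   # learned category order for each column

# LabelEncoder: works on a single 1-D array, so it is applied column by column
le = LabelEncoder()
fuel_codes = le.fit_transform(cat_df['Fuel type'])
print(fuel_codes)
```

Note that both encoders assign codes in sorted order, so the resulting integers imply an ordering (e.g. Diesel < Hybrid < Petrol) that is arbitrary; tree-based models usually cope with this better than linear models.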

```python
num_df = df.select_dtypes(include=np.number)
cat_df = df.select_dtypes(include=object)
encoding = OrdinalEncoder()
cat_cols = cat_df.columns.tolist()
encoding.fit(cat_df[cat_cols])
cat_oe = encoding.transform(cat_df[cat_cols])
cat_oe = pd.DataFrame(cat_oe, columns=cat_cols)
cat_df.reset_index(inplace=True, drop=True)
cat_oe.head()
num_df.reset_index(inplace=True, drop=True)
cat_oe.reset_index(inplace=True, drop=True)
final_all_df = pd.concat([num_df, cat_oe], axis=1)
```

Checking correlation

```python
final_all_df['price_log'] = np.log(final_all_df['Price'])
plt.figure(figsize=(20, 10))
sns.heatmap(round(final_all_df.corr(), 2), annot=True);
```

We can observe that the features are not strongly correlated. One thing we can notice, however, is that after log-transforming the ‘Price’ column, its correlation with a few features increased, which is a good thing. We will therefore use the log-transformed ‘Price’ to train the models.
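Why can a log transform raise correlations? Pearson correlation measures linear association, and prices often grow multiplicatively with a feature. This hypothetical simulation (synthetic data, not the car dataset) generates a right-skewed, price-like target and compares its correlation with a feature before and after taking the log.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature and a price-like target that grows exponentially with it
x = np.linspace(0, 5, 200)
price = np.exp(x + rng.normal(0, 0.1, size=x.size))

# Pearson correlation on the raw scale vs. the log scale
corr_raw = np.corrcoef(x, price)[0, 1]
corr_log = np.corrcoef(x, np.log(price))[0, 1]
print(round(corr_raw, 3), round(corr_log, 3))

# A model trained on log(price) predicts on the log scale;
# np.exp maps its predictions back to prices, as the article's evaluation does
recovered = np.exp(np.log(price))
```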

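5. Data Splitting and Scaling

We have done an 80-20 split on the data: 80% of the data is used for training and 20% for testing.

We also scale the data, since the features are on very different scales (for example, mileage in kilometres versus engine volume in litres). Different scales can hurt scale-sensitive models such as neural networks trained with gradient descent; tree-based models like RandomForest and XGBoost are largely unaffected by scaling.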
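The key detail in the article’s scaling step is that `StandardScaler` is fitted on the training split only and then reused on the test split, so no test-set statistics leak into training. A minimal sketch with hypothetical synthetic features on very different scales:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(25)
# Hypothetical features on very different scales: mileage (km) and engine volume (litres)
X = np.column_stack([
    rng.uniform(0, 300_000, size=100),
    rng.uniform(1.0, 4.0, size=100),
])
y = rng.uniform(1_000, 50_000, size=100)

# 80-20 split, as in the article
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=25)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training split only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on the test split

# Training columns now have mean ~0 and standard deviation ~1
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

The test columns will be close to, but not exactly, zero mean and unit variance, which is expected.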

```python
cols_drop = ['Price', 'price_log', 'Cylinders']
X = final_all_df.drop(cols_drop, axis=1)
y = final_all_df['Price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=25)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on the training split only
X_test_scaled = scaler.transform(X_test)
```

6. Model Building

We built LinearRegression, XGBoost, and RandomForest as machine learning models, plus two deep learning models: one with a small network and another with a larger network.

We built base models of LinearRegression, XGBoost, and RandomForest with default hyperparameters, so there is not much to show about them. For the deep learning models, we can look at the model summaries and at how their training and validation losses converge.
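The article trains each model on `np.log(y)` and applies `np.exp` to the predictions by hand. As an alternative (not the article’s own helper), scikit-learn’s `TransformedTargetRegressor` bundles that round trip into one estimator. A minimal sketch on hypothetical synthetic data:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Hypothetical features and a positive, right-skewed, price-like target
X = rng.uniform(0, 1, size=(200, 3))
y = np.exp(2 + 3 * X[:, 0] + rng.normal(0, 0.1, size=200))

# The regressor is fitted on log(y); predict() returns exp(prediction) automatically
model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(n_estimators=50, random_state=0),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)
pred = model.predict(X[:5])
print(pred.round(1), y[:5].round(1))
```

Wrapping the transform this way makes it harder to forget the `np.exp` step when evaluating, which is easy to miss when the log and exp are applied in separate places.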

Deep Learning Model – Small Network model summary

```python
model_dl_small.summary()
```

Deep Learning Model – Small Network train & validation loss

```python
# Plot the training loss and validation loss per epoch
history_df = pd.DataFrame(model_dl_small.history.history)
plt.figure(figsize=(20, 10))
plt.plot(history_df['loss'], label='loss')
plt.plot(history_df['val_loss'], label='val_loss')
plt.xticks(np.arange(1, epochs + 1, 2))
plt.yticks(np.arange(1, max(history_df['loss']), 0.5))
plt.legend()
plt.grid()
```

Deep Learning Model – Large Network model summary

```python
model_dl_large.summary()
```

Deep Learning Model – Large Network train & validation loss

```python
# Plot the training loss and validation loss per epoch
history_df = pd.DataFrame(model_dl_large.history.history)
plt.figure(figsize=(20, 10))
plt.plot(history_df['loss'], label='loss')
plt.plot(history_df['val_loss'], label='val_loss')
plt.xticks(np.arange(1, epochs + 1, 2))
plt.yticks(np.arange(1, max(history_df['loss']), 0.5))
plt.legend()
plt.grid()
```

6.1 Model Performance

We evaluated the models using Mean Squared Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Mean Squared Log Error (MSLE) as performance metrics, and below are the results we got.

We can observe that the deep learning models did not perform as well as the machine learning models. Among the machine learning models, RandomForest performed best.
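The four metrics can be written directly from their definitions and checked against scikit-learn. This sketch uses hypothetical prices and predictions. Note that scikit-learn’s `mean_absolute_percentage_error` returns a fraction (0.1 means 10%), not a value in the 0 to 100 range.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, mean_squared_log_error)

# Hypothetical true prices and predictions
y_true = np.array([10_000.0, 20_000.0, 5_000.0])
y_pred = np.array([12_000.0, 18_000.0, 5_000.0])

mse = np.mean((y_true - y_pred) ** 2)                       # penalizes large errors quadratically
mae = np.mean(np.abs(y_true - y_pred))                      # average error in price units
mape = np.mean(np.abs(y_true - y_pred) / np.abs(y_true))    # relative error as a fraction
msle = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)  # squared error on the log scale

print(mse, mean_squared_error(y_true, y_pred))
print(mae, mean_absolute_error(y_true, y_pred))
print(mape, mean_absolute_percentage_error(y_true, y_pred))
print(msle, mean_squared_log_error(y_true, y_pred))
```

MSE and MAE are in (squared) price units and are dominated by expensive cars, while MAPE and MSLE measure relative error, which is why a model can rank differently depending on the metric.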

Let’s visualize the results from Random Forest.

7. Result Visualization

```python
y_pred = np.exp(model_rf.predict(X_test_scaled))
number_of_observations = 20
# .iloc selects by position; y_test keeps the shuffled row labels from the split
x_ax = range(len(y_test.iloc[:number_of_observations]))
plt.figure(figsize=(20, 10))
plt.plot(x_ax, y_test.iloc[:number_of_observations], label="True")
plt.plot(x_ax, y_pred[:number_of_observations], label="Predicted")
plt.title("Car Price - True vs Predicted data")
plt.xlabel('Observation Number')
plt.ylabel('Price')
plt.xticks(np.arange(number_of_observations))
plt.legend()
plt.grid()
plt.show()
```

We can observe in the graph that the model performs well, which agrees with the performance metrics.

8. Code

The code was written in a Jupyter notebook. Below is the complete code for the project.

```python
# Loading libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import (mean_squared_log_error, mean_squared_error,
                             mean_absolute_error, mean_absolute_percentage_error)
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor
from keras.models import Sequential
from keras.layers import Dense
from prettytable import PrettyTable

df = pd.read_csv('../input/Participant_Data_TheMathCompany_.DSHH/train.csv')
df.head()

# Data inspection
df.shape
df.describe().transpose()
df.info()
sns.pairplot(df, diag_kind='kde')

# Data preprocessing
df.drop('ID', axis=1, inplace=True)

# 'Levy': '-' marks a missing value; assume no levy was charged and fill with 0
df['Levy'] = df['Levy'].replace('-', np.nan).astype(float)
df['Levy'] = round(df['Levy'].fillna(0), 2)

# Check which units appear in 'Mileage'
milage_formats = set()
def get_milage_format(x):
    milage_formats.add(x.split(' ')[1])
df['Mileage'].apply(get_milage_format);
milage_formats

# Since mileage is in km only, remove 'km' from it and make it numerical
df['Mileage'] = df['Mileage'].apply(lambda x: x.split(' ')[0]).astype('int')

# 'Engine volume': split out the 'Turbo' flag, then parse the volume as a float
df['Engine volume'].unique()
df['Turbo'] = df['Engine volume'].apply(lambda x: 1 if 'Turbo' in str(x) else 0)
df['Engine volume'] = df['Engine volume'].apply(lambda x: str(x).replace('Turbo', ''))
df['Engine volume'] = df['Engine volume'].astype(float)

# Boxplots to look for outliers
cols = ['Levy', 'Engine volume', 'Mileage', 'Cylinders', 'Airbags']
for col in cols:
    plt.figure()
    sns.boxplot(x=df[col])

def find_outliers_limit(df, col):
    print(col)
    print('-' * 50)
    q25, q75 = np.percentile(df[col], 25), np.percentile(df[col], 75)
    iqr = q75 - q25
    print('Percentiles: 25th=%.3f, 75th=%.3f, IQR=%.3f' % (q25, q75, iqr))
    # calculate the outlier cutoff
    cut_off = iqr * 1.5
    lower, upper = q25 - cut_off, q75 + cut_off
    print('Lower:', lower, ' Upper:', upper)
    return lower, upper

def remove_outlier(df, col, upper, lower):
    # identify outliers
    outliers = [x for x in df[col] if x < lower or x > upper]
    print('Identified outliers: %d' % len(outliers))
    outliers_removed = [x for x in df[col] if lower <= x <= upper]
    print('Non-outlier observations: %d' % len(outliers_removed))
    # cap values outside the limits so the column keeps its original length
    final = df[col].clip(lower=lower, upper=upper)
    return final

outlier_cols = ['Levy', 'Engine volume', 'Mileage', 'Cylinders']
for col in outlier_cols:
    lower, upper = find_outliers_limit(df, col)
    df[col] = remove_outlier(df, col, upper, lower)

# Boxplot to check the outliers are gone
plt.figure(figsize=(20, 10))
df[outlier_cols].boxplot()

# 'Doors' is treated as a categorical feature
df['Doors'].unique()
df['Doors'] = df['Doors'].astype(str)

# Creating additional features
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
df['Mileage_bin'] = pd.cut(df['Mileage'], len(labels), labels=labels).astype(float)
labels = [0, 1, 2, 3, 4]
df['EV_bin'] = pd.cut(df['Engine volume'], len(labels), labels=labels).astype(float)

# Handling categorical features
num_df = df.select_dtypes(include=np.number)
cat_df = df.select_dtypes(include=object)
encoding = OrdinalEncoder()
cat_cols = cat_df.columns.tolist()
encoding.fit(cat_df[cat_cols])
cat_oe = pd.DataFrame(encoding.transform(cat_df[cat_cols]), columns=cat_cols)
num_df.reset_index(inplace=True, drop=True)
cat_oe.reset_index(inplace=True, drop=True)
final_all_df = pd.concat([num_df, cat_oe], axis=1)

# Checking correlation
final_all_df['price_log'] = np.log(final_all_df['Price'])
plt.figure(figsize=(20, 10))
sns.heatmap(round(final_all_df.corr(), 2), annot=True);

cols_drop = ['Price', 'price_log', 'Cylinders']
final_all_df.columns
X = final_all_df.drop(cols_drop, axis=1)
y = final_all_df['Price']

# Data splitting and scaling
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=25)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model building: every model is trained on log(price)
def train_ml_model(x, y, model_type):
    if model_type == 'lr':
        model = LinearRegression()
    elif model_type == 'xgb':
        model = XGBRegressor()
    elif model_type == 'rf':
        model = RandomForestRegressor()
    model.fit(x, np.log(y))
    return model

def model_evaluate(model, x, y):
    # predictions are on the log scale, so map them back to prices
    predictions = np.exp(np.ravel(model.predict(x)))
    mse = round(mean_squared_error(y, predictions), 2)
    mae = round(mean_absolute_error(y, predictions), 2)
    mape = round(mean_absolute_percentage_error(y, predictions), 2)
    msle = round(mean_squared_log_error(y, predictions), 2)
    return [mse, mae, mape, msle]

def plot_loss(model, epochs):
    # plot the training loss and validation loss per epoch
    history_df = pd.DataFrame(model.history.history)
    plt.figure(figsize=(20, 10))
    plt.plot(history_df['loss'], label='loss')
    plt.plot(history_df['val_loss'], label='val_loss')
    plt.xticks(np.arange(1, epochs + 1, 2))
    plt.yticks(np.arange(1, max(history_df['loss']), 0.5))
    plt.legend()
    plt.grid()

model_lr = train_ml_model(X_train_scaled, y_train, 'lr')
model_xgb = train_ml_model(X_train_scaled, y_train, 'xgb')
model_rf = train_ml_model(X_train_scaled, y_train, 'rf')

# Deep learning: small network
epochs = 20
batch_size = 10
model_dl_small = Sequential()
model_dl_small.add(Dense(16, input_dim=X_train_scaled.shape[1], activation='relu'))
model_dl_small.add(Dense(8, activation='relu'))
model_dl_small.add(Dense(4, activation='relu'))
model_dl_small.add(Dense(1, activation='linear'))
# Assumption: the compile step is not shown in the listing; Adam with MSE loss is used here
model_dl_small.compile(optimizer='adam', loss='mean_squared_error')
model_dl_small.summary()
model_dl_small.fit(X_train_scaled, np.log(y_train), verbose=0,
                   validation_data=(X_test_scaled, np.log(y_test)),
                   epochs=epochs, batch_size=batch_size)
plot_loss(model_dl_small, epochs)

# Deep learning: large network
model_dl_large = Sequential()
model_dl_large.add(Dense(64, input_dim=X_train_scaled.shape[1], activation='relu'))
model_dl_large.add(Dense(32, activation='relu'))
model_dl_large.add(Dense(16, activation='relu'))
model_dl_large.add(Dense(1, activation='linear'))
# Assumption: same compile settings as the small network
model_dl_large.compile(optimizer='adam', loss='mean_squared_error')
model_dl_large.summary()
model_dl_large.fit(X_train_scaled, np.log(y_train), verbose=0,
                   validation_data=(X_test_scaled, np.log(y_test)),
                   epochs=epochs, batch_size=batch_size)
plot_loss(model_dl_large, epochs)

# Compare all models on the test split
summary = PrettyTable(['Model', 'MSE', 'MAE', 'MAPE', 'MSLE'])
summary.add_row(['LR'] + model_evaluate(model_lr, X_test_scaled, y_test))
summary.add_row(['XGB'] + model_evaluate(model_xgb, X_test_scaled, y_test))
summary.add_row(['RF'] + model_evaluate(model_rf, X_test_scaled, y_test))
summary.add_row(['DL_SMALL'] + model_evaluate(model_dl_small, X_test_scaled, y_test))
summary.add_row(['DL_LARGE'] + model_evaluate(model_dl_large, X_test_scaled, y_test))
print(summary)

# True vs predicted prices for the first 20 test cars
y_pred = np.exp(model_rf.predict(X_test_scaled))
number_of_observations = 20
x_ax = range(len(y_test.iloc[:number_of_observations]))
plt.figure(figsize=(20, 10))
plt.plot(x_ax, y_test.iloc[:number_of_observations], label="True")
plt.plot(x_ax, y_pred[:number_of_observations], label="Predicted")
plt.title("Car Price - True vs Predicted data")
plt.xlabel('Observation Number')
plt.ylabel('Price')
plt.xticks(np.arange(number_of_observations))
plt.legend()
plt.grid()
plt.show()
```

9. Conclusion

In this article, we tried to predict car prices using the various car attributes provided in the data. We built machine learning and deep learning models to predict car prices and saw that the machine learning models performed better on this data than the deep learning models.

10. About the Author

Hi, I am Kajal Kumari. I have completed my Master’s in Computer Science & Engineering from IIT (ISM) Dhanbad. I am currently working as a Machine Learning Engineer in Hyderabad. You can also check out a few other blogs that I have written here.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


## The Weirdest Things We Learned This Week: Counting Vampires And Nudist Founding Fathers

What’s the weirdest thing you learned this week? Well, whatever it is, we promise you’ll have an even weirder answer if you listen to PopSci’s hit podcast. The Weirdest Thing I Learned This Week hits iTunes, Anchor, and everywhere else you listen to podcasts every Wednesday morning. It’s your new favorite source for the strangest science-adjacent facts, figures, and Wikipedia spirals the editors of Popular Science can muster. If you like the stories in this post, we guarantee you’ll love the show.

FACT: Benjamin Franklin liked to sit around naked

By Rachel Feltman

Air bathing is exactly what it sounds like: It’s like bathing, but with air. Of course, one could also refer to air bathing as “sitting around in the nude,” and they definitely wouldn’t be wrong. One famous proponent of this practice was none other than Benjamin Franklin. Here’s a paper on the subject written in the early 1900s, featuring excerpts from the founding father’s pro-nude-naptime letters.

I hoped to uncover the true health benefits of sitting around sans clothing—there’s no good research on the subject, by the way—but as I discuss in this week’s episode of the show, I came across a surprising twist. And yes: just like the last surprising twist I found, it involves Nazis. They’re everywhere! Go figure. You can read more about Joe Knowles, whose time spent living naked and alone (allegedly) in the wilderness showcases the creepy rhetoric of the American eugenics movement, in this Boston Magazine retrospective. I also mention his hilariously laudatory book on the so-called experiment, which you can peruse for yourself for tidbits about how exposure to the elements would help American men maintain the strength of the race. And if that connection to eugenics and Nazism is still a little too subtle for you, read up on how the Nazi party co-opted the tenets of the nudist movement (while disavowing anything as liberal as actually running around nude) here.

To end on the lightest possible note (sorry!) here is a photo I found of Eugen Sandow, widely considered the father of modern bodybuilding. Knowles, pictured above, claims his measurements were comparable to the famous athlete’s stats after his stint in the woods. This will make you laugh.

FACT: Vampires are compulsive counters (and other bizarre vampire facts)

By Jessica Boddy

I recently caught up with a former history professor of mine, Theodora Kelly Trimble from the University of Pittsburgh, and she reminded me of a rather grim figure from the 16th and 17th centuries: Elizabeth Bathory. Bathory was a Hungarian countess who was quite possibly the most deadly serial killer of all time—she’s thought to have tortured and killed over 600 young women. Her murders were rather gory, and some saw her bathing in the blood of her victims, an act thought to preserve her youth and beauty. This led people to believe that she was a vampire. (She was even born in Transylvania!) Bathory was tried and charged for her crimes, and as punishment, the Hungarian people built a wall to barricade her into her castle’s tower. There was a little slot for air and food to pass through, but other than that, she could not leave.

Researching this got me thinking about just how weird the culture surrounding vampire lore is, so I decided to check out my notes from Kelly’s class (which was about vampire myths throughout history and across cultures). Among my favorite facts is a classic vampire burial technique: sprinkle poppy seeds on a supposed vampire’s grave, and he or she will never come to hunt you. This is because vampires are, supposedly, compulsive counters, and would spend all night counting the poppy seeds you left. I also brushed up on porphyria, a disease that causes sun sensitivity and receding gums, making its sufferers look very pale and have very large teeth. Though not overly common today, porphyria was thought to occur more in individuals who repeatedly married within their families—like, for instance, in small villages in the valleys of Transylvania.

FACT: Cutting up hot peppers can give you a condition called jalapeño hands

By Claire Maldarelli

A while back, as I was catching up with my sister over the phone, she mentioned to me something odd: Shortly after cutting up hot peppers for a recipe, her thumb started burning and wouldn’t stop. After googling her symptoms, she found that, in fact, this is a not-so-uncommon condition that in the medical literature goes by many names, including but not limited to: hot pepper hands, jalapeño hands, jalapeño thumb, and Hunan hand syndrome.

As a health editor and self-proclaimed hypochondriac, I was surprised and interested (a disease I had never heard of!). Turns out, the culprit is capsaicin, a chemical compound found in the fruit of plants within the capsicum family, including red chili peppers, jalapeños, and habaneros. This colorless and odorless compound binds to pain receptors, triggering the sensation of intense heat or burning. But I won’t ruin all the surprises that come with jalapeño hands. Listen to this week’s episode for more about this bizarre reaction to hot peppers, how to treat it, some crazy case studies, and of course, some hot pepper cutting best practices to prevent the burn.

If you like The Weirdest Thing I Learned This Week, please subscribe, rate, and review us on iTunes (yes, even if you don’t listen to us on iTunes—it really helps other weirdos find the show). You can also join in the weirdness in our Facebook group and bedeck yourself in weirdo merchandise from our Threadless shop.

## Top 10 Applications Of Deep Learning In Cybersecurity In 2023

Deep learning tools have a major role to play in the field of cybersecurity in 2023.

Deep learning, which is built on deep neural networks, is a family of machine learning techniques that let a network learn from data, including unlabeled data, and solve complex problems. It can be used extensively in cybersecurity to protect companies from threats like phishing, spear-phishing, drive-by attacks, password attacks, denial of service, and more. Here are the top 10 applications of deep learning in cybersecurity.

Detecting Trace of Intrusion

Deep learning techniques such as convolutional neural networks and Recurrent Neural Networks (RNNs) can be applied to build smarter intrusion detection and prevention systems (IDS/IPS) that analyze traffic more accurately, reduce the number of false alerts, and help security teams tell malicious network activity from benign activity. Notable solutions include Next-Generation Firewall (NGFW), Web Application Firewall (WAF), and User and Entity Behavior Analytics (UEBA).

Battle against Malware

Spam and Social Engineering Detection

Natural Language Processing (NLP), which today relies heavily on deep learning, can help you detect and deal with spam and other forms of social engineering. An NLP system learns normal communication and language patterns and uses statistical models to detect and block spam. You can read this post to learn how Google used TensorFlow to enhance the spam detection capabilities of Gmail.
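As a toy illustration of a statistical text model for spam filtering, here is a sketch using a bag-of-words representation and multinomial Naive Bayes from scikit-learn. This deliberately swaps in a classical model, not deep learning and not Google’s Gmail system, and the four training messages are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled messages: 1 = spam, 0 = legitimate
messages = [
    "win a free prize now",
    "claim your free money now",
    "meeting moved to monday",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

# Word counts feed a multinomial Naive Bayes classifier
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(messages, labels)

preds = clf.predict(["free prize money", "report for the monday meeting"])
print(preds)
```

A production filter would train on far more data and, as the text describes, would increasingly use neural language models in place of word counts.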

Network Traffic Analysis

Deep learning ANNs (artificial neural networks) are showing promising results in analyzing HTTPS network traffic to look for malicious activity. This is useful for dealing with many cyber threats, such as SQL injection and DoS attacks.

User Behavior Analytics

Tracking and analyzing user activities and behaviors is an important deep learning-based security practice for any organization. Detecting malicious insider behavior is much more challenging than recognizing traditional attacks against the network, because insiders with legitimate access bypass security measures and often don't raise any flags or alerts. User and Entity Behavior Analytics (UEBA) is a great tool against such attacks. After a learning period, it picks up employees' normal behavioral patterns and recognizes suspicious activities, such as accessing the system at unusual hours, that may indicate an insider attack, and then raises alerts.
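The "learning period" idea can be illustrated with a deliberately simple per-user baseline: record the hours at which each user logs in, then flag logins at hours that make up less than a chosen share of that user's history. This is a frequency-count stand-in, not deep learning and not how any particular UEBA product works; the `LoginHourBaseline` class, the 5% threshold, and the sample logins are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


class LoginHourBaseline:
    """Per-user frequency baseline of login hours (a simple stand-in for UEBA)."""

    def __init__(self, min_share: float = 0.05) -> None:
        # Logins at an hour making up less than min_share of a user's history are flagged.
        self.min_share = min_share
        self.hours = defaultdict(Counter)

    def learn(self, user: str, login_time: datetime) -> None:
        # Learning period: count how often the user logs in at each hour of the day.
        self.hours[user][login_time.hour] += 1

    def is_suspicious(self, user: str, login_time: datetime) -> bool:
        history = self.hours.get(user, Counter())
        total = sum(history.values())
        if total == 0:
            return True  # no baseline yet; treat as worth reviewing (a policy choice)
        share = history[login_time.hour] / total
        return share < self.min_share


baseline = LoginHourBaseline()
for day in range(1, 21):
    baseline.learn("alice", datetime(2023, 3, day, 9, 5))
for day in range(1, 11):
    baseline.learn("alice", datetime(2023, 3, day, 10, 30))

print(baseline.is_suspicious("alice", datetime(2023, 4, 3, 9, 40)))  # False
print(baseline.is_suspicious("alice", datetime(2023, 4, 3, 3, 15)))  # True
```

A learned model would replace the hour histogram with many behavioral signals at once, but the workflow of building a baseline first and then scoring deviations from it is the same.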

Monitoring Emails

It is vital to keep an eye on employees' official email accounts to prevent cyberattacks. For instance, phishing attacks commonly arrive as emails that ask employees for sensitive data. Cybersecurity software combined with deep learning can help prevent these attacks, and natural language processing can be used to scan emails for suspicious content.

Analyzing Mobile Endpoints

Deep learning is already going mainstream on mobile devices and also drives voice-based experiences through mobile assistants. Using deep learning, enterprises can identify and analyze threats against mobile endpoints as they try to counter the growing amount of malware on mobile devices.

Enhancing Human Analysis

Deep learning in cybersecurity can help humans detect malicious attacks, protect endpoints, analyze the network, and perform vulnerability assessments. With this support, analysts can make better-informed decisions about how to solve the problems they find.

Task Automation

A major benefit of deep learning is automating repetitive tasks so that staff can focus on more important work. Several cybersecurity tasks can be automated with machine learning, and by incorporating deep learning into them, organizations can accomplish these tasks faster and better.

WebShell

A web shell is a piece of code maliciously uploaded to a website that gives attackers the ability to modify files in the server's web root. This can allow attackers to gain access to the database. Deep learning can help by learning normal shopping-cart behavior, and the model can be trained to differentiate between normal and malicious behavior.
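The paragraph above describes learning normal application behavior. A complementary, content-based approach (a swapped-in technique, not the one described above) turns each server-side script into numeric features that a trained classifier, deep or otherwise, can consume. Below is a minimal sketch of two heuristic features: Shannon entropy, since obfuscated or encoded payloads tend to score higher, and a count of calls to functions commonly abused by PHP web shells. The watch-list and all function names are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical watch-list of PHP functions frequently abused by web shells.
SUSPICIOUS_CALLS = ("eval", "assert", "base64_decode", "system", "shell_exec", "passthru", "exec")


def shannon_entropy(text: str) -> float:
    """Bits per character; encoded or obfuscated payloads tend to score higher."""
    if not text:
        return 0.0
    counts = Counter(text)
    length = len(text)
    return -sum((c / length) * math.log2(c / length) for c in counts.values())


def suspicious_call_count(source: str) -> int:
    """Count call sites such as `eval(` for functions on the watch-list."""
    # \b stops matches inside longer identifiers (e.g. "exec" inside "shell_exec").
    pattern = r"\b(" + "|".join(SUSPICIOUS_CALLS) + r")\s*\("
    return len(re.findall(pattern, source, flags=re.IGNORECASE))


def extract_features(source: str) -> list[float]:
    # Feature vector that a trained classifier could consume.
    return [shannon_entropy(source), float(suspicious_call_count(source))]


benign = "<?php echo htmlspecialchars($_GET['q']); ?>"
shell = "<?php eval(base64_decode($_POST['c'])); ?>"
print(suspicious_call_count(benign))  # 0
print(suspicious_call_count(shell))   # 2
```

Heuristics like these are easy to evade on their own, which is one reason to let a model learn which combinations of features separate benign scripts from web shells.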

Network Risk Scoring