Derivatives in Data Science

Swapnil Bandgar
6 min read · May 27, 2021

Derivatives in calculus:

Derivative: In mathematics, the derivative is the rate of change of a function with respect to a variable. Derivatives are fundamental to solving problems in calculus and differential equations. In general, scientists observe dynamical systems to obtain the rate of change of some variable of interest, incorporate this information into a differential equation, and use integration techniques to obtain a function that can predict the behavior of the original system under diverse conditions.

The derivative expresses the instantaneous rate of change: the amount by which the function is changing at a given point. For functions that act on the real numbers, it is the slope of the tangent line at a point on the graph. The derivative is often written as

$$\frac{dy}{dx}$$

That is, the difference in y divided by the difference in x. Here, d is not a variable, so it cannot be cancelled out.

The derivative of y with respect to x is defined as the change in y over the change in x.

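In mathematical terms, with y = f(x) and h the distance between the two x points,

$$\frac{dy}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$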

That is, as the distance between the two x points gets closer to zero, the slope of the line between them approaches the slope of the tangent line.
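The idea of two x points moving closer together can be checked numerically. The sketch below is only an illustration: the function `square` and the point x = 3 are assumptions chosen for the example, not part of the original text. It computes the slope of the line between x and x + h for shrinking values of h.

```python
def difference_quotient(f, x, h):
    """Slope of the line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    # Hypothetical example function f(x) = x**2, whose tangent slope at x = 3 is 6.
    return x * x


step_sizes = (1.0, 0.1, 0.01, 0.001)
slopes = [difference_quotient(square, 3.0, h) for h in step_sizes]

# As h shrinks, the slope between the two points approaches the tangent slope.
for h, slope in zip(step_sizes, slopes):
    print(f"h = {h:<6} slope = {slope:.4f}")
```

The printed slopes move steadily toward 6 as h gets smaller, which is the behavior the limit describes.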

Derivatives of functions: -

The derivative of different types of function can be derived as follows:

1) Linear functions: The derivative of a linear function such as mx + c, which has no higher-order terms, is the constant m. When the dependent variable y takes the value of x directly (y = x), the slope of the line is 1 everywhere, regardless of position.

2) Power functions: Power functions behave according to their exponent. They follow the power rule:

$$\frac{d}{dx} x^n = n x^{n-1}$$

3) Exponential functions: An exponential has the form $a \cdot b^{f(x)}$, where a and b are constants and f(x) is a function of x. The difference between an exponential and a polynomial is that in a polynomial x is raised to some power, whereas in an exponential x is in the power.

4) Logarithmic functions: The derivative of the natural logarithm is the reciprocal of its argument, that is, the derivative of ln x is 1/x.

5) Trigonometric functions: The cosine function is the derivative of the sine function, while the derivative of cosine is negative sine.
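These rules can be sanity-checked numerically. The sketch below compares a central-difference estimate against the derivative each rule predicts at x = 2. The specific functions (3x + 5, x^4, 2^x) and the helper name `numerical_derivative` are assumptions picked for illustration.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 2.0

# Each entry pairs a function with the derivative that its rule predicts.
checks = {
    "linear 3x + 5": (lambda t: 3 * t + 5, lambda t: 3.0),
    "power x**4": (lambda t: t ** 4, lambda t: 4 * t ** 3),
    "exponential 2**x": (lambda t: 2 ** t, lambda t: math.log(2) * 2 ** t),
    "logarithm ln x": (math.log, lambda t: 1 / t),
    "sine": (math.sin, math.cos),
    "cosine": (math.cos, lambda t: -math.sin(t)),
}

results = {}
for name, (f, rule) in checks.items():
    estimate = numerical_derivative(f, x)
    results[name] = (estimate, rule(x))
    print(f"{name:<18} numeric = {estimate:.6f}  rule = {rule(x):.6f}")
```

In each row the numeric estimate and the rule should agree to several decimal places.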

Properties of Derivatives: -

The derivative of a complicated expression can be broken down into the derivatives of its smaller parts, which makes it easier to manage.

For example,
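Two standard properties that allow this kind of splitting are the sum rule and the constant-multiple rule. They are shown here as a common illustration, and the function in the second line is an assumed example:

$$\frac{d}{dx}\bigl[f(x) + g(x)\bigr] = f'(x) + g'(x), \qquad \frac{d}{dx}\bigl[c\,f(x)\bigr] = c\,f'(x)$$

$$\frac{d}{dx}\bigl(3x^2 + \sin x\bigr) = 3\,\frac{d}{dx}x^2 + \frac{d}{dx}\sin x = 6x + \cos x$$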

Derivative Types: -

Derivatives can be classified by their order, such as first-order and second-order derivatives.

First-Order Derivative: -

The first-order derivative describes the direction of the function, that is, whether the function is increasing or decreasing. It can be interpreted as an instantaneous rate of change, and it equals the slope of the tangent line.

Second-Order Derivative: -

The second-order derivative gives an idea of the shape of the graph of a function. Functions can be classified in terms of concavity: the graph is concave up where the second derivative is positive and concave down where it is negative.
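As a rough sketch of how the two orders are read together, the code below estimates both derivatives with finite differences and labels a point accordingly. The function `cubic` (x^3 - 3x) and the helper names are assumptions for illustration only.

```python
def first_derivative(f, x, h=1e-5):
    """Central-difference estimate of the first derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of the second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def describe(f, x):
    """Return (direction, shape) of f at x from the signs of its derivatives."""
    slope = first_derivative(f, x)
    curvature = second_derivative(f, x)
    direction = "increasing" if slope > 0 else "decreasing" if slope < 0 else "flat"
    shape = "concave up" if curvature > 0 else "concave down" if curvature < 0 else "no curvature"
    return direction, shape


def cubic(x):
    # Hypothetical example: f(x) = x**3 - 3x, so f'(x) = 3x**2 - 3 and f''(x) = 6x.
    return x ** 3 - 3 * x


for point in (-2.0, -0.5, 0.5, 2.0):
    print(point, describe(cubic, point))
```

The sample points avoid x = -1, 0, and 1, where one of the derivatives is exactly zero and a numerical estimate could land on either side.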

Derivative Examples: -

Example-1: Find the derivative of the function:

Solution: -

Now, calculate the derivative of f(x),

Now, split the terms of the function as:

Using the formulas,

Example-2: Find the derivative of 2 tan x + 1

Solution: -
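One way to work this out is to split the sum, pull out the constant factor, and use the standard result that the derivative of tan x is sec² x:

$$\frac{d}{dx}\bigl(2\tan x + 1\bigr) = 2\,\frac{d}{dx}\tan x + \frac{d}{dx}1 = 2\sec^2 x + 0 = 2\sec^2 x$$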

Importance of derivatives in data science:

Machine learning uses derivatives to solve optimization problems. Optimization algorithms such as gradient descent use derivatives to decide whether to increase or decrease the weights in order to maximize or minimize an objective function.

Data scientists use calculus in almost every model, and a basic but excellent example of calculus in machine learning is gradient descent.

Gradient Descent: -

A gradient measures how much the output of a function changes if you change the inputs a little bit. In a machine learning model, our goal is to reduce the cost on our input data. The cost function is used to monitor the error in the predictions of an ML model. Minimizing it means getting to the lowest error value possible, which in turn increases the accuracy of the model. We therefore improve accuracy by iterating over a training dataset while tweaking the parameters of our model.
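To make "change the inputs a little bit" concrete, here is a minimal sketch that nudges each input of a function in turn and measures how much the output moves. The bowl-shaped `cost` function and the name `numerical_gradient` are hypothetical choices for the example.

```python
def numerical_gradient(f, point, h=1e-6):
    """Estimate the partial derivative of f with respect to each input."""
    gradient = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h      # nudge input i slightly upward
        down[i] -= h    # and slightly downward
        gradient.append((f(up) - f(down)) / (2 * h))
    return gradient


def cost(params):
    # Hypothetical bowl-shaped cost with its minimum at (1, -2).
    a, b = params
    return (a - 1) ** 2 + (b + 2) ** 2


grad = numerical_gradient(cost, [3.0, 0.0])
print(grad)  # both entries are close to 4.0
```

Both entries are positive at this point, so decreasing either parameter lowers the cost.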

For example, suppose we have a dataset of students with their marks in several subjects and the stream each of them selected for graduation. Our aim is to predict a person's graduation stream from that person's marks.

In this dataset, we have data for Swapnil and Priyanker, and using this data we need to predict the graduation stream of Sarang.

Now think of the marks in each subject as positions on a slope and the stream as the target at the bottom. We have to optimize the model so that the result it predicts at the bottom is accurate.

Using Swapnil's and Priyanker's data, we run gradient descent and tune our model so that, given the marks of a student in the training data, it predicts that student's known stream (for example, Science) at the bottom of the gradient, for Swapnil and Priyanker alike. This is our trained model. If we now give it Sarang's marks, it can predict his stream.

The basic formula that we can use in this model is

y = m*x + b

where y = predicted value, m = slope, x = input, and b = y-intercept.

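This type of problem can be solved by defining a cost function that measures how good a given line is. For N data points (x_i, y_i), the cost of the line defined by m and b can be calculated with the mean squared error:

$$E(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2$$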

Lines that fit our data better will result in lower error values. If we minimize this function, we will get the best line for our data.
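A minimal sketch of such a cost function, assuming the mean squared error between the observed y values and the line's predictions. The data points are made up for the example and lie exactly on y = 2x + 1.

```python
def mean_squared_error(m, b, points):
    """Average squared vertical distance between the points and the line y = m*x + b."""
    total = 0.0
    for x, y in points:
        total += (y - (m * x + b)) ** 2
    return total / len(points)


# Hypothetical data lying exactly on the line y = 2x + 1.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

good_fit = mean_squared_error(2.0, 1.0, points)  # the line that generated the data
poor_fit = mean_squared_error(0.0, 0.0, points)  # a flat line through the origin

print(good_fit, poor_fit)
```

The line that matches the data gives an error of zero, while the flat line gives a much larger value, which is what lets the cost function rank candidate lines.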

We use differentiation to calculate the slope of this error function.

To run gradient descent on the error function, we need to compute its gradient. To calculate the gradient, we need to differentiate the error function. Since the function is defined by two parameters (m and b), we need to compute a partial derivative for each.

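For the mean squared error E over N data points (x_i, y_i), these derivatives can be calculated as below:

$$\frac{\partial E}{\partial m} = \frac{2}{N} \sum_{i=1}^{N} -x_i \bigl(y_i - (m x_i + b)\bigr)$$

$$\frac{\partial E}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} -\bigl(y_i - (m x_i + b)\bigr)$$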
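The two partial derivatives can be written out directly and compared with a finite-difference estimate as a check. This sketch assumes a mean squared error cost. The data points and the starting values m = 0.5, b = 0 are hypothetical.

```python
def mean_squared_error(m, b, points):
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)


def gradients(m, b, points):
    """Partial derivatives of the mean squared error with respect to m and b."""
    n = len(points)
    grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in points) / n
    grad_b = sum(-2 * (y - (m * x + b)) for x, y in points) / n
    return grad_m, grad_b


# Hypothetical data on the line y = 2x + 1, and an arbitrary starting line.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
m, b, h = 0.5, 0.0, 1e-6

grad_m, grad_b = gradients(m, b, points)

# Finite-difference estimates of the same two slopes, used only as a check.
numeric_m = (mean_squared_error(m + h, b, points) - mean_squared_error(m - h, b, points)) / (2 * h)
numeric_b = (mean_squared_error(m, b + h, points) - mean_squared_error(m, b - h, points)) / (2 * h)

print(grad_m, numeric_m)
print(grad_b, numeric_b)
```

Both partial derivatives are negative at this starting line, so increasing m and b reduces the error.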

We can initialize our search to start at any pair of m and b values (i.e., any line) and let the gradient descent algorithm march downhill on our error function towards the best line. Each iteration will update m and b to a line that yields slightly lower error than the previous iteration. The direction to move in for each iteration is calculated using the two partial derivatives from above.
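The iteration described above might look like the following sketch. It assumes a mean squared error cost, starts from the line m = 0, b = 0, and uses a step-size parameter named `learning_rate`. The value 0.05, the iteration count, and the data are illustrative assumptions, not values from the original article.

```python
def step(m, b, points, learning_rate):
    """Move m and b a small distance against the gradient of the error."""
    n = len(points)
    grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in points) / n
    grad_b = sum(-2 * (y - (m * x + b)) for x, y in points) / n
    return m - learning_rate * grad_m, b - learning_rate * grad_b


def gradient_descent(points, learning_rate=0.05, iterations=5000, m=0.0, b=0.0):
    """Repeat the update step, starting from the line defined by m and b."""
    for _ in range(iterations):
        m, b = step(m, b, points, learning_rate)
    return m, b


# Hypothetical data lying exactly on y = 2x + 1.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

m, b = gradient_descent(points)
print(f"m = {m:.4f}, b = {b:.4f}")
```

On this data the search settles near the slope and intercept of the line that generated the points.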

The learning rate controls how large a step we take downhill during each iteration. If we take too large a step, we may step over the minimum. However, if we take very small steps, it will require many iterations to arrive at the minimum.
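The trade-off can be seen on a deliberately simple problem. The sketch below minimizes f(x) = x², an assumed toy function whose derivative is 2x, with three hypothetical learning rates, and counts the iterations needed to get close to the minimum at x = 0.

```python
def descend(learning_rate, start=10.0, tolerance=1e-6, max_iterations=10_000):
    """Minimize f(x) = x**2 and return (final x, iterations used)."""
    x = start
    for iteration in range(1, max_iterations + 1):
        x = x - learning_rate * 2 * x  # the derivative of x**2 is 2x
        if abs(x) < tolerance:
            return x, iteration        # close enough to the minimum
        if abs(x) > 1e12:
            return x, iteration        # the steps are growing: diverged
    return x, max_iterations


small_x, small_steps = descend(0.001)       # very small steps
moderate_x, moderate_steps = descend(0.1)   # moderate steps
large_x, large_steps = descend(1.1)         # steps that overshoot the minimum

print("small   :", small_steps, "iterations")
print("moderate:", moderate_steps, "iterations")
print("large   : ended at x =", large_x)
```

With the smallest rate the search converges but needs far more iterations than the moderate rate. With the largest rate each step jumps past the minimum to a point farther away than before, so the search never settles.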

Reference: Mathsisfun, Towardsdatascience.
