RELEASED 04-2024, LAST UPDATE 06-2024



Based on [3].


These players do not have their own money for buying assets and thus need to borrow. That is, they first borrow, then buy an asset, and by doing so they do not, in a sense, own the asset but rather an exposure.

  • Market makers
  • Traders
  • Brokers
  • Dealers

These actors, on the other hand, have their own money, that is, they do not need to borrow to buy assets.

  • Investors



Major asset classes

  • Rates
  • Equities
  • Currencies
  • Commodities


Based on [3]


Instrument sectors

  • Fixed income, includes money market (loans, FRA, CD, CP, T-bills, …) and capital market instruments (bonds, notes, FRN, …)
  • Equities
  • Currencies
  • Commodities
  • Derivatives of major asset classes
  • Credit, includes high-yield bonds, corporate bonds, …



Based on [3].


Position type

  • Hedging
  • Arbitrage, from point of view of (a) market practice, or (b) theory of financial engineering
  • Speculation

Position nature

  • Long
  • Short

By buying or short-selling assets, one takes positions, and once a position is taken, one has exposure to various risks.


When we buy an asset for cash and hold it in inventory, or when we sign a contract that obliges us to buy it at a future date, we have a long position. We are long the underlying instrument, and we benefit when the value of the underlying asset increases.


When a market practitioner has sold an asset without really owning it, a short position results. Being short an underlying, one loses when its value increases.


The short position can be on an instrument, such as selling a borrowed bond / stock / future commitment / swap / option. But it can also be on a risk or spread.


Of course, positions come in pairs: for every short position there is a long position held by a counterparty.



Extract from [4, p113].


"The trading function within a financial institution is referred to as front office; the part of the financial institution that is concerned with the overall level of risks being taken, capital adequacy, and regulatory compliance is referred to as middle office; the record keeping function is referred to as the back office."


"…there are two levels within a financial institution at which trading risks are managed. First, the front office hedges risks by ensuring that exposures to individual market variables are not too great. Second, the middle office aggregates the exposures of all traders to determine whether the total risk is acceptable."



Pricing options in the forex and equity markets often involves continuous compounding, making calculations more convenient.


However, quoted rates in the money and capital markets are annualized and simple rates (without compounding).


In natural language, the meaning of a verb always comes with a tense and a grammatical person (hence the conjugation of verbs).


The term interest, similarly, represents a structure, that is, it is a container for a few things that simultaneously must be considered beyond the value or magnitude of the interest rate.


For calculating accrued interest, the following information must be known, by means of conventions and / or explicit mentioning:

  1. period for which the rate is quoted,
  2. period for which interest is to be calculated (accrual period),
  3. whether interest is simple or to be compounded,
  4. and if interest is to be compounded, the compounding frequency (usually, same as payment frequency).

Questions to ask are which interest rate to apply and how many times to apply it.


We consider the period to which a rate applies, and we term this period the calculation period. The calculation period is treated as the payment unit, and the overall number of payments results in a chain of calculation periods, the calculation schedule.


In reality, though, the calculation schedule and the payment schedule need not be the same (and often are not). Usually, for each calculation period there is a related payment at some point, not necessarily falling directly at the end of the calculation period. However, arrangements may call for payment every \(c\) calculation periods. In fact, there is pretty much no limit to how calculation and payment schedules may relate to each other.


In this discussion, focus is on the following:

  • A calculation period is a simple interest period with a payment at the end.
  • Therefore, if considering compounded interest, we consider a chain of calculation periods, each with a payment at the end.
  • We assume the rate is quoted for a period ≤ the calculation period.
  • We consider day count conventions, such as 30/360, and we disregard rolling conventions, such as Modified Following.

For this, the rate's quotation has to be aligned with the calculation period. That is, the quoted rate \(r_{_Q}\) is converted to the calculation period under consideration of the day count convention. The day count convention renders a factor \(\frac{d}{y}\leq1\) to multiply the rate with. The compounding or payment frequency is \(n\).


For example, if a 3-month rate is quoted as a 5% annual rate (an annual rate available only for 3 months), then with simple interest and a 1-month period under the 30/360 day count convention, an initial unit sum increases by the factor

\[1+0.05\cdot{\scriptsize\frac{30}{360}}\approx1.004167\]


For a 6-month period the factor is

\[1+0.05\cdot{\scriptsize\frac{180}{360}}=1.025\]


And for a year, the factor is

\[1+0.05\cdot{\scriptsize\frac{360}{360}}=1.05\]


For a year, not as simple interest though, but compounded, with monthly payment frequency, the factor is

\[\left(1+{\scriptsize\frac{0.05}{12}}\right)^{12}\approx1.051162\]


This is the factor by which an amount will increase over a certain period. Here the period is one year, compounding frequency is 12, and annual interest rate is 5%.
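The factors above can be checked with a minimal Python sketch (the 5% quoted rate and the 30/360 day counts as in the text):

```python
# Simple-interest growth factors under the 30/360 day count convention,
# for a quoted annual rate of 5%.
rate = 0.05

one_month = 1 + rate * 30 / 360    # factor for a 1-month period
six_months = 1 + rate * 180 / 360  # factor for a 6-month period
one_year = 1 + rate * 360 / 360    # factor for one year, simple interest

# One year, compounded with monthly payment frequency (n = 12):
monthly_compounded = (1 + rate / 12) ** 12  # about 1.051162

print(one_month, six_months, one_year, monthly_compounded)
```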


How much is the increase under the same conditions, but with continuous instead of 12-times discrete compounding?


Continuous compounding involves \(e\), so let's appreciate different ways of arriving at \(e\), very well explained by the Mathematics Network, University of Toronto, [6].


An explanation of the meaning of \(e\) is followed by interpreting \(e\) as a limit, and interpreting \(e\) in calculus.


Meaning of \(e\)


Under 100% simple interest for a certain period, the increase of an initial sum is by a factor of \(2\). Under continuous compounding (in the same period) the increase is by a factor of \(e=2.718...\). The same expressed differently: if under simple interest (during a period) the interest is \(1\)-times the initial amount, then with continuously compounded interest (for the same period length) the final amount is \(e^1\)-times the initial amount.


"The number \(e\) is the factor by which a bank account earning continually compounding interest (or a reproducing population whose offspring are themselves capable of reproduction, or any quantity that grows at a rate proportional to its current value) will increase, if without the compounding (or without the offspring being capable of further reproduction) it would have doubled (increased by 100%)."


For 200% simple interest, one considers 2 consecutive half-periods of 100% each and arrives at \(e^2\).


The general situation is that if the simple interest is \(R\)-times the original amount, then under continuous compounding the factor is \(e^R\).


\(e\) as a limit


We consider a period with 100% simple interest. The interest relates to the original amount with factor 1. Under continuous compounding (over the same period) the factor is \(e^1\). What happens between these two extremes?


We consider \(n\) equal and consecutive periods, with simple interest of \(\frac{1}{n}\) for each period, resulting in the factor \(\left(1+\frac{1}{n}\right)^n\) when compounding interest over the \(n\) discrete periods.


\(e\) is the limit of this factor as \(n\) goes to infinity:

\[e=\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n\]
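This limit is easy to check numerically; a minimal Python sketch:

```python
import math

# Compound 100% simple interest over n sub-periods of 1/n each;
# the factor (1 + 1/n)^n approaches e as n grows.
for n in (1, 12, 365, 1_000_000):
    print(n, (1 + 1 / n) ** n)

factor = (1 + 1 / 1_000_000) ** 1_000_000
print(factor, math.e)  # the two values agree to about 6 digits
```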


\(e\) in calculus


Measure of rate of change of a function (with respect to changes in its input) is given by its derivative.


We consider differentiation of an exponential function \(f(x)=a^x\). It turns out that \(e\) is that value of \(a\) for which the constant \(K\) in the equation below is \(1\), so that the derivative of the exponential function \(f(x)=a^x\) is equal to itself. (As shown below, the constant \(K\) is based on the base \(a\), and is independent of \(x\).)

\begin{array}{rcl} f'(x)&=&{\lim_{h\to0} \frac{f(x+h)-f(x)}{h}}\\ &=&\lim_{h\to0} \frac{a^{x+h}-a^x}{h}\\ &=&\lim_{h\to0} \frac{a^x a^h-a^x}{h}\\ &=&a^x\lim_{h\to0} \frac{a^h-1}{h}\\ &=&a^x K \end{array}

\(Ka^x\), the value of the derivative, is the rate of change of the exponential function \(f(x)=a^x\). That is, the rate of change of the simple exponential function \(f(x)=a^x\) is \(K\)-times its function value for all \(x\), and as mentioned above, \(a=e\) is that special value of base \(a\) that makes this rate of change \(1\)-times its function value for all \(x\), that is, making the derivative of the simple exponential function \(f(x)=a^x\) equal to itself.
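The constant \(K=\lim_{h\to0}\frac{a^h-1}{h}\) can be estimated numerically; it equals \(\ln a\), and for \(a=e\) it is indeed \(1\). A small Python sketch:

```python
import math

def K(a, h=1e-6):
    # Numerical estimate of lim_{h->0} (a^h - 1) / h, which equals ln(a).
    return (a ** h - 1) / h

print(K(2))        # close to ln 2, about 0.6931
print(K(math.e))   # close to 1
```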


For the simple \(e\)-based exponential function \(f(x)=e^x\), the rate of change is one-times the current function value for all \(x\):

\[\frac{d}{dx}\,e^x=e^x\]


The general case, for an \(e\)-based exponential function \(f(x)=B\,e^{Rx}\): the rate of change is \(R\)-times the current function value for all \(x\):

\[\frac{d}{dx}\,B\,e^{Rx}=R\,B\,e^{Rx}=R\,f(x)\]


(We have two independent ways to affect the rate of change of an exponential function: changing the base \(a\), which impacts \(K\), and scaling the independent variable \(x\) by a factor \(R\); in both cases the rate of change is a real-valued multiple of the function itself.)


Considering the continuously compounding interest function over time \(f(t)=e^{rt}\) with an annual interest rate of 5% and a period of 1 year (the same unit as that of the rate's quotation, that is, \(t=1\)), the final amount, including interest, is \(e^{0.05\cdot1}=1.051271096\).


Below is a comparison of final amount, with annual interest rate 5% and 1-year period.

No compounding: \(1.05\)
12-times compounding: \(1.051161898\)
Continuous compounding: \(1.051271096\)



Based on [4].


The idea is that an interest rate in one compounding frequency is known and the equivalent interest rate in another compounding frequency, rendering the same investment amount, is to be calculated.


Conversion between continuous and discrete


A continuous interest rate of \(r_c\) is known, and the equivalent \(m\)-times compounded interest rate \(r_m\) is to be calculated over \(n\) periods (it turns out that the number of periods, \(n\), is irrelevant in this calculation). Equating the two growth factors,

\[e^{r_c n}=\left(1+{\scriptsize\frac{r_m}{m}}\right)^{mn}\]

\[\Rightarrow r_c=m\ln(1+{\scriptsize\frac{r_m}{m}})\]
\[\Rightarrow r_m=m\,(e^{\frac{r_c}{m}}-1)\]

Conversion between discrete

An \(m\)-times compounded rate \(r_m\) is known, and the equivalent \(p\)-times compounded rate \(r_p\) is to be calculated. Equating \(\left(1+{\scriptsize\frac{r_m}{m}}\right)^m=\left(1+{\scriptsize\frac{r_p}{p}}\right)^p\) and solving for \(r_p\):

\[\Rightarrow r_p=p\left(\sqrt[^p]{(1+{\scriptsize\frac{r_m}{m}})^m}-1\right)\]



With continuous compounding, the rate is \(12\%\), and the semi-annually compounded equivalent is \(12.3673\%\).
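The two conversion formulas, sketched in Python with the 12% example:

```python
import math

def continuous_from_discrete(r_m, m):
    # r_c = m * ln(1 + r_m / m)
    return m * math.log(1 + r_m / m)

def discrete_from_continuous(r_c, m):
    # r_m = m * (e^(r_c / m) - 1)
    return m * (math.exp(r_c / m) - 1)

r_m = discrete_from_continuous(0.12, 2)
print(r_m)  # about 0.123673, i.e. 12.3673% semi-annually compounded
print(continuous_from_discrete(r_m, 2))  # back to 0.12
```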




Based on [2] and [3].





Based on [2].


A \(T\)-period forward rate is a simple interest applying to a period of \(T\) that starts in the future. Here, future means a date after the spot date.


A forward rate agreement (FRA) is an agreement to pay a fixed simple interest rate on a certain amount between two dates, both in the future. This fixed simple interest rate between \(T_1\) and \(T_2\), \(F_{12}\) in the equations below, is the rate to determine by valuing the FRA.


A no-arbitrage argument is used, stating that it should not matter whether two consecutive investments are made from \(T_0\) to \(T_1\) followed by \(T_1\) to \(T_2\) or only one from \(T_0\) to \(T_2\).


The three terms from left to right are the capitalization factor from now to \(T_1\), the interest earned between \(T_1\) and \(T_2\), and the capitalization factor from now to \(T_2\):

\[(1+r_1\,T_1)\,(1+F_{12}\,(T_2-T_1))=1+r_2\,T_2\]


Solving for \(F_{12}\):

\[F_{12}=\frac{1}{T_2-T_1}\left(\frac{1+r_2\,T_2}{1+r_1\,T_1}-1\right)\]
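A minimal sketch of the resulting forward rate in Python, with hypothetical spot rates and year fractions as day count factors:

```python
def forward_rate(r1, t1, r2, t2):
    # No-arbitrage: (1 + r1*t1) * (1 + F12*(t2 - t1)) = 1 + r2*t2
    return ((1 + r2 * t2) / (1 + r1 * t1) - 1) / (t2 - t1)

# Hypothetical simple rates: 4% to 6 months, 5% to 1 year.
F12 = forward_rate(0.04, 0.5, 0.05, 1.0)
print(F12)  # about 0.0588, i.e. 5.88% for the 6x12 period
```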




Pricing and valuation (also termed revaluation) are the same calculations, both aim at finding the value of a financial contract.


Pricing is the basis of a decision about whether to trade the contract or not.


After the contract has been traded, market conditions change, and it is important to know how the trade's value evolves as the market parameters that affect its price change.



Based on [2] and [3].


These are derivatives.


Linear derivatives have a linear payoff function. Non-linear derivatives have a non-linear payoff function. The payoff function is on the underlying price.


Examples of linear derivatives are futures, forwards and swaps. Options, on the other hand, are non-linear derivatives.


A linear derivative can be hedged once, and the hedge will work until the derivative's maturity.


When hedging a non-linear derivative, the hedge should be adjusted frequently to be effective.



For a trader, it is important to know how a trade values under changing market conditions. That is, market parameters or conditions, such as interest rates, foreign exchange rates (which depend on interest rates of two currencies), prices, etc. change, and if the trader’s portfolio is dependent on any of these, it is critical to know the extent of each of these dependencies.


The Greeks are calculations that serve as a tool for traders to manage their risks in derivatives trades. Essentially, these calculations aim at isolating and quantifying risks associated with various market factors, such as underlying price, interest rates, volatility of the underlying price, and time to maturity. As with any calculation, certain assumptions may apply.




Delta (capital letter Δ, lower case δ) is the term used to quantify the change in value of a trading position given a certain change in an underlying's price, with other dependent parameters being constant.


One way to measure this is to change only the mentioned price and revalue the portfolio. Then the difference in portfolio value divided by the price change is the change in value of the portfolio per unit price change, and this is the delta of the portfolio with respect to that market parameter.


For example, if the value of a portfolio decreases by 100 USD compared to a previous valuation, when revalued with the same market data as before except that the price increased by 0.1 USD, then the delta of the portfolio is \(-100/0.1=-1000\), meaning that for every dollar increase in the price, the value of the portfolio decreases by 1000 USD.


The delta of an isolated trade on a linear instrument is constant. That of a nonlinear instrument is not constant.


Mathematically, delta is the first partial derivative of a portfolio with respect to an underlying's price.
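The bump-and-revalue method described above can be sketched in Python. As an assumed example "portfolio", a single Black-Scholes call with hypothetical market data (spot, strike, rate, volatility, and maturity below are illustrative):

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, T):
    # Black-Scholes price of a European call.
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

# Hypothetical market data: spot 100, strike 100, 5% rate, 20% vol, 1 year.
S, K, r, sigma, T = 100.0, 100.0, 0.05, 0.20, 1.0

bump = 0.1  # bump only the underlying price, then revalue
delta = (bs_call(S + bump, K, r, sigma, T) - bs_call(S, K, r, sigma, T)) / bump
print(delta)  # close to the analytic delta N(d1), about 0.637
```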




The second partial derivative of a portfolio with respect to an underlying price, that is, the rate of change of delta, is the gamma (Capital letter Γ) of the portfolio.


Given constant delta for trades on linear products, their gamma is zero. (Delta is constant, because linear products have a linear payoff function, which is a function of the underlying price.)




(Vega is not a letter in the Greek alphabet.)


The same principle used for calculating delta, applied with a certain change in volatility of a market variable, notably the underlying price, returns the value change of the portfolio with respect to a unit change in the volatility. Volatility is expressed in percent.


Spots, forwards, and swaps have no dependency on the volatility of the underlying market variable, but options and more complicated derivatives do.




The same principle applies, as with delta. The change in value of a portfolio per unit change in interest rates is the rho (lower case letter ρ) of the portfolio.




(Capital letter Θ.) The same principle applies as with delta: the change in value of a portfolio per unit change in time, as time passes and maturity approaches, is the theta of the portfolio.




Based on [2, chapter 4].


Volatility matters for options, and for optionality embedded in linear instruments.


Considering the Black-Scholes model, all the inputs except volatility are either observable in the markets or given by the contract that is being priced. An observable market price for the option is also available. Volatility of returns for historic prices is observable and can easily be calculated (historic volatility), but not for the present or future. For a view of present or future volatility, all the known parameters, plus the market price, can be used in an equation, the Black-Scholes formula, and the equation solved for volatility, which gives an implied volatility.


"More fundamentally, it is important to realize that no model is perfect and it will never describe the market perfectly. Where are the imperfections in our model? Given certain parameters, our model produces a price. These parameters are strike, time to maturity, interest rates, current stock price and volatility. One of these parameters is vastly different from the others: all except volatility are either specified by the option contract or are observable in the market. A consequence of this is that if you call a trader and ask for a price on an option, he will not quote you a sum of money, instead he will quote a volatility, or vol, as traders typically say. Actually he will quote two vols, the price to buy and the price to sell. The vol used to price is often called the implied volatility as it is the volatility implied by the price. A market maker is expected to quote two prices or two vols, which are close together so the purchaser can be sure that he is not being cheated. […] Note that the difference between the two quoted prices is where the bank makes its profits. The true price is somewhere in between, so whichever the customer chooses, the bank makes a small profit."


"How does the trader estimate volatility? We can measure, and therefore use, the volatility of an asset over any period in the past for which we have market data. The problem is which past period? We could, for example, use a thirty-day average. Although the Black-Scholes model is based on an assumption of constant volatility it is really the average volatility that is important, or more precisely the root-mean-square volatility. A major news event will cause rapid movements in an asset's price. This is undesirable because we would end up quoting a much lower vol thirty-one days after a major news event occurred in the last thirty days than after one, despite the fact that very little has changed. One solution to this problem is to use a weighted average with the weight ascribed to a given day decaying as it gets further in the past.


A more subtle issue is that it is not the past volatility that matters. It is the volatility that occurs during the life of the option which will cause hedging costs and the option should be priced thereby. The trader therefore has to estimate the future volatility. This could be based on market prices, past performance and anticipation of future news. It is important to realize that announcements are often expected in advance, and the information they contain will either push the asset up or down. The market knows that the asset price will move but cannot discount the information as it does not know whether it is good or bad. The options trader on the other hand does not care whether information is good or bad, all he cares about is whether the asset price will move. Thus the anticipation of an announcement will drive estimated vols up.


The trading of options is therefore really about the trading of vol. and the options trader is taking views on the future behaviour of volatility rather than the movement of the underlying asset."


And this root-mean-square is the standard deviation of returns that considers up swings (positive percentage changes) and down swings (negative percentage changes).


[7] on root-mean-square: "It is useful when trying to measure the average 'size' of numbers, where their sign is unimportant, as the squaring makes all of the numbers non-negative."


[3]: The variance of the Wiener Process (used to model stock price movements) is proportional to the elapsed time, and therefore the standard deviation of it, that is the volatility, is proportional to the square root of time.




Historical volatility is backward-looking, and implied volatility is forward-looking. As they look in opposite directions, they should not be expected to be the same.


Volatility refers directly to the variability of returns, but it is possible to use it to make inferences about the distribution of prices. If annual returns are normally distributed with mean μ and standard deviation (volatility) σ, the following statements can be made:

  1. The mean price at time \(t\), based on price \(S_0\) at the outset, is \(S_{_0}e^{(\mu+\sigma^2/2)t}\)
  2. The median price at time \(t\) is \(S_{_0}e^{\mu t}\)
  3. There is a 68.3% chance that prices will fall in the range \(S_{_0}e^{\mu t-\sigma\sqrt{t}}\) to \(S_{_0}e^{\mu t+\sigma\sqrt{t}}\)
  4. There is a 95.4% chance that prices will fall in the range \(S_{_0}e^{\mu t-2\sigma\sqrt{t}}\) to \(S_{_0}e^{\mu t+2\sigma\sqrt{t}}\)
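These statements, in a short Python sketch with hypothetical parameters (\(S_0=100\), \(\mu=8\%\), \(\sigma=20\%\), \(t=1\) year are illustrative choices):

```python
import math

# Hypothetical parameters: S0 = 100, mu = 8%, sigma = 20%, t = 1 year.
S0, mu, sigma, t = 100.0, 0.08, 0.20, 1.0

median = S0 * math.exp(mu * t)
lo_68 = S0 * math.exp(mu * t - sigma * math.sqrt(t))
hi_68 = S0 * math.exp(mu * t + sigma * math.sqrt(t))
lo_95 = S0 * math.exp(mu * t - 2 * sigma * math.sqrt(t))
hi_95 = S0 * math.exp(mu * t + 2 * sigma * math.sqrt(t))

print(median)        # about 108.33
print(lo_68, hi_68)  # about 88.69 to 132.31
print(lo_95, hi_95)  # about 72.61 to 161.61
```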



If a call option with price \(C\), a put option with price \(P\), and a forward contract with price \(F\) have the same strike and expiry, then the put-call parity relation holds:

\[C-P=F\]




The value of the put is never greater than \(K\). So, it is bound by \(0\) and \(K\).


But the value of the call can be arbitrarily high, depending on how the underlying price moves. And this makes some mathematical convergence arguments involving calls tricky.


By pricing the put, knowing the price of a forward contract, and using put-call parity, the value of the call is simply calculated.



It is a theory that establishes a relationship between the spot and forward exchange rates of two countries (currencies) and their respective interest rates (from spot to start of the forward period).


It does so by using no-arbitrage arguments. The idea is that it should not make a difference – that is, give an opportunity for arbitrage – whether one invests in the local currency (earning interest in the local currency), or exchanges local for foreign currency, earns interest in the foreign currency, and converts the nominal and earnings back to the local currency at maturity of the investment.



As the term implies, there is an engineering aspect to it.


A simpler view on financial engineering is that its purpose is to decompose a financial instrument into other instruments and price it off these. Any other price would lead to arbitrage. It is also termed replication. As a result, building block instruments are necessary.



A corporate bond can be viewed as a riskless bond plus a risky asset that pays regular coupons on top and, in case of default of the issuer, a sum. This risky asset is a credit default swap, which is also traded in its own right. It can be synthesized by going long a corporate bond and short a riskless bond.




Details follow, based on [1].


A broader view is that the building-block instruments, serving as tools, can be tweaked, adjusted, and combined to form an infinite number of desired outcomes, characteristics, or features in the application areas of financial engineering.


Applications of financial engineering

  • Hedging
  • Speculation
  • Arbitrage

Sources of financial risk

  • Currency risk
  • Interest-rate risk
  • Equity risk
  • Commodity risk
  • Credit and counterparty risk
  • Liquidity risk
  • Operating risk
  • Other market risk (such as volatility risk, and basis risk)
  • Model risk

Accounting and economic risk


The two should be differentiated. Quantifying accounting risk is straightforward, as it has a clear and comparatively narrow implication; it is backward-looking. In contrast, economic risk has wider direct and indirect implications, making it more challenging to quantify.



For a production plant, the accounting risk from a rise in interest rates is that liabilities incur more costs and assets yield more interest-rate income. Economic risk: "Suppliers faced with higher interest costs will demand payment sooner, while customers may take extended credit and pay later. This will worsen the company's cash flow, leading to more borrowing, and yet higher interest costs. Worse, higher interest rates may slow down the economic activity, resulting in less demand for the company's goods. In addition, the higher rates may encourage overseas investors, leading to a temporary increase in the domestic currency, while at the same time increasing the price of exported goods in foreign currency terms. The net effect: a worsening of the company's competitive position, and a further reduction in the company's fortunes."





A good example is deriving the interest rate parity equation.


Arbitrage-free pricing highlights two important concepts: replication and time-value of money.


"We have seen that arbitrage can price simple contracts precisely in a way that allows for no doubt in the price, and the price is independent of our view on how asset prices will evolve. The great revolution in modern finance is based on the observation that such arguments can be extended to cover pricing of vanilla options."


Examples for the 'simple contracts' mentioned above are 'FX forward' (determining the forward exchange rate that leads to the 'interest rate parity' equation) and 'forward rate agreement'.



Based on [1] and [2].


Risk-neutral means that investors do not require a risk premium. And because no premium is required, that is, no extra discount on the expected future value to account for risk, the price is the expected future value discounted at the riskless rate.


Why adopt such an approach in the first place? Under the assumptions of the pricing approach, it is justified when risk has been eliminated by hedging it away. In such a constellation the portfolio carries no risk and, therefore, should grow at the same rate as a riskless bond.


Of course, it is not the case that an investor is risk-neutral. In fact, investors are risk-averse to varying degrees. But the risk-neutral approach is a (harmless) trick to make the probabilistic approach give the correct answer.


The pricing approach mentioned above is the possibility of constructing a riskless portfolio by combining an option with some proportion of the underlying asset. (This is the foundation of the binomial method for option valuation.) The argument is that because this portfolio is riskless, that is, has the same financial outcome regardless of market events, (a) future cashflows should be discounted at the riskless interest rate, and (b) the portfolio is worth the same regardless of it being valued by risk-averse or risk-neutral investors. Therefore, we choose the riskless interest rate.



Based on [9] and [3].


The Wiener process is a real-valued, continuous-time stochastic process, and one of the most well-known examples of Lévy processes (Paul Pierre Lévy, 1886-1971, mathematician).


A Lévy process is a stochastic process with stationary, independent increments.


Stationary increments mean that the distribution of a change depends only on the timespan of observation and not on the time when the observation was started. A Lévy process has, by definition, stationary increments and can be viewed as the continuous-time analog of a random walk, which is a discrete-time random process and has, by construction, stationary increments.





Other well-known or important examples are the Poisson, Gamma, Pascal, and Meixner processes. Among the non-deterministic Lévy processes only the Wiener process has a drift and a continuous path.




1827, Robert Brown (1773-1858, botanist) described a physical process, the random motion of particles suspended in a medium (liquid or gas), using the example of pollen in water that he observed with a microscope.


1900, in his doctoral thesis, Louis Bachelier (1870-1955, mathematician) modelled this physical process of such a random motion, though in one dimension only, as a stochastic process. He modelled a one-dimensional Brownian motion for studying price changes on the Paris Bourse and used it for valuing stock options. Bachelier is considered the forefather of mathematical finance and pioneer in the study of stochastic processes in finance.


1905, Albert Einstein (1879-1955, theoretical physicist) modelled the random motion of the pollen particles as being moved by water (liquid medium) molecules. Individual molecules exert forces (presumably not of varying magnitude) from constantly changing directions, resulting in the particle being hit more on one side at different times, which in turn results in the seemingly random nature of the motion of the particle in the medium. This explanation of Brownian motion was compelling evidence that atoms and molecules exist.


1908, Jean Perrin (1870-1942, physicist) experimentally verified Einstein's explanation of Brownian motion, confirming the atomic nature of matter. He received the Nobel Prize in physics in 1926 for this.


Norbert Wiener (1894-1964, mathematician, computer scientist) had a strong interest in the mathematical theory of Brownian motion and proved, among other things, the non-differentiability of its paths. The simple mathematical representation of one-dimensional Brownian motion, named after Norbert Wiener in honor of his contributions, assumes the current velocity of a fluid particle fluctuates randomly. This is what the Black-Scholes model is based on.



Hedging an option with the underlying means buying or selling a certain amount of the underlying. That certain amount is the first derivative of the option's value with respect to the underlying price, that is, the rate of change of the option's value per unit change in the underlying price (the delta).


To have an effective hedge, the buying or selling of the underlying must not be too infrequent, because this rate of change is not constant. The more volatile the underlying price (that is, the more frequent and larger its changes), the more frequently buying and selling must occur to meet the hedging requirements.



To say 'we assume that the log of the price of the underlying is normally distributed and hence the underlying price is lognormally distributed' as the basis of discussion, etc., is correct, but misses the point.


The lognormality of prices is based on the definition of returns, as explained in [1, p255]. There are various definitions for return. Going by the definition of returns being the natural log of price relatives (logarithmic return, continuously compounded return) and assuming a normal distribution of this return, we end up with the lognormal distribution of the price itself.


Traditional return

\[R_t=\frac{S_t-S_{t-1}}{S_{t-1}}\]


Logarithmic (continuously compounded) return

\[r_t=\ln\frac{S_t}{S_{t-1}}\]


Why the log of price relatives? With this definition, \(x\%\) up followed by \(x\%\) down (of the return, not the price) leads back to exactly the same starting level.


A step back. Given the frequent occurrence of the normal distribution, considering that prices follow a normal distribution is problematic, as a negative price does not occur, yet is among possible values of a normally distributed random variable. But the assumption of normally distributed returns is not problematic, as negative returns do occur.


(Contrary to prices, interest rates can be negative – a monetary-policy measure applied in an environment with strong signs of deflation – and this has been experienced in the markets.)


So, after establishing that not prices but returns follow a normal distribution, it remains to consider how percentage rises and falls are to be applied, using the example of '10% up followed by 10% down'.


We will take a tour of (a) how not to add something and (b) instead multiply something else, and then (c) use a mathematical tool to add the something else.


(a) Adding price percentage changes may be tempting: \(10\%\) up followed by \(10\%\) down reads as \(10\%-10\%\), getting back to where we started. But this is not the case, as the comparison below shows.


Starting at price \(100\), \(10\%\) up results in \(110\), and subsequent \(10\%\) down results in \(99\), which is not the starting point.


(b) To get back to the starting point after \(10\%\) up and \(10\%\) down, the correct approach is to multiply with price relatives (instead of adding price percentage changes).


The price relative after the \(10\%\) rise is \(\frac{110}{100}\), and the price relative after the \(10\%\) fall is \(\frac{99}{110}\). Multiplying these price relatives results in \(\frac{110}{100}\cdot\frac{99}{110}=0.99\), that is, \(99\) starting from \(100\).


(c) By applying the logarithm to the calculations, we convert the multiplication to addition.

\begin{array}{rl} \ln(\frac{110}{100} \cdot \frac{99}{110}) & =\ln(\frac{110}{100})+\ln(\frac{99}{110})\\ & = 0.09531018+(-0.105360516)=-0.010050336\\ & = 9.531018\%-10.5360516\%=-1.0050336\%\\ \end{array}

With this definition of return, the logarithmic return, we are in the continuously compounded domain, and this is the reason why the percentage return of price change from \(100\) to \(110\) is not \(10\%\) but \(9.53\%\).


Similarly, the return of price change from \(110\) to \(99\) is not \(-9\%\), but \(-10.54\%\), which correctly shows that the price move from \(110\) to \(99\) is a larger percentage than the move from \(100\) to \(110\).


Also, it leads to the correct return associated with price move from \(100\) to \(110\) to \(99\) being a loss of \(1.01\%\) (\(1\%\) with simple compounding).
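These calculations can be reproduced with a few lines of code; a minimal sketch using only the standard library:

```python
import math

def simple_return(p0, p1):
    """Traditional return: percentage change relative to the starting price."""
    return (p1 - p0) / p0

def log_return(p0, p1):
    """Logarithmic (continuously compounded) return: ln of the price relative."""
    return math.log(p1 / p0)

prices = [100.0, 110.0, 99.0]

# Simple returns: +10% then -10%; their sum (zero) wrongly suggests
# the price is back at its starting level, yet it ends at 99.
simple = [simple_return(a, b) for a, b in zip(prices, prices[1:])]

# Log returns are additive: their sum equals the log return of the whole path.
logs = [log_return(a, b) for a, b in zip(prices, prices[1:])]
total = sum(logs)                        # ~ -0.010050336, i.e. ~ -1.005%
overall = log_return(prices[0], prices[-1])

print(simple, total, overall)
```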




Considering the sequential prices observed for two consecutive price-moves being \(100\), \(110\), and \(100\), the respective price relatives are \(\frac{110}{100}\) and \(\frac{100}{110}\), and the sum of the (logarithmic) returns associated with these price relatives is zero.

\[\ln(\frac{110}{100} \frac{100}{110})=\ln(\frac{110}{100})+\ln(\frac{100}{110})=0.09531018+(-0.09531018)=0\]

The equal up and down price-moves are expressed by logarithmic returns with the same percentage value of \(9.531018\%\) (opposite signs). The traditional return reflects them not incorrectly, but differently and with different percentage values, because the basis for measuring the percentage change differs (\(100\) on the way up, \(110\) on the way down).



Deriving Black-Scholes

  1. Approximate Brownian motion by a discrete process (… tree) in which the value at each node of the tree is determined by no-arbitrage arguments.

  2. Derive a partial differential equation for the price of an option.

  3. A more fundamental approach, relying on the concept of a Martingale measure.

[2]: "The Black-Scholes no-arbitrage argument is an example of dynamic replication; by continuously trading in the underlying and riskless bonds, we can precisely replicate a vanilla option's payoff. The cost of setting up such a replicating portfolio is then the unique arbitrage-free price of the option."




The price of a financial asset is its expected value. That is, we need to know possible outcomes and probability of each outcome. One can consider discrete or continuous probabilities.


This applies to options as well, and because options can take on almost any value, a continuous probability distribution must be used.


With discrete probability distributions, the probability of a particular outcome can be measured directly. With continuous probability distributions, however, the probability of a particular range of outcomes is measured by taking the area beneath that section of the curve.




Assuming the option could take on only two values \(v_1\) and \(v_2\), with respective probabilities \(p\) and \(1-p\), the price of the option would be \(p\,v_1+(1-p)\,v_2\).




Despite the option being able to take on almost any value, we are interested in the expected value and the probability associated with it, all under the condition that the option ends up in-the-money.




On the example of a call option, the pricing of an option can be reduced to determining two values, and the solution to both problems can be found in the lognormal distribution of financial prices.

  1. The probability p that the option ends in-the-money, that is, so that the underlying price is greater than the option's strike. EASY PART

  2. The expected value of the underlying given that the option ends in the money. DIFFICULT PART

Remember, we have continuous probability, discount with the riskless interest rate, expected value of underlying asset, strike, time horizon.




We convert the problem of determining the probability that at maturity \(T\), \(S_T\) will exceed some critical value \(X\) (based on lognormal distribution of prices) to the probability that the return will exceed some critical value \(r_x\) (based on normal distribution of returns), because it is easier to deal with normal distributions.


So, the problem is now finding probability \(p\) as per below, with \(S_0\) being the underlying price at the outset.

\begin{array}{ccccc} \ &\ &\tiny{\textsf{Lognormal distribution}}& &\tiny{\textsf{Normal distribution}}&\ \\ p&=&Prob[S_T \gt X]&=&Prob\left[return \gt \ln\left(\frac{X}{S_0}\right)\right] \end{array}
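This conversion can be evaluated directly with the normal CDF. A minimal sketch; the drift and volatility of the annual log return (\(\mu\), \(\sigma\)) are illustrative assumptions, not values from the text:

```python
import math
from statistics import NormalDist

def prob_above(s0, x, mu, sigma, t):
    """P[S_T > X] = P[log return over t > ln(X/S0)], with the log return
    assumed normal with mean mu*t and standard deviation sigma*sqrt(t)."""
    critical = math.log(x / s0)
    dist = NormalDist(mu * t, sigma * math.sqrt(t))
    return 1.0 - dist.cdf(critical)

# Example: at-the-money (X = S0) with zero drift gives probability 0.5.
p = prob_above(s0=100, x=100, mu=0.0, sigma=0.2, t=1.0)
print(p)
```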

Based on [3].


Another approach to Black-Scholes is getting to the PDE via convexity of option payoffs and then with assumptions concerning the dynamics of the underlying price, arriving at the closed-form solution, the Black-Scholes formula.


The point on the convexity of option payoffs is that it implies the arbitrage argument that the expected net gains (losses) from oscillations of the underlying price are equal to time decay during the same period. And this leads to the Black-Scholes PDE.


Black-Scholes PDE


Based on [3].


Considering the portfolio comprised of an option (priced \(C\)), hedged with some amount (the hedge ratio, \(C_s\)) of the underlying, the trading gains and costs can be put together to come up with an important PDE, which plays a central role in financial engineering, the Black-Scholes PDE.

\begin{array}{cccccc} \tiny{\textsf{option gain from hedging adjustments}} & \tiny{\textsf{interest earned on the short position}} & \tiny{\textsf{funding cost}} & \tiny{\textsf{time-decay}} & & \\ &&&&&\\ \frac{1}{2}\,C_{ss}\,\sigma^2\,S_t^2 & +\ r\,C_s\,S_t & -\ r\,C & +\ C_t & = & 0\\ &&&&&\\ {\textsf{A}} & {\textsf{B}} & {\textsf{C}} & {\textsf{D}} & & \end{array}











An overview of what is ahead. The option value is a convex curve, that is, its second derivative with respect to the underlying price is positive. The hedge, a position in the underlying, is a line. Near a given point the curve is almost in line with its tangent, so by hedging with the right amount, the changes in option value can be almost offset, yielding a (nearly) neutral portfolio of option and hedge. Almost, as mentioned: bringing in the second derivative of the option value curve makes the description sufficiently accurate (Taylor approximation).


Besides hedging a loss in value of the option due to price changes in the underlying via the right amount of the hedge (the hedge ratio), the good news is that the second derivative of the option value curve is always positive, meaning the hedged position will always gain a little cash, regardless of positive or negative changes in the underlying's price. The bad news is that the first derivative is not constant, meaning that the hedge would actually need real-time re-balancing for the concept to work. A further piece of good news at the level of a (larger) portfolio is that collective periodic (rather than real-time / continual) hedging at portfolio level is a realistic benefit.


We consider a market maker with 1 unit of a long call option \(C(t)\). The position must have been funded, and the assumption is that it is funded by borrowing \(C(t)\) at the risk-free rate. (The other funding mechanism would be to sell assets.) The position must also be hedged against a falling underlying price, that is, by shorting the underlying. By doing so, a drop in option value is almost offset by a rise in the value of the short underlying position. Almost, because the option value is described by a convex curve while the underlying is a line, and the two do not respond identically to a change in the underlying price.


How much of the underlying should be shorted to hedge the 1 unit of the call option \(C(t)\)?


Well, the purpose of the hedge is to capture the (adverse) change in the value of the option position resulting from (small) changes in the underlying's price. Knowing the rate of change of the option value, it is possible to (almost, see the above point on curve vs line) quantify the change in value of the option position. A rate of change means: for a 1 unit change of the underlying, the dependent variable changes by that rate, which is the definition of the first derivative of the curve; the relationship is linear because we consider a small change and the tangent at the corresponding point on the curve. So, for a change of \(x\) in the underlying, the option value changes by the rate of change times \(x\). The change is thereby quantified, and it determines the size of the hedge short position.


For a change in the underlying of \(\Delta S_t\) (everything else unchanged), the option value changes almost \(C_s\) times as much as the change in the underlying price, with \(C_s\) being the first partial derivative of the option value with respect to the underlying price. That is, the change in option value is \(\approx C_s\,\Delta S_t\).


Test: Short 1 unit of the underlying. Therefore, the short underlying position changes by \(-\Delta S_t\). So, the portfolio of long 1 unit option and short 1 unit underlying changes by the below amount.

\(C_s\,\Delta S_t - \Delta S_t \approx (C_s - 1)\,\Delta S_t\)

The above change is nonzero (negative for a rising underlying price), because \(\ 0 \lt C_s \lt 1\ \): shorting 1 full unit is too much. So the risk is not totally eliminated.


According to the above equation, shorting \(C_s\) units of the underlying will remove the risk resulting from a \(\Delta S_t\) change in the underlying price.


To test this, one can consider the new portfolio consisting of long 1 unit of option, short \(C_s\) units of the underlying, and a borrowing of funds to buy the 1 unit of option.


With a \(\Delta S_t\) change in underlying price, the value of the option changes by \(C(S_t+\Delta S_t)-C(S_t)\), and the value of the short underlying position changes by \(-C_s\,\Delta S_t\). So, the change in value of the portfolio is given below.

\begin{array}{rcl} \textsf{change in portfolio value} & = & \left[C(S_t+\Delta S_t)-C(S_t)\right]-C_s\,\Delta S_t\\ & &\\ \tiny{\textsf{with Taylor series approximation of }C\text{ around }S_t\text{ for change }\Delta S_t} & &\\ \tiny{C(S_t+\Delta S_t)\ =\ C(S_t)\ +\ C_s(S_t)\,\Delta S_t\ +\ \frac{1}{2} C_{ss}(S_t)\,(\Delta S_t)^2\ + \cdots} & &\\ & &\\ & \approx & \left[C(S_t)+C_s(S_t)\,\Delta S_t+\frac{1}{2} C_{ss}(S_t)\,(\Delta S_t)^2-C(S_t)\right]-C_s\,\Delta S_t\\ & = & \frac{1}{2} C_{ss}(S_t)\,(\Delta S_t)^2 \end{array}

The second partial derivative is always positive, and therefore the portfolio's value will always be positively affected by small changes in \(S_t\).
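The result that the delta-hedged portfolio changes by approximately \(\frac{1}{2}C_{ss}(\Delta S_t)^2\) can be illustrated with any convex function standing in for the option value; here \(f(S)=S^2\) (a stand-in with constant second derivative, not an option pricing formula):

```python
def f(s):
    """Convex stand-in for the option value curve."""
    return s * s

def f_s(s):
    """First derivative of the stand-in (plays the role of the hedge ratio)."""
    return 2.0 * s

F_SS = 2.0  # second derivative, constant for f(s) = s^2

def hedged_change(s, ds):
    """Change of the portfolio: long f(S), short f'(S) units of the underlying."""
    return (f(s + ds) - f(s)) - f_s(s) * ds

s0 = 100.0
for ds in (+1.0, -1.0, +0.5, -0.5):
    # Positive for up AND down moves, and equal to 0.5 * f'' * ds^2.
    print(ds, hedged_change(s0, ds), 0.5 * F_SS * ds * ds)
```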


This is delta hedging; the delta, being the first partial derivative with respect to the underlying price, is the hedge ratio. This hedge ratio is not constant: it depends, among other things, on time \(t\) and the underlying process \(S_t\), and, as mentioned above, by the convexity of the option value curve \(0 \lt C_s \lt 1\).

\[\textsf{hedge ratio} = h_t = \frac{\partial C(S_t,t)}{\partial S_t} = C_s\]

Adjusting the hedge over time requires selling and buying back certain amounts of the underlying in reaction to \(S_t\) moves.


Considering oscillations around an initial point \(S_{t_0}=S^0\) and successive time periods \(t_0,t_1,\dots,t_n\) that are \(t_i-t_{i-1}=\Delta\) units of time apart, we assume that \(S_t\) oscillates at an annual percentage rate of one standard deviation, \(\sigma\), around the initial point \(S^0\).


Therefore, the percentage change of the oscillation is proportional to \(\sqrt \Delta\).

\[\Delta S=\sigma \,S^0 \sqrt \Delta\]


The three possible values of the underlying price:

\begin{array}{l} S^- = S^0 - \Delta S\\ S^0\\ S^+ = S^0 + \Delta S\\ \end{array}

Correspondingly, the hedge ratios are \({C_s}^-\), \({C_s}^0\), and \({C_s}^+\), with \({C_s}^- \lt {C_s}^0 \lt {C_s}^+\), for all \(t\).


This means that as \(S_t\) moves, the hedge ratio \(h_t\) changes in a particular way, and to keep the portfolio delta-hedged, the market maker needs to adjust the number of underlying units \(S_t\) that were shorted per time period \(t_i\).


When \(S_t\) moves from \(S^0\) to \(S^-\), the short underlying position gains in value, and therefore the number of underlying constituting the hedge can be reduced, and this is reflected in \({C_s}^- \lt {C_s}^0\). This means that the market maker buys back some underlying to reduce the short position in the underlying, and the good thing is that the market maker buys when prices are low.


When \(S_t\) moves from \(S^0\) to \(S^+\), the short underlying position loses in value, and therefore the number of underlying constituting the hedge must be increased, and this is reflected in \({C_s}^0 \lt {C_s}^+\). This means that the market maker shorts some underlying to increase the short position in the underlying, and the good thing is that the market maker sells when prices are high.


These analogously apply to \(S_t\) moves from \(S^-\) and \(S^+\) to \(S^0\).


These gamma gains are, in expectation, equal to the time decay of the option over the same period.


Black-Scholes closed-form solution


Based on [13].

\[C(S,t)=S {\color{#BA30A6}\,N(}{\color{#28883A}d_1}{\color{#BA30A6})}-K\,e^{-rT} {\color{#BA30A6}N(}{\color{#28883A}d_2}{\color{#BA30A6})}\]

\({\color{#BA30A6}N(}x{\color{#BA30A6})}\) is the cumulative distribution function (CDF) of the standard normal distribution

\[{\color{#BA30A6}N(}x{\color{#BA30A6})}=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{t^2}{2}} \,dt\]
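A minimal implementation sketch of the closed-form solution, with \(d_1\) and \(d_2\) taken as the standard Black-Scholes arguments (they are not spelled out above), and illustrative parameter values:

```python
import math
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF

def bs_call(s, k, r, sigma, t):
    """Black-Scholes price of a European call (no dividends),
    using the standard d1 and d2 arguments."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * N(d1) - k * math.exp(-r * t) * N(d2)

# Illustrative inputs: S=100, K=100, r=5%, sigma=20%, T=1 year.
print(bs_call(100, 100, 0.05, 0.2, 1.0))  # ~10.45
```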



Based on [9] and [12].


Considering a continuous, differentiable function \(f(x)\) that has one or more roots, and a best guess that a root is around the point \(x=x_0\), then the Newton-Raphson method (also called Newton method) gives the below approximation for that root.

\[x_1=x_0 - \frac{f(x_0)}{f^{(1)}(x_0)}\]

And this can be repeated, until the desired accuracy is reached.

\[x_2=x_1 - \frac{f(x_1)}{f^{(1)}(x_1)}\]
\[x_{n+1}=x_n - \frac{f(x_n)}{f^{(1)}(x_n)}\]

If the function has more than one root, then \(x_0\) should be closer to the target root than the others.


The idea of Newton-Raphson for estimating the root around \(x_0\) is to use the tangent going through point \((x_0, f(x_0))\) and calculate its root, which gives a point closer to the target root of the function \(f(x)\). This is detailed in the geometric interpretation below.


With the idea behind Newton-Raphson in mind, there are limitations where it cannot perform well. For example, when around \(x_0\) there are points of inflection, local maxima, or minima. Reason for this is that then the tangent may lead to a point (on the x-axis) farther away from the root than \(x_0\). It could help to choose (if possible) \(x_0\) closer to the root.
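A minimal sketch of the iteration; the target function \(f(x)=x^2-2\) (with root \(\sqrt{2}\)) and the starting point are illustrative:

```python
def newton_raphson(f, f_prime, x0, tol=1e-12, max_iter=100):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) until the update is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:
            return x
    return x

root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)  # ~1.41421356
```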



The geometric interpretation is easily done in relation to the general equation of a line. The expressions below express the same thing in different ways.

Starting off with this line passing through \((0,0)\) \(y=x\)
Change the slope, so that for every \(1\) unit along the x-axis, it goes \(a\) units along the y-axis \(y=a\,x\)
Shift \(b\) units along the x-axis \(y=a\,(x-b)\)
Shift \(c\) units along the y-axis \(y=a\,(x-b)+c\)
We are back at the slope \(a\) for a line going through \((b,c)\) \(a=\frac{y-c}{x-b}\)
We have function \(f(x)\)
Draw tangent going through point \((x_n,f(x_n))\) \(y=[f^{(1)}(x_n)]\,(x-x_n )+f(x_n)\)
The root, that is point \((x_{n+1},0)\), the \(x_{n+1}\) for which \(y=0\) \(x_{n+1}=x_n-\frac{f(x_n)}{f^{(1)} (x_n)}\)



Based on [4], [9], and [12].


The general situation:


The Taylor series or Taylor expansion of a function at / around a point \(b\) is an infinite sum of terms. The first term is the function value at \(b\), and each \((n+1)^{th}\) term is expressed in terms of the \(n^{th}\) derivative of the function at \(b\): the \((n+1)^{th}\) term is \(\frac{f^{(n)}(b)}{n!}\,(x-b)^n\).


Obviously, the function must be sufficiently differentiable. A linear function can be represented by no more than two terms, and a quadratic function by no more than three. For most functions, the function value near \(b\) and the sum of the terms of its Taylor series are equal.


Often the higher-order derivatives of the function beyond the 2nd or 3rd degree are not considered, because they contribute significantly less to the function value and can be ignored within a negligible margin of error.


For approximation of a function, the approach of representing a function in terms of its Taylor expansion is a practical tool.


It enables approximating even a non-polynomial function at / around a point by an \(n^{th}\) degree polynomial, simply by forming a partial sum from the series, considering its first \(n+1\) terms. This \(n^{th}\) degree polynomial is the \(n^{th}\) degree Taylor polynomial for the function at / around the point of concern.

\begin{array}{rcl} f(x)&=&f(b)+\sum_{i=1}^{\infty}\frac{1}{i!}\,\frac{d^if}{dx^i}\bigg|_{x=b}(x-b)^i\\ &=&\sum_{n=0}^{\infty}\frac{f^{(n)}(b)}{n!}\,(x-b)^n \end{array}

If the change in the function value at / around \(b\) is of interest, which is the case in risk calculations, then it is formulated by rearranging the above equation.

\[\Delta f = f(x)-f(b) = \sum_{n=1}^{\infty}\frac{f^{(n)}(b)}{n!}\,(x-b)^n\]

When calculating, inputs are the point at which the Taylor expansion is, here \(b\), and the interval of concern around this point, here \(x-b\).



Approximation at \(x = 2\), for changes of \(\Delta x = 0.1\). The derivatives below are those of \(f(x)=\sqrt{x}\).

\(\frac{df}{dx} = \frac{1}{2}\,x^{-\frac{1}{2}} = 0.353553391\)
\(\frac{d^2f}{dx^2} = -\frac{1}{4}\,x^{-\frac{3}{2}} = -0.088388348\)
\(\frac{d^3f}{dx^3} = \frac{3}{8}\,x^{-\frac{5}{2}} = 0.066291261\)

1st order approximation of \(\Delta f\): \(\Delta_1 f = 0.353553391\cdot(0.1) = 0.035355339\)

2nd order approximation of \(\Delta f\): \(\Delta_2 f = \Delta_1f + (\frac{1}{2}\cdot(-0.088388348)\cdot(0.1)^2) = 0.034913397\)

3rd order approximation of \(\Delta f\): \(\Delta_3 f = \Delta_2f + (\frac{1}{6}\cdot0.066291261\cdot(0.1)^3) = 0.034924446\)
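The worked approximation can be reproduced numerically; the given derivatives are those of \(f(x)=\sqrt{x}\), so the result can be compared against the actual change \(\sqrt{2.1}-\sqrt{2}\):

```python
import math

x, dx = 2.0, 0.1

# Derivatives of f(x) = sqrt(x), evaluated at x = 2.
d1 = 0.5 * x ** -0.5      # ~  0.353553391
d2 = -0.25 * x ** -1.5    # ~ -0.088388348
d3 = 0.375 * x ** -2.5    # ~  0.066291261 (note: positive)

taylor1 = d1 * dx
taylor2 = taylor1 + 0.5 * d2 * dx ** 2
taylor3 = taylor2 + (1.0 / 6.0) * d3 * dx ** 3

actual = math.sqrt(x + dx) - math.sqrt(x)
print(taylor1, taylor2, taylor3, actual)
```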



Based on [5].




Approximating the probability by drawings, approximating the expectation.






Simple, also easy to implement.


Can cope with problems in higher dimensions, for example, integration in higher dimensions.




Some results hold only in probability. (However, this probabilistic aspect can be removed or strongly reduced when moving from pseudo random numbers to quasi random numbers.)




The expectation of the random variable \(Z\) is the sum of each value in the event space multiplied by the probability of it occurring.


First problem to solve


Approximating probability by drawings.


We have the event value, but not the probability of it occurring. Quantifying the probability is the first step to solving the second problem.


The idea behind approximating the probability \(P\) by drawings is that when observing \(M\) drawings and calculating the running average of event \(\omega_i\) then as \(M\) gets larger the running average converges to the probability of the event \(\omega_i\) occurring.

\begin{array}{l} {\color{#28883A}\frac{1}{M}\sum_{k=1}^{M}1_{x_k=\omega_i}}=P(\{\omega_i\})\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ {\scriptsize\textsf{for}}\ M\to\infty\ \ \ \ \ \ \ \ {\textsf{(1)}}\\ \\ { \begin{array}{rl} 1_{x_k=\omega_i} & {\scriptsize\textsf{Indicator function, returns 1 for }x_k=\omega_i\text{ and otherwise }0}\\ \omega_i & {\scriptsize\textsf{Event for which we want to approximate the probability}}\\ x_k & {\scriptsize\textsf{Sequence generated for event }\omega_i} \end{array} } \end{array}


Second problem to solve


Approximate the expectation.


An overview of the overall problem: the probabilities of a series of events are unknown and must be quantified. It would be good to arrive at these probabilities not through the event probabilities themselves, which we do not know, but through sequences that we generate (for specific events).


With \((1)\) we have a first step to this.


For each of the \(N\) events \(\omega_i\), multiplying the value \(Z(\omega_i)\) by its probability as given by \((1)\) takes us one summand of the way towards the expectation.


The expression on the right is the value of random variable Z from the generated sequence only for event \(ω_i\) because the indicator function returns factor 1 only for this event and 0 otherwise. That is, the expression on the right is value multiplied by probability, sourced from the sequence (for large \(M\)).


Summing up the expression on the right over all \(N\) events should then give the expectation of \(Z\).

\[{\color{#28883A}{\LARGE\sum}_{i=1}^{N}} \frac{1}{M}\sum_{k=1}^{M}Z(x_k)\,{\color{#28883A}1_{x_k=\omega_i}}= \frac{1}{M}\sum_{k=1}^{M}Z(x_k)\,{\color{#28883A}{\LARGE(}\sum_{i=1}^{N}1_{x_k=\omega_i}{\LARGE)}}=\frac{1}{M}\sum_{k=1}^{M}Z(x_k) \]

Therefore, the expression on the right above or left below can serve as expectation of the random variable \(Z\) with probabilities over probability space \(P\).

\[\frac{1}{M}\sum_{k=1}^{M}Z(x_k)=\sum_{i=1}^{N}Z(\omega_i)\,P(\{\omega_i\})=\mathbb{E}(Z)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ {\scriptsize\textsf{for}}\ M\to\infty \]
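A minimal Monte Carlo sketch of both steps: the probabilities are approximated by drawings, and the same drawings yield the expectation. The three-outcome distribution and the payoff values \(Z(\omega_i)\) are illustrative assumptions:

```python
import random

random.seed(42)

# Illustrative discrete distribution: outcomes, probabilities, payoffs Z.
outcomes = [1, 2, 3]
probs = [0.25, 0.25, 0.5]
Z = {1: 1.0, 2: 2.0, 3: 3.0}

# Exact expectation: sum of value times probability.
exact = sum(Z[w] * p for w, p in zip(outcomes, probs))  # E[Z] = 2.25

# Monte Carlo: draw M samples, average Z over the drawings.
M = 100_000
draws = random.choices(outcomes, weights=probs, k=M)
estimate = sum(Z[x] for x in draws) / M  # (1/M) * sum Z(x_k)

print(exact, estimate)
```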



A differential equation is a relationship between one or more unknown functions of one or more variables and the derivatives of these unknown functions, usually together with some given functions of the variables.


Generally, the unknown functions represent physical quantities, the derivatives represent, by definition, their rates of change, and the differential equation defines the relationship between them.


Ordinary differential equations (ODE) involve functions of one variable. Partial differential equations (PDE) involve functions of more than one variable.


Linear differential equations are linear in the unknown function and its derivatives; non-linear differential equations are not. Many ODEs encountered in practice are linear.


The order of a differential equation is based on the highest order of the derivative of the unknown function.



So, the Black-Scholes equation is a 2nd order PDE; more specifically, it is a convection-diffusion equation.


Finite difference methods are one way of numerically solving an ODE or PDE.



Risk in markets can be viewed as risk premium, which is a payment to investors above the return of a risk-free asset as compensation for them tolerating the extra risk associated with an investment.


Another view is that of the actual pricing of a risk-free asset. With more demand for safer assets, their price goes up and, consequently, their yield goes down, effectively increasing the risk premium.



Risk is inherent in all investment decisions, be it in the financial markets or elsewhere (for example, a company investing in a plant).


Risk is uncertainty, regardless of whether this uncertainty is in context of potential loss or magnitude of gains.


Not all risks are equal. Some are better than others. Risks can be combined to form new risk profiles, and to reduce overall risk. Removing risk this way is called diversification, and a risk that can be removed is a diversifiable risk. Coming back to the above view mentioning the risk premium, it is important to understand that the market will not compensate for diversifiable risk in a contract.


Every financial transaction is, essentially, buying or selling risk. For example, an investor is buying undiversifiable risk, and his aim is to maximize return for the level of risk that he is comfortable with.


The use of the term risk almost always implies undiversifiable risk.



Created in 1952 by Harry Markowitz (1927-2023, economist), this framework presents a mathematical way of measuring investors' risk tolerance and risk aversion by defining risk as the standard deviation from the mean (expected value).


The more risk-averse the investor is, the lower the standard deviation he is comfortable with. A risk-neutral investor disregards the standard deviation entirely, modelling the return simply as the expected value of the asset's returns.



Andrey Nikolaevich Kolmogorov (1903-1987, mathematician) introduced the notion of probability space in 1933.


Probability is formalized in terms of a probability space (or probability triple), a mathematical construct for modelling a random process (experiment). It consists of three elements: the sample space (the set of all possible individual outcomes), the event space (the set of events, where an event -simple, an individual outcome, or complex, a combination of individual outcomes- is a subset of the sample space; the event space is thus a collection of subsets of the sample space), and a probability measure (assigning each event in the event space a probability between 0 and 1).


A probability measure is a real-valued function on the event space that returns a value between 0 and 1, the unit interval \([0,1]\).




Outcomes \(1, 2,\) and \(3\). Because the individual outcomes are mutually exclusive, the probability of an event is the sum of the probabilities of the outcomes it contains. Per the definition of a probability measure, the probability assigned to the whole sample space must be 1.

Sample space
Event space
\(p_1=p_2=0.25, p_3=0.5\)

Probability measure

\begin{array}{rcl} p_{\tiny NONE}&=&0\\ p_1&=&0.25\\ p_2 &=&0.25\\ p_3 &=&0.5\\ \end{array}
\begin{array}{rcl} p_{12} &=&0.5\\ p_{13} &=&0.75\\ p_{23} &=&0.75\\ p_{123}&=&1 \end{array}
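The additivity used in this example can be checked mechanically; a small sketch:

```python
# Probabilities of the individual outcomes.
p = {1: 0.25, 2: 0.25, 3: 0.5}

def prob(event):
    """Probability measure: sum of outcome probabilities in the event."""
    return sum(p[w] for w in event)

assert prob(()) == 0.0           # impossible event
assert prob((1, 2)) == 0.5
assert prob((1, 3)) == 0.75
assert prob((1, 2, 3)) == 1.0    # whole sample space
print("all event probabilities consistent")
```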

Among the key elements in probability theory are random (or stochastic) variables, probability distributions, and stochastic processes.


Random variables are functions. The domain of the function is the sample space, and the codomain is a measurable space, often the real numbers. More formally, a random variable is a measurable function (a function between the underlying sets of two measurable spaces that preserves the structure of the spaces) from the sample space to another measurable space. Transferring a measure from one measurable space to another using a measurable function (the random variable) produces a new object, the pushforward measure, called the distribution of the random variable, which is a probability measure on its possible values.


It is possible for two random variables to have identical distributions but to differ in significant ways, for instance, they may be independent.


Random variable

Probability density function


This function is the first of two steps to retrieve the probability that the continuous random variable takes on a value between two boundary values \(a\) and \(b\).


The second step is to take the integral of the probability density function from \(a\) to \(b\); this integral is the probability.


For example, here is the probability density function of the standard normal distribution.


Probability distribution function

= probability mass function


This function returns the probability that a discrete random variable has a certain value.

Cumulative distribution function (continuous random variable)


Analogous to the discrete case, but using the probability density function: integrate the PDF from \(-\infty\) to \(x\).


For example, here is the cumulative distribution function of the standard (that is, mean 0, standard deviation 1) normal distribution.


Cumulative distribution function (discrete random variable)


To find the probability that a discrete random variable takes on values \(\leq b\), sum up values of the probability distribution function for inputs up to and including \(b\). This is the function.

\[F(x)=P[X\leq x]\]

The table below is from [11] and provides a very good, concise comparison of continuous and discrete random variables.

Random variable

The value of a continuous random variable falls within a range of values.

The value of a discrete random variable is an exact value.

The probability density function is associated with a continuous random variable.

The probability mass function is used to describe a discrete random variable.

A continuous random variable can take on an infinite number of values.

Such a variable can take on a finite or countably infinite number of distinct values.

The mean of a continuous random variable is \(\mu=\mathbb{E}[X] = \int_{\scriptsize-\infty}^{\scriptsize\infty}x\,f(x)\,dx\).

The mean of a discrete random variable is \(\mu=\mathbb{E}[X] = \sum_{\scriptsize i}^{\scriptsize n}x_i\,p_i\).

Examples of continuous random variables:

  • Uniform random variable
  • Exponential random variable
  • Normal random variable
  • Standard normal random variable

Examples of discrete random variables:

  • Binomial random variable
  • Geometric random variable
  • Bernoulli random variable
  • Poisson random variable

Mathematically, moments of a function are certain quantitative measures related to the location and shape of the function.


When the function is a probability distribution, then these four moments are considered: expected value, variance (a central moment, but non-standardized), skewness (a central and standardized moment, meaning it relates to shape, and is made scale-invariant by normalizing it), and kurtosis (also a central and standardized moment).

Name Moment Central Standardized
Expected value 1st No No
Variance 2nd Yes No
Skewness 3rd Yes Yes
Kurtosis 4th Yes Yes

An ordinary or raw moment is about zero. That is, it relates to location of the distribution.


A central moment is the moment about the mean. Thus it relates to the spread and shape of the distribution.


Expected value is the first ordinary moment (about zero, that is, for location of the graph) and not a central moment. The central moment of the expected value (the first central moment) is zero.


Higher-order central moments, pertaining to the spread and shape of the distribution, are more useful than higher-order ordinary moments that relate to the location.


Non-standardized moments have a dimension, standardized moments do not.


Recap / overview: a raw moment relates to location of the distribution, a central moment relates to spread and shape of the distribution, and standardized moments allow for comparing shapes of different distributions.


Mode, median, and mean are different values, each attempting to provide a single, typical value —a measure of center— in or for a numeric dataset, serving as a 'summary' of it.


Expected value, the mean

\begin{array}{rlll} \mu=\mathbb{E}[X] & = & \sum_{\scriptsize i}^{\scriptsize n}x_i\,p_i & {\scriptsize\textsf{ utilizing probability distribution }} {\scriptsize p}\\ &&&\\ \mu=\mathbb{E}[X] & = & \int_{\scriptsize-\infty}^{\scriptsize\infty}x\,{\color{#E97132}f(x)}\,dx & {\scriptsize\textsf{ utilizing }}\color{#E97132}{\scriptsize\textsf{probability density function} } \scriptsize{\color{#E97132}{f(x)}}\\ \end{array}

A note on moment generating functions


Based on [8].


The mean locates the center, and the standard deviation describes the spread associated with the values of a random variable. But the point is that many different distributions (may) possess the same means and standard deviations. What is missing is a unique characterization of a distribution. And this is provided by the moment-generating function of a distribution.


Below is the definition of the \(k^{th}\) (ordinary, not central) moment of a random variable \(X\) taken about the origin.

\[\mu_k^{'}=\mathbb{E}[X^k]\]

The definition of the moment-generating function \(m(t)\) of the random variable \(X\) is given below.

\[m(t)=\mathbb{E}[e^{tX}]\]


It is said to exist if there exists a positive constant \(b\) such that \(m(t)\) is finite for \(|t|\leq b\). And if \(m(t)\) exists, then for any positive \(k\) the \(k^{th}\) derivative of the function \(m\) with respect to \(t\), that is \(\frac{d^k}{dt^k}\,m(t)\), at \(t=0\) is the \(k^{th}\) moment \(\mu_k^{'}\).


Various probability distributions have their own unique moment-generating function.


For example, for the normal distribution, \(m(t)=e^{t\mu+\frac{1}{2}\,t^2\,\sigma^2}\), the first derivative with respect to \(t\) is \((\mu+t\,\sigma^2)\,e^{t\mu+\frac{1}{2}\,t^2\,\sigma^2}\), and after setting \(t=0\), the first ordinary moment is \(\mu\). The second derivative with respect to \(t\) is below.

\[\frac{d^2}{dt^2}\,m(t)=\left(\sigma^2+(\mu+t\,\sigma^2)^2\right)e^{t\mu+\frac{1}{2}\,t^2\,\sigma^2}\]

And after setting \(t=0\), the second ordinary moment is \(\mu^2+\sigma^2\).
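The moment-from-derivative relationship can be checked with finite differences; a sketch for the normal distribution's moment-generating function, with illustrative \(\mu\) and \(\sigma\):

```python
import math

MU, SIGMA = 1.5, 2.0

def m(t):
    """Moment-generating function of N(mu, sigma^2)."""
    return math.exp(t * MU + 0.5 * t * t * SIGMA * SIGMA)

# Central finite differences of m(t) at t = 0 approximate the moments.
h = 1e-5
first = (m(h) - m(-h)) / (2 * h)                 # ~ mu
second = (m(h) - 2 * m(0.0) + m(-h)) / (h * h)   # ~ mu^2 + sigma^2

print(first, second)  # ~1.5, ~6.25
```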


Central moment


The \(n^{th}\) central moment, or \(n^{th}\) moment about the mean, is the expected value of the \(n^{th}\) power of the deviation of the random variable from the mean. (Per definition, the existence of a mean is required for central moments, meaning that random variables having no mean, such as the Cauchy distribution, have no central moments.)


The first central moment is zero.
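A quick sketch of central moments, \(\mu_n=\mathbb{E}[(X-\mu)^n]\), again with a fair die as the example distribution; the first central moment comes out as zero, and the second is the variance.

```python
# Central moments of a discrete distribution: mu_n = E[(X - mu)^n].
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

mean = sum(x * p for x, p in zip(values, probs))

def central_moment(n):
    return sum((x - mean) ** n * p for x, p in zip(values, probs))

print(central_moment(1))  # ≈ 0 (always, up to floating-point noise)
print(central_moment(2))  # variance of a fair die, 35/12 ≈ 2.9167
```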


Standardized moment


Standardized moments allow for comparing the shape (skewness, kurtosis, ...) of different probability distributions.


A standardized moment of a probability distribution is a moment that is normalized. Often it is an \(n^{th}\) degree central moment (\(n \gt 1\)) that is normalized by the \(n^{th}\) power of the standard deviation, making the moment scale-invariant.

\[{\scriptsize\textsf{ n-th degree standardized moment}}=\frac{\mu_n}{\sigma^n}\]
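For instance, the third standardized moment is the skewness, \(\mu_3/\sigma^3\). A sketch with an asymmetric two-point (Bernoulli) distribution as the example:

```python
# Skewness = mu_3 / sigma^3 for a Bernoulli(p = 0.25) distribution.
values = [0, 1]
probs = [0.75, 0.25]

mean = sum(x * p for x, p in zip(values, probs))
mu2 = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
mu3 = sum((x - mean) ** 3 * p for x, p in zip(values, probs))
sigma = mu2 ** 0.5

skewness = mu3 / sigma ** 3
print(round(skewness, 4))  # ≈ 1.1547, i.e. (1 - 2p) / sqrt(p (1 - p))
```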

Standard deviation


It is a measure of the expected deviation of a random variable about its mean.


The standard deviation is the square root of the second central moment.

\begin{array}{rcl} \sigma & = & \sqrt{\mu_2} = \sqrt{\mathbb{E}[(X-\mathbb{E}[X])^2]} = \sqrt{\mathbb{E}[(X-\mu)^2]}\\ &&\\ \sigma & = & \sqrt{\int_{\scriptsize-\infty}^{\scriptsize\infty}(x-\mu)^2\,{\color{#E97132}{f(x)}}\,dx}\\ \end{array}



Variance


The variance is the second central moment. It is not standardized.

\begin{array}{rcl} Var(X) & = & \mu_2 = \sigma^2 = \mathbb{E}[(X-\mathbb{E}[X])^2] = ... = \mathbb{E}[X^2]-\mathbb{E}[X]^2\\ &&\\ Var(aX) & = & a^2\,Var(X)\\ &&\\ Var(X+Y) & = & Var(X)+Var(Y)\ \ \ \ \ \ \scriptsize{\textsf{ if }}X{\textsf{ and }} Y{\textsf{ are independent}} \end{array}
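A sketch checking the scaling and additivity rules with exact (population) variances of fair dice, using \(Var(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2\):

```python
# Exact variance of a fair die via E[X^2] - (E[X])^2.
values = [1, 2, 3, 4, 5, 6]
mean = sum(values) / 6
var_x = sum(v * v for v in values) / 6 - mean ** 2  # 35/12

a = 3
# Scaling every outcome by a scales the variance by a^2.
var_ax = sum((a * v) ** 2 for v in values) / 6 - (a * mean) ** 2
print(abs(var_ax - a ** 2 * var_x) < 1e-9)  # True: Var(aX) = a^2 Var(X)

# Two independent dice: enumerate all 36 joint outcomes.
sums = [x + y for x in values for y in values]
mean_s = sum(sums) / 36
var_s = sum(s * s for s in sums) / 36 - mean_s ** 2
print(abs(var_s - 2 * var_x) < 1e-9)  # True: Var(X + Y) = Var(X) + Var(Y)
```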



Normal (or Gaussian) distribution occurs frequently in nature. It is defined by two parameters, mean and standard deviation. It has one peak, that is, it is unimodal, at the mean (the location information) and there is a symmetrical spread around this mean, measured by the standard deviation (the shape information).


Among the properties of this distribution is that \(68.3\%\) of it lies within \(\pm1\) standard deviation of the mean, and approximately \(95.4\%\) lies within \(\pm2\) standard deviations.
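These coverage figures can be reproduced with the error function from the standard library; for a normal variable, \(P(|X-\mu|\leq k\,\sigma)=\operatorname{erf}(k/\sqrt{2})\).

```python
import math

# Probability that a normal variable falls within k standard
# deviations of its mean.
def within(k):
    return math.erf(k / math.sqrt(2))

print(round(within(1), 4))  # ≈ 0.6827
print(round(within(2), 4))  # ≈ 0.9545
```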


A random variable with mean \(\mu\) and variance \(\sigma^2\) that has a normal distribution is denoted as \(X \sim N (\mu,\sigma^2)\).


Probability density function

\[f(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-\mu)^2}{2\,\sigma^2}}\ \ \ \ \ \ \ \ {\scriptsize\textsf{with } -\infty \lt x \lt +\infty \textsf{ , } -\infty \lt \mu \lt +\infty \textsf{ , } \sigma \gt 0}\]

If \(\mu=0\) and \(\sigma=1\), then it is a standard normal distribution.


Cumulative distribution function

\[F(x)=\int_{\scriptsize-\infty}^{\scriptsize x}f(u)\,du=\frac{1}{\sqrt{2\pi}\,\sigma}\int_{\scriptsize-\infty}^{\scriptsize x}e^{-\frac{(u-\mu)^2}{2\,\sigma^2}}\,du\]

It has no closed-form expression in elementary functions.


Moment-generating function

\[m(t) = e^{t\mu+\frac{1}{2}\,t^2\,\sigma^2}\]



A good exercise from [8]: calculate the mean of the standard normal distribution, with PDF \(P(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\).

\begin{array}{rclr} \mu & = & \int_{\scriptsize-\infty}^{\scriptsize\infty}\,dx\,x\,P(x) &\\ & = & \frac{1}{\sqrt{2\pi}}\,\int_{\scriptsize-\infty}^{\scriptsize\infty}\,{\color{#E97132}dx\,x}\,e^{\color{#E97132}-\frac{x^2}{2}} &\\ & = & {\color{#E97132}-}\frac{1}{\sqrt{2\pi}}\,\int\,{\color{#E97132}du}\,e^{\color{#E97132}u} & \scriptsize{\textsf{ with }} {\color{#E97132}u} = -\frac{x^2}{2}{\textsf{, so that }} {\color{#E97132}du} = {\color{#E97132}-}x\,dx\\ & = & {\color{#E97132}-}\frac{1}{\sqrt{2\pi}}\,e^{\color{#E97132}u}\,\Big|_{\scriptsize x=-\infty}^{\scriptsize x=\infty} &\\ & = & -\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,\Big|_{\scriptsize-\infty}^{\scriptsize\infty} &\\ & = & -\frac{1}{\sqrt{2\pi}}\,(0-0) &\\ & = & 0 & \end{array}
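The result can also be confirmed numerically (a sketch using a midpoint-rule quadrature on a symmetric grid; the odd integrand cancels around zero):

```python
import math

# Standard normal density.
def P(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Midpoint rule for the mean integral over a wide symmetric interval.
lo, hi, n = -8.0, 8.0, 200_000
h = (hi - lo) / n
mu = sum((lo + (i + 0.5) * h) * P(lo + (i + 0.5) * h) * h for i in range(n))

print(abs(mu) < 1e-8)  # True: the integral of x * P(x) is 0
```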



Based on [7].


Standard deviation is the root-mean-square of the deviations of a random variable from the mean.

Power mean

\[{\scriptsize{\textsf{PowerMean}}}_p = \left( \frac{a_1^p+a_2^p+...+a_n^p}{n}\right)^\frac{1}{p}\]

Arithmetic mean (\(p=1\))


Root mean square (\(p=2\))


Harmonic mean (\(p=-1\))


The arithmetic mean and the root-mean-square also work with zero or negative numbers \(a_i\).
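A minimal implementation of the power mean, checking that the familiar means are its special cases (the example data is arbitrary):

```python
# Power mean: p = 1 gives the arithmetic mean, p = 2 the
# root-mean-square, p = -1 the harmonic mean.
def power_mean(a, p):
    return (sum(x ** p for x in a) / len(a)) ** (1 / p)

data = [1.0, 2.0, 4.0]
print(round(power_mean(data, 1), 4))   # arithmetic mean, ≈ 2.3333
print(round(power_mean(data, 2), 4))   # root-mean-square, ≈ 2.6458
print(round(power_mean(data, -1), 4))  # harmonic mean, ≈ 1.7143
```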

For \(P \gt Q\):


If not all \(a_i\) are equal, then

\[{\scriptsize{\textsf{PowerMean}}}_P \gt {\scriptsize{\textsf{PowerMean}}}_Q\]

If all \(a_i\) are equal, then

\[{\scriptsize{\textsf{PowerMean}}}_P = {\scriptsize{\textsf{PowerMean}}}_Q\]
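A sketch of this monotonicity property: the power mean strictly increases in \(p\) when the \(a_i\) differ, and stays constant when they are all equal.

```python
def power_mean(a, p):
    return (sum(x ** p for x in a) / len(a)) ** (1 / p)

unequal = [1.0, 2.0, 4.0]
equal = [3.0, 3.0, 3.0]

# Strictly increasing in p for unequal values.
means = [power_mean(unequal, p) for p in (-1, 1, 2, 3)]
print(all(m1 < m2 for m1, m2 in zip(means, means[1:])))  # True

# Constant in p when all values are equal.
print(all(abs(power_mean(equal, p) - 3.0) < 1e-12 for p in (-1, 1, 2)))  # True
```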

Root-mean-square can also be used for continuous functions, with integration replacing summation. The root-mean-square value of \(f(x)\) over the interval \(a \leq x \leq b\) is given below.

\[\sqrt{\frac{1}{b-a}\int_{\scriptsize a}^{\scriptsize b}f(x)^2\,dx}\]


Root-mean-square: "It is useful when trying to measure the average 'size' of numbers, where their sign is unimportant, as the squaring makes all of the numbers non-negative."
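A tiny example of this sign-insensitivity (arbitrary data): the arithmetic mean of mixed-sign values can cancel to zero, while the root-mean-square still reflects their typical magnitude.

```python
# Mixed-sign data: arithmetic mean cancels, RMS does not.
data = [-3.0, 4.0, -4.0, 3.0]

arithmetic = sum(data) / len(data)
rms = (sum(x * x for x in data) / len(data)) ** 0.5

print(arithmetic)  # 0.0
print(rms)         # ≈ 3.5355, i.e. sqrt(12.5)
```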


For positive numbers

\[{\scriptsize{\textsf{arithmetic mean}}} \geq {\scriptsize{\textsf{geometric mean}}} \geq {\scriptsize{\textsf{harmonic mean}}}\]

For positive numbers, the arithmetic mean and the geometric mean (\(g\) below) are related by the following equation for the logarithm to any base \(k\). The mathematical derivation is straightforward and not a surprise, yet the meaning is revealing: The log to any base \(k\) of the geometric mean of \(a_1...a_n\) is the arithmetic mean of the logs to base \(k\) of \(a_1...a_n\).

\begin{array}{rcl} g & = & (\,a_1\,a_2\,...\,a_n\,)^\frac{1}{n}\\ \log_k(g) & = & \frac{1}{n}\left(\,\log_k(a_1)+\log_k(a_2)+\,...\,+\log_k(a_n)\,\right) \end{array}
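A quick numerical check of this identity (a sketch with base \(k=e\) and arbitrary example data):

```python
import math

# log of the geometric mean equals the arithmetic mean of the logs.
a = [1.0, 2.0, 4.0, 8.0]
n = len(a)

g = math.prod(a) ** (1 / n)  # geometric mean
log_g = math.log(g)
mean_of_logs = sum(math.log(x) for x in a) / n

print(abs(log_g - mean_of_logs) < 1e-12)  # True
```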