# Combining Randomized Quasi Monte Carlo (Sobol) and Parallel Processing (Multithreading) when Pricing Derivatives in Excel

The beauty of *Monte Carlo* simulation is that it can be used to price any European financial derivative contract, of which the terminal payoff is expressed as a function of **D** terminal underlying factors by simulating the terminal values of these factors as of the contract's maturity date.

It turns out, the successful execution of a *Monte Carlo* simulation requires the prior specification of two elements:

- 1.The payoff as a function
**f**that maps**D**real numbers to one real number, i.e. a function**f: R**^{D}->**R**. - 2.The rules that govern the generation of the simulated scenarios (values) of the
**D**terminal underlying factors. These rules include the specification of the number**N**of scenarios, the method by which the random numbers are generated, but also the number of parallel threads on which the simulation runs in a multithreading environment.

The Excel Deriscope add-in contains spreadsheet formulas that produce the above two elements in the form of memory-lived persistent objects.

These objects are accessed by the user through corresponding spreadsheet-displayed text labels that serve as the objects' unique handles.

While this object-oriented approach is nothing new, Deriscope's integrated wizard is indeed novel, because it is capable of dynamically generating all spreadsheet formulas that are needed in any given context. Without the wizard, this job would require an enormous effort on the part of the user.

A detailed example of how one may use the wizard to build a spreadsheet that prices the *Trigger Plus* note issued by Morgan Stanley on April 1, 2019 can be found in my previous article.

There I applied two variants of the *Monte Carlo* technique that differ from each other on how the underlying "random" numbers are generated. In both cases the generated *random numbers* – also more properly referred as *pseudo random numbers* from the Greek word "ψεύδος" meaning "lie" - are deterministically produced by a certain algorithm so that they all lie in the interval (0,1).

Recommended for Deriscope starters: The Overview and Quick Guide pages.

### Pseudo and Quasi Random Number Generators

__1.__The collection of the generated "random" numbers closely resembles the dots produced by blindfold throwing a dart on a linear segment of unit length.__Pseudo Random Number Generator__:- QuantLib uses several different generators, but primarily the Mersenne Twister algorithm developed in 1997 by Makoto Matsumoto and Takuji Nishimura with the very long period of 2
^{19937}-1, meaning that the current state of the generator is stored in an array of 19,937 bits. - The main disadvantage of this generator is the lack of an easily implementable "jump ahead" method that would allow the simultaneous creation of independent random number subsequences, which are necessary for running a simulation on parallel threads. Due to this deficiency, Deriscope runs simulations based on
*Pseudo Random*numbers always on a single thread. __2.__The collection of the generated "random" numbers closely resembles the positions of particles constrained on a linear segment of unit length that are all positively charged and therefore repel each other.*Quasi***Random Number Generator**:- QuantLib uses primarily the
*Sobol*implementation, named after its invention by the Russian mathematician*Ilya Meyerovich Sobol*in 1967. - Multithreading is supported by the existence of a "jump ahead" method.
- Its main disadvantage is that the produced numbers are not conformant with what one would expect from a random variable, a fact that makes the direct computation of an estimated standard deviation impossible.
- Deriscope's workaround is based on the technique of
*randomization*that consists of cloning a given grid of*Sobol*points to another independent one by applying a random*digital shift*on the involved integers. This special*shift*is implemented by XORing the Sobol integers with a*Mersenne Twister*random integer.

It is well known that for the same number of generated "random" numbers, the *Quasi Monte Carlo* technique – also known as *Low Discrepancy Monte Carlo* - leads to a better estimate of the true – but unknown – average of the financial quantity that depends on these numbers than the *Pseudo Random Monte Carlo*.

The inferiority of the *Pseudo Random Monte Carlo* technique is due to the non-uniform coverage of the unit interval with "random" numbers (more details below). Therefore, heavily congested segments of the unit interval contribute unfairly more to the average than other relatively vacant areas.

This is also certified by the numerical results in the article mentioned, where the *Quasi Monte Carlo* pricing converged much faster to the known exact result than the *Pseudo Random Monte Carlo*, as the following chart betrays:

It is anyway a mathematical fact that the *Pseudo Random Monte Carlo* convergence rate is of the order of **O(N ^{-½})** versus the much faster

*Quasi Monte Carlo*convergence rate of

**O((logN)**- at least under certain conditions -, where

^{ ½}/N)**N**is the number of scenarios produced by the simulation.

In layman terms, the above rates mean that the *Pseudo Random Monte Carlo* *standard error* is inversely proportional to **√N**, whereas the *Quasi Monte Carlo standard error* is almost – ignoring the slower **(logN) ^{ ½}** term - inversely proportional to

**N**.

In even more layman terms, multiplying the simulation's number of scenarios by a factor of **100**, for example changing an initial **N = 100,000** (quite standard in certain actually used contexts) to **N****΄**** = 10,000,000** (often considered very big and impractical), will result to only a **10** times smaller *standard error* under *Pseudo Random Monte Carlo*, but to an almost **100** times smaller (!!!) *standard error* under *Quasi Monte Carlo*!

What was not addressed in the previous article, was the practical estimation of the standard error associated with the *Quasi Monte Carlo *method.

While the *standard error* in the *Pseudo Random Monte Carlo* case can be readily estimated by calculating the average of the squared differences of the simulated values from their average, no similar methodology can be applied when the simulated values are the result of *Quasi Monte Carlo*. This is due to the fact that the "random" numbers generated in the course of a *Quasi Monte Carlo* simulation cannot be regarded as the results of **i.i.d.** (*i**ndependent identically distributed*) trials and therefore the whole statistical apparatus supporting the derivation of the standard error formula breaks down.

The purpose of this document is to show how one may still compute in Excel a *standard error* associated with a *Quasi Monte Carlo* simulation by using the well-known *Randomized Quasi Monte Carlo *technique.

It also demonstrates the incorporation of *parallel processing*, which brings* *the advantage of a considerable computation speed-up in a multithread environment, which is by the way standard nowadays.

But first, let us better understand the two random number generators by studying their produced "random" numbers.

### Visualizing Pseudo Random and Sobol Number Sequences in One Dimension

One may think of using the built-in Excel **RAND** formula to display a series of random numbers between 0 and 1 in the spreadsheet.

Unfortunately, this would be a likely poor choice because the **RAND** formula evokes the *Mersenne Twister* random number generator with a clock-based seed. The seed is just an integer that affects the initial values of the *Mersenne Twister* parameters, which in turn generate the final series of random numbers.

It follows that two nearby cells containing the formula **=RAND()** will evoke two separate *Mersenne Twister *random number generators initialized with different seeds that will accordingly generate different sequences of random numbers. Each **RAND** formula outputs the first number of the respective sequence. Therefore, the two displayed numbers in the nearby cells are different, __not because__ they are two separate elements of the same sequence of random numbers, __but because__ they are the first elements of two unrelated sequences. So, these two numbers look random, but are not truly related to each other as two random numbers ought to!

It is likely that the newly introduced **RANDARRAY** formula returns a sequence of truly random numbers, but its seed is still clock-based, which makes replication of the results a bit impractical.

Another built-in alternative is to make use of the vba **RND** function that takes a seed input and returns a proper sequence.

Below, I am going to access the *Mersenne Twister* *Random Number *generator through a Deriscope formula, that allows specifying the seed and therefore produces the exact same sequence of numbers every time it runs.

I will depict the numbers as dots on a straight line of unit length and I want to see how these dots get allocated as their total number increases from **1** to **15**.

Then I will do the same with numbers generated from a *Sobol* *Quasi* *Random Number *generator and compare the two sets of numbers.

The construction takes two steps.

The first step is the creation of an object of type *Model<Simulation>* and the next step is the invocation of that object's local function **Random Generator Values**.

The following short video shows the mouse clicks needed in the first step:

Below the created spreadsheet formula is shown, where I have added for readability purposes the two extra key-value pairs **Handle= Random1D** and **Random Generator= Pseudo Random**

The second step is to create the spreadsheet formula that outputs the random numbers generated by the above object.

The next video shows how I select the appropriate function from the *Function Selector* of the Deriscope wizard and have it pasted in the spreadsheet by clicking on the **Go** button:

As shown below, the wizard has pasted the array formula {=ds(B20:C22)} in the output range **B10:B19**, which consists of **10 rows** because of the key-value pair **Scenarios=10** that dictates the generation of a simulated sample consisting of **10 **scenarios.

Each sample element, i.e. each scenario corresponds to one random number in the interval **(0,1)** because the dimensionality has been set to **1** by the key-value pair **Dim=1**. This is the reason why the output range consists of one column only.

I will examine higher dimensions afterwards.

Next, I want to see what values are generated by a *Quasi Random Generator* of the *Sobol* type.

While I could repeat the previous steps with the wizard, it is much more sensible to exploit Excel's unparallel capacity for manipulating formulas and data by simply copying and pasting everything shown to its immediate right, as this video shows:

Below is the state of my spreadsheet after the copy and paste operation and after I have manually entered the shown values in cells **F5** and **F6**:

The difference between the two sequences of "random" numbers is striking!

The *Sobol* numbers seem to follow some ordering principle, which I will explain shortly.

The easiest way to understand the structure of the *Sobol* numbers is by plotting them as a function of the number of scenarios, i.e. the *sample size*.

It is very simple to set up the Deriscope spreadsheet formulas that are needed so that their output to be directly presentable as an Excel chart.

The produced charts for both the *Random* and *Sobol* cases are shown below.

You may find the source formulas as part of the downloadable spreadsheet at the end of this article.

The last chart explains the very simple algorithm that creates the *Sobol* numbers.

The idea is to bisect any existing empty intervals (i.e. intervals that contain no dots) by placing a new dot exactly at their middle point.

Initially, when no dots exist, there is only one empty interval, the **(0,1)**.

Then the following steps are undertaken:

__Step 1: Generating the first Sobol number 0.5: __

The first *Sobol* number is chosen at exactly the middle of the initial empty interval **(0,1)**, which is the number **0.5**, shown as a red dot in the middle of the green line at the bottom.

__Step 2: Generating the second Sobol number 0.75:__

The insertion of the previous dot creates two new equal size empty intervals, the **(0,**** ****½)** and **(½,1)**.

According to the bisection rule, these two empty intervals must be bisected by placing new dots at their middle.

First the right empty interval **(½,1)** is bisected by placing a new dot at its middle at **0.75.**

__Step 3: Generating the third Sobol number 0.25:__

Then the left empty interval **(0,**** ****½)** is bisected by placing a new dot at its middle at **0.25.**

__Steps 4-7: Generating the 4 ^{th} to 7^{th} Sobol numbers:__

By the end of **Step 3**, a total of **4** empty intervals exist.

Therefore, the next bisection round must halve all these **4** intervals by inserting **4** dots at their middle.

This takes **4** steps, by the end of which, the following new numbers have been added:

0.125

0.375

0.625

0.875

__Important:__

A sample consisting of **7** Sobol numbers is special in the sense that all these numbers are distributed uniformly across the unit interval.

As you see at the **7 ^{th}** green line from the bottom, the

**7**red dots are separated from each other by the same distance of

**0.125**.

__Steps 8-15: Generating the 8 ^{th} to 15^{th} Sobol numbers:__

By the end of **Step 7**, a total of **8** empty intervals exist.

Therefore, the next bisection round must halve all these **8** intervals by inserting **8** dots at their middle.

This takes **8** steps, by the end of which, the following new numbers have been added:

0.0625

0.1875

0.3125

0.4375

0.5625

0.6875

0.8125

0.9375

**Important:**

A sample consisting of **15** Sobol numbers is special in the sense that all these numbers are distributed uniformly across the unit interval.

As you see at the top green line, the **15** red dots are separated from each other by the same distance of **0.0625**.

The bisection rule is closely related with the usual implementation of the *Sobol* algorithm in terms of bitwise operations on the bit arrays that correspond to the **base 2** representations of unsigned integers.

Note there also exist other Low Discrepancy algorithms that work with a base other than **2**.

The important aspect of the *Sobol* algorithm is that the dots cover symmetrically the unit interval only when their number is one of **1, 3, 7, 15, … **or **2 ^{n}-1**, where

**n**is any positive integer.

This means that the sample size in the *Sobol* case ought to equal **2 ^{n}-1**, in order to obtain the most fairly distributed set of numbers in

**(0,1)**, in the sense that the chosen numbers form an equidistant grid with a constant gap of size

**1/2**between any two consecutive points.

^{n}Note that the *Pseudo Random Generator* imposes no such restriction on the sample size.

By default, Deriscope issues a warning if one attempts to run a *Sobol* simulation with a sample size **N** that does not observe the above rule.

The power of *Sobol* over *Pseudo Random* is apparent by the following chart where the dots produced by the two *Random Number Generators* are placed side-by-side for a sample size of **127**.

The dots in the *Sobol* case are much more uniformly distributed over **(0,1)** and therefore lead to a much fairer representation of that interval's numbers.

As mentioned above, they form an equidistant grid with a constant gap of **1/128**.

So, **127** is the optimal sample size among numbers in the closed interval **[64,254].**

If we had used a lesser sample size, such as **126**, then the minimum gap between consecutive points would still be **1/128**, but there would also be occurrences of gaps wider than that.

Increasing the sample size by **1**, i.e. setting it equal to **128**, introduces a new smaller gap between two of the points, the width of which is half the **1/128**, i.e. a gap of **1/256**. This smaller gap will apply to all points only when the sample size becomes much higher and equal to **255**.

### Visualizing Pseudo Random and Sobol Number Sequences in Two Dimensions

Generating samples containing two dimensional vectors of random numbers is required when one needs to simulate the evolution of two – perhaps correlated – factors.

This need arises, for example, in the *Monte Carlo* pricing of rainbow options, where the option's payoff depends on the terminal prices of two underlying stocks.

In order to see the respective random arrays in the spreadsheet, all I have to do is change the value for the key **Dim** from **1** to **2**, as shown below:

As you see, the array spreadsheet formula {=ds(E20:F22)} now returns a two-column range of numbers. The two numbers in each row represent the coordinates of the corresponding generated *Sobol* point in a two-dimensional unit square.

Similar holds for the array spreadsheet formula on the left with regard to the *Pseudo Random Generator*.

As I did in the one-dimensional case, the easiest way to understand how the *Sobol* pairs evolve is by displaying the **2D** charts that correspond to the various sample sizes. Below you see the produced charts. I have not plotted the *Pseudo Random* case, since it poses no challenges.

The important – and quite surprising – lesson from these charts is that the two-dimensional Sobol dots follow the same periodicity as the one-dimensional, in the following sense:

A similar rule regarding the bisection of empty intervals in the two coordinate axes seems to apply, but not all coordinate combinations are considered when it comes to form the geometrical points.

For example, going from **Sample Size = 1** to **Sample Size =** **2** and **3**, the two new points possess horizontal and vertical coordinates that involve the numbers **0.25 **and **0.75**, which are the middle points of the two intervals **(0,**** ****½)** and **(½,1)** on both axes.

But only the two points along the downward sloping diagonal line with coordinates **(0.75,**** 0.25)** and **(0.25,0.75)** are considered.

Interestingly enough, the remaining two points along the upward sloping diagonal line with coordinates **(0.75,**** 0.75)** and **(0.25,0.25)** are left out!

This procedure does not treat the two axes in a symmetrical fashion, but it has the advantage that each new round of bisection – and therefore finer spacing of dots – follows the same pattern as that in the one-dimensional case, i.e. occurs when the sample size exceeds the number of the form **1, 3, 7, 15, … **or **2 ^{n}-1**, where

**n**is any positive integer.

This fact turns out to be very important in higher dimensions.

For example, in a space of **D** dimensions, where **D** is an even number greater than **6**, the first point will be at the center of the **D**-dimensional unit hypercube having the **D** coordinates **(0.5, 0.5, …, 0.5)**.

The second point turns out to be at the middle of one of the diagonal lines with coordinates **(0.75, 0.25, 0.75, 0.25, …, 0.75, 0.25)** that alternate between **0.75** and **0.25**.

The third point will be the reflection of the second point through the center of the hypercube, located at **(0.25, 0.75, 0.25, 0.75, …, 0.25, 0.75)**.

Just like in the two-dimensional case, no other points will be created using a different sequence consisting only of **0.25** and **0.75**.

This is a good thing!

Imagine instead that a Sobol procedure treated all **D** axes the same.

Then all possible sequences would have a right to exist, which would also make sense from a pure geometric view in terms of dispersing the points in the most homogeneous fashion.

The number of all possible sequences with **D** elements being either **0.25** or **0.75 **equals **2 ^{D}** or – approximately –

**10**!

^{D/3}The above number is already huge when **D** is as low as **18**. Then the first **10 ^{18/3}**

^{ }or

**10**or

^{6}**1,000,000**Sobol points would all fall on the surface of a sphere centered at

**(0.5, 0.5, …, 0.5)**and having a radius of

**0.25.**

A simulation based on such *Sobol* points would hardly give any meaningful results, since all the simulated factors would only assume the two possible numbers corresponding to the Sobol quasi random coordinates **0.25** and **0.75**.

The two charts below show the dots produced by the two *Random Number Generators* for a sample size of **127**.

Like in the one-dimensional case, the dots in the *Sobol* case are much more uniformly distributed over the** **unit square.

An interesting observation regarding the *Sobol* grid of points is the following:

No two points are on the same vertical or horizontal line.

Therefore, the **x** coordinates of all the **127** points are distinct numbers and turn out to be the same with the **127** numbers we encountered in the **1D** case.

The same applies for the y coordinates, from which follows that the **127** **x** coordinates are __exactly the same__ with the **127** **y** coordinates!

This is hard to infer from a casual look at the below image, but it is a fact that applies to any sample size and any dimension!

Even if you built **127** *Sobol* points in **1,000** dimensions, you would only need **127** distinct numbers. The **127** coordinates of all these points on any of of the **1,000** axes will form the same constant set of numbers, regardless of the chosen axis.

### Visualizing Pseudo Random and Sobol Number Sequences in Three Dimensions

Getting from **2** to **3** dimensions is achieved by changing the value for the key **Dim** from **1** to **2**, as shown below.

The recent Excel support for "dynamic arrays" that are formulas entered in one cell but "spill" their output automatically over the so called "spill range" leads to the generation of **3-dim** *Sobol* numbers without the need of cumbersome manual adjustments of the involved array formulas.

The array spreadsheet formula { =ds(F20:G22)} now returns a three-column range of numbers. The three numbers in each row represent the coordinates of the corresponding generated *Sobol* point in the 3-dimensional unit cube.

The array formula on the left does the same with regard to the *Pseudo Random Generator*.

It is still possible to visualize how the *Sobol* pairs evolve in 3-dimensional space by displaying the **3D** charts that correspond to the various sample sizes. Below you see the produced charts as part of a video for better visual resolution. You may pause the video to study any specific frame.

The analysis presented before in the 2-dimensional case is carried here unchanged.

Even though the dimensionality now is higher, the optimal sample sizes are still **1, 3, 7, 15, …,** **2 ^{n}-1**.

Below is the plot of the first **127 3D** *Pseudo Random* points in a **3D** cube of unit size:

Compare the above with the plot of the first **127 3D** *Sobol* points shown below:

###
Pricing a Stock Option using One-Dimensional Pseudo Random Simulation

We can study the one-dimensional Pseudo Random Simulation by pricing a plain vanilla *European call option* on a stock using Monte Carlo, because then the exact price is known from the *Black Scholes* formula.

The latter – in the case of a *European call* - is a function **ƒ** of **6** variables:

**Call Price = ƒ(S, K, σ, r, ****δ****, t) = e⁻**^{δ}^{t}** S N(d₁) - e⁻ ^{rt} KN(d₂)**

where

**S** = spot stock price

**K** = agreed strike

**σ** = assumed (implied) volatility of the stock price

**r** = continuously compounded annualized interest rate for the time t

**δ** = expected continuously compounded annualized dividend yield until the time t

**t** = time to the option's expiry in annual units

**d₁ = ln(e ^{(r}⁻**

^{δ}

^{)t}**S/K)/(**

**σ**

**t**

^{½}) + ½**σ**

**t**

^{½}**d₂ = d₁ - σt ^{½}**

While one may use built-in Excel functions to calculate **ƒ** in the spreadsheet, it is easier to use the wizard to paste the formula that creates an object of type *BS Fn*, which encapsulates the function **ƒ**. This short video shows how I do just that:

Here is how the created *Black Scholes* function object (object of type *BS Fn*) looks like on my spreadsheet:

Even though the formula *=ds(B3:C4)* in cell **B2** takes no input other than the **Type** and **Function** keys, the contents of the created object displayed in the wizard indicate that **Payoff Type= Vanilla** and **Direction= Call**.

This is so because these two key-value pairs are optional and assume the shown default values when they are not part of the input.

The two keys shown at the bottom, **_Domain Dim= 6** and **_Range Dim= 1**, indicate what has already been mentioned, namely that the Black Scholes function **ƒ** is a function of **6** variables, i.e. it maps **R ^{6}** to

**R**.

I can now evaluate the *Black Scholes* function for a given set of its input variables by calling its local function **Evaluate**. As usually, I let the wizard paste the formula for me:

The result of **7.0829** is shown below. Note that the shown numbers **95, 100, 0.2, 0.04, 0, 1** are parsed using the fixed order **S, K,** **σ, r, ****δ****, t**.

My next task is to calculate the price of the same European call option using *Monte Carlo*, so that I can see if and how the result is going to converge to **7.0829** as I increase the *sample size* (number of scenarios).

Before I can price the mentioned option using Monte Carlo, I first need to create the corresponding Deriscope object of type *Structured Note*.

Since it doesn't matter what the option's underlying is, I will an underlying of type *Stock*. The following video shows how I do that using the wizard.

Below you see that the wizard has pasted two formulas in the spreadsheet.

The first formula is =ds(J11:K14) in cell **J10** that creates the object **&Linear1D.1** of type *Linear Fn*.

This object is then used as input to the main formula =ds(J3:K8) in cell **J2** that creates my final object **&Tradable1D.1** of type *Structured Note*.

The wizard has chosen an object of type *Linear Fn* by default to represent the required payoff function expected by the key **Payoff Fn** in cell **J7** because this type is one of the simplest types that represent payoff functions.

Due to its coefficient of **1** in cell **K13** the created object represents the identity function that maps the variable **x** to itself.

In other words, this structured note is designed to pay **x**, if the price of the stock underlying **%RDSB.L|GB** on the maturity date of **07 Aug 2020** happens to be **x**.

If I had tried to price this note using *Monte Carlo*, the result would converge to the spot price of **95**, provided that zero dividends were assumed.

But I am after the pricing of a *European call option*, the payoff function of which is more interesting than the current identity payoff function, in the sense that the option's price is also sensitive to the assumed values for the **Eval Date** in cell **K8** and the discounting interest rate.

I want therefore to replace the object **&Linear1D:1.1** with another one that represents the function **ƒ(x) = max(x-K,0)**, which is the payoff function of the European call at expiry, where **x** is the realized stock price at expiry.

Omitting the video with the intermediate wizard steps, the final generated formulas look so:

Selected cell above is the cell **J10** that contains the formula =ds(J11:K14) and returns the object **&CallFn.1** of type *Payoff Fn*, of which the contents are visible in the taskpane on the right.

This object represents a function that maps one real variable **x** to another real variable according to the formula associated with the *Payoff* object supplied in cell **K14**.

The latter is created below in cell **J16** and has obviously the desired payoff function **x -> max(x-K,0)**, with **K** being the *Strike*, which is set to **1** in cell **K22**.

In order to make my structured note identical with a European call option with strike **100** and expiring in **one year** from today, I should manually change the **Eval Date** in cell **K3** from **07 Aug 2020** to **06 Aug 2020**, so that I get exactly **365** days when counted from today (**07 Aug 2019**).

I should also change the strike in cell **K22** from **1** to **100**.

After these changes, I ask the wizard to paste the pricing formulas, as shown in the short video below:

Below you see the main formula pasted by the wizard in cell **M2**.

It could have returned the price as a number, but instead it returns the object **&Variant_M2:1.1**.

This is due to my choice **Output= Full** in cell **N7**.

The advantage of this choice is that the result of the *Monte Carlo* calculation includes a few more data of interest, which are visible in the taskpane on the right.

The final price is the **Value= 7.529**.

Very important is also the **Std Error = 2.253**, which is huge due to the fact that the default sample size chosen by the wizard in cell **N18** is only **10**.

The objects containing the market data are located in lower rows, as seen below.

With green color are the values that I have already linked to the corresponding cells used in the evaluation of the *Black Scholes* formula.

If I click on the lens sign next to the **Sim Report**, I see a few more details about the simulation, as shown below:

This tells me, for example, that the non-discounted average of the simulated terminal structured note values is **7.836**, their standard deviation is **7.415**, the estimated standard deviation of their average is **2.345** and that the simulation ran in a non-parallel fashion thus utilizing only one thread.

The entries **Randomized Runs** and **Randomized Averages** apply only when the simulation is of the *Sobol* type.

The cell next to **Simulated Values** appears empty because I have not explicitly requested that the terminal stock prices and their respective structured note payoff values are collected and reported across all scenarios.

I can set up such a request as follows:

Currently the specs for my simulation are defined by the object **&SimMdl_M14:1.1** created in cell **M14**, as shown below.

Since the cell **M14** is selected, the wizard displays the contents of the respective object.

The dimmed keys correspond to optional keys that retain the shown value when they are not explicitly supplied on the spreadsheet.

The one at the bottom is called **Simulated Values** and currently has the boolean **FALSE** value, as indicated by the unchecked box on its right.

All I need to do, is manually insert the key-value pair **Simulated Values= TRUE** in the spreadsheet, within the input range of the formula that creates the **&SimMdl_M14:1.1** object.

If I do that and go back to my previous screen for the **Sim Report**, I will see the following:

I may now click on the lens sign to get the following screen:

The second column contains the terminal stock prices **x** at expiry as they are produced by the respective **10** scenarios during the course of the simulation.

The third column contains the corresponding evaluated payoff function **max(x-100,0).**

The price of the structured note is simply the average of the shown prices on the right column, multiplied by the discount factor as of the expiry date.

For efficiency reason, it is better to have **Simulated Values= FALSE** when the number of scenarios is very big.

There are two more adjustments I need to make, in order for the Monte Carlo price to converge to the exact Black Scholes result as the number of scenarios increases.

The first is inserting the optional key-value pair **Vol Time DC= %ACT/365F** in the input range of the formula that creates the **&SimMdl_M14:1.1** object in cell **M14**.

The second is inserting the optional key-value pair **TS Daycount= %ACT/365F** in the input range of the formula that creates the yield curve object in cell **M39**.

These adjustments are needed so that the time interval in annual units between today (**07 Aug 2019)** and expiry (**06 Aug 2020)** is evaluated as being exactly equal to **1**, since this is the time interval used in the Black Scholes formula that calculates the exact price.

Now I am ready to set up my final table, which should report how well the simulation results approach the known exact *Black Scholes* price of **7.0829**.

Since I want to also compare the results against those from a *Sobol* simulation, I am going to set up the sample sizes so that they follow the sequence **1, 3, 7, 15, …, 2 ^{n}-1**, as has already been explained.

Accordingly, the left two columns will look like below, where the sample size **N** goes from **1** to **1,048,575 = 2 ^{20}-1**:

Although straightforward in principle, the remaining steps could be complex, due to the fact that each sample size **N** should be first used to create a corresponding object of type *Model[Simulation]*. Then the latter object should be used to create a corresponding object of type *Model[Structured Note]*. Finally, the pricing formula for each **N** should be called in a way that it references this last object.

All these intermediate objects can be easily created by using the special Deriscope **Clone** function, which replicates any given object while simultaneously changing one or more of the data this object hold.

But in the current case, I can make use of the key-value pair **Edit Input= TRUE** that may be optionally supplied to the **Price** function.

This allows me to add the following 3 key-value pairs:

- 1.Key
**Ref Objects**that expects one or more objects of any type that are involved in the pricing. - 2.Key
**Ref****Keys**that expects the keys of the objects specified above, of which the associated values I wish to modify. - 3.Key
**Ref****Values**that expects the new values that should replace the currently existing values that are associated with the keys specified above.

In effect, I can compute in a single step the price that results from a change anywhere in the input data!

The image below shows the application of this technique in calling the **Price** function with a modified value of **100** for the key **Scenarios** of the reference object **&SimMdl_M14:1.1**:

All I need, is enter formulas in the cells of column **R** that reference the range shown above, except the bottom-right cell, which must be replaced with a link to the corresponding cell of column **Q**.

So, the formula in the top cell **R3** should look as follows:

This formula may now be copied all the way down.

The remaining **3** columns are supposed to display the **Value**, **Standard Error** and **Processing Time** that are contained in the respective result objects of column **R**.

The easiest way is to use the wizard to paste the first formula in cell **S3**, as this short video shows:

Then the formula may be pasted downwards and to the right to cover the whole range to get the following final result:

### Pricing a Stock Option using One-Dimensional Sobol Simulation

Next, I want to build a similar table using a *Quasi Random Number Generator* of the *Sobol* type.

As I have already mentioned, the *Sobol* algorithm supports multithreading, which I will obviously – and ruthlessly - exploit, since my pc has **8** logical processors.

My first step consists of creating a new object of type *Model[Simulation]* that involves the *Sobol* generator and the multithreading requirement.

Below you see that object created in cell **W13**, with the new key-value pairs **Random Generator=** **Low Discrepancy** and **Exec Policy=** **&Parallel.1**.

Without further ado, following similar steps as before, the table with the price dependence on sample size under *Sobol* simulation is shown below:

### Comparing the Pseudo Random and Sobol Simulation Results in One Dimension

A first observation is that the processing times for the *Sobol Monte Carlo* are much shorter than those for the *Pseudo Random Monte Carlo*, due to parallel processing supported by the former but not by the latter.

The second observation is that the *Sobol*-generated prices seem to converge much faster to the exact price of **7.0829**.

The following chart plots the two sets of prices against the sample size **N**, side by side:

This chart is effectively in logarithmic scale on the x axis, since the depicted variable is the logarithm base 2 of N+1.

Regarding a price of **0.01** as one basis point (**bp**), we observe that under *Pseudo Random Monte Carlo* we need well over **1,000,000** scenarios to reach a *standard error* that approaches **1bp**.

Unfortunately, no *standard error* can be computed under *Sobol Monte Carlo*.

Nevertheless, the above table indicates that even the price **7.0713** reached with a sample size of **4,095** is almost within **1bp** from the known exact price of **7.0829**.

One may thus wonder if the *Sobol* sample size of **4,095** is sort of equivalent to a *Pseudo Random* sample size of **1,048,575**.

In fact, it turns out that it is even better!

It is possible to estimate the *standard error* associated with any given *Sobol* sample size **N** by running a so called *Randomized Sobol Monte Carlo*.

This consists of running **R** separate Sobol sub-simulations, each of sample size **N**, whereby the generated *Sobol* numbers between **0** and **1** in each sub-simulation are "shifted" in a special fashion by some random amount.

Each such sub-simulation leads to a corresponding price for our tradable.

The end effect is that the collection of these **R** prices can be regarded as a sample retrieved through R i.i.d. trials of some unknown random variable.

This assumption makes it legible for us to use the usual formulas for the sample-implied estimated standard deviation of that unknown random variable.

A typical choice for **R** is a number between **50** and **100**.

Since the **R** sub-simulations are distributed in **8** parallel threads – in my pc at least – it makes sense to set **R** to a multiple of **8** so that each thread assumes the same load.

I will thus use **R = 64**.

In the image below, I use the **Clone** function to modify the existing **&Sob1D.1** object by setting new values for its keys **Scenarios** and **Randomized Runs**.

Proceeding like before, I use the above object to set up the table shown below, where the last column contains the standard deviation associated with the **64** tradable prices generated by each *Randomized* multi-simulation against the sample size **N**.

The **Total N** column contains the total number of scenarios generated by each multi-simulation.

The result supports my earlier suspicion that the *Sobol* **N = 4,095** is indeed "equivalent" with – and actually better than – the *Pseudo Random* **N = 1,048,575**.

Indeed, the highlighted row above shows that the *Randomized Sobol* simulation leads to an estimated standard error of **0.0085** for **N = 4,095**, whereas the corresponding standard error in the *Pseudo Random* case was found to be much higher at **0.0118**.

In other words, only around **4,000** *Sobol* scenarios suffice to achieve a quality similar to that of **1,000,000** *Pseudo Random* scenarios!

The following chart plots the two sets of estimated *standard errors* against the sample size **N**, side by side:

Due to *Sobol's* support for multithreading the convergence properties of *Sobol* simulations appear more pronounced when the processing time is plotted against the achieved *standard error*:

The Two-Dimensional Case

It is interesting to run the same comparison tests in the case of a financial product, the payoff of which relies on two random factors.

Then a *Monte Carlo* simulation with **N** scenarios will produce **N** pairs of random numbers between **0** and **1**, which will lead to **N** values of the two random factors at expiry.

As a concrete two-factor instrument, I will choose the *Trigger Plus* note covered in my previous article, so that the current analysis will also serve as an enhancement to my earlier analysis of that product.

The diagram below explains the note's payoff at maturity:

The image below shows the three spreadsheet formulas – the red cells from bottom to top – that **a)** create the market data object, **b)** create the instrument object and **c)** compute the exact, analytically derived price of **996.953**.

I want to see how well the two *Monte Carlo* methods are going to converge to that price.

First, I must recast the instrument created in cell **B7** as an object of type *Structured Note*, in order to be able to price it using Monte Carlo.

This has been already done in my previous post and the result is shown below.

The difficult past was the construction of the object in cell **K2** representing the payoff function. Then the *Structured Note* object is easily constructed in cell **K14**.

The remaining steps are straightforward and are the same as those in the one-dimensional case.

I simply use the wizard to paste the pricing formula incorporating the *Pseudo Random Monte Carlo* model and then I set up a table with rows corresponding to different sample sizes and one column where the pricing formula is repeated referencing these sample sizes, as shown below:

Then I produce a similar table with the only difference being the replacement of the *Pseudo Random Monte Carlo* model with the *Sobol Monte Carlo* model:

As in the one-dimensional case, application of the Randomized Sobol simulation technique

Since the note's notional is **1,000**, it is more appropriate to regard a price of **0.1** as representing a basis point in this case.

The interesting observation on the above table is that the standard error falls below **1 bp** (**0.0657**) when **N** = **16,383**, shown above at the highlighted row.

The equivalent sample size in the one-dim case was only **4,095**.

Why is a bigger sample size required in the two-dim case in order to reach **1 bp** accuracy?

The reason relates to the fact that a two-dimensional space contains more volume and therefore requires a greater number of sample points to be satisfactorily filled.

But correlation plays an important role too.

Changing the current correlation from **46%** to **0%** changes the sample size threshold back to the one-dimensional case of **4,095**.

This is achieved because the *Sobol* points are so constructed that their projections along all axes are exactly the same numbers as those produced by a pure one-dimensional *Sobol* simulation.

This means that the **N** Sobol points, although they are distributed over a much bigger two-dimensional area, their projections on both axes carry no loss over a one-dimensional simulation with also **N** points.

If the two axes are uncorrelated, it follows that both random variables (along the two axes) are represented with the same fidelity as when they are separately simulated on a stand-alone one-dimensional basis.

When the correlation moves away from **0**, increasingly more *Sobol* points are mapped through the usual inverse diagonal method to final correlated values that are close to each other. Such points thus partially replicate each other, with the consequence being that a bigger sample size is required to sufficiently represent the likely combinations of the two random variables.

It is obvious that such *Sobol* points correspond to events that carry a very low probability and therefore are wasted, in the sense that their existence brings little weight to the computation of the average.

The conclusion is that higher absolute values of correlation affect negatively the simulation, in the sense that a bigger sample size is required to achieve a certain *standard error* threshold.

Below are the three charts presented above for the one-dimensional case, as they apply in the current two-dimensional problem:

Click on **RandomizedSobol.xlsx** to download the spreadsheet produced with the above steps.

Feel free to contact me if you want to share any thoughts with regard to this product or if you want to request any particular features. Contact info and social media links are available at my web site https://www.deriscope.com