Combining Randomized Quasi Monte Carlo (Sobol) and Parallel Processing (Multithreading) when Pricing Derivatives in Excel
The beauty of Monte Carlo simulation is that it can be used to price any European financial derivative contract, of which the terminal payoff is expressed as a function of D terminal underlying factors by simulating the terminal values of these factors as of the contract's maturity date.
It turns out, the successful execution of a Monte Carlo simulation requires the prior specification of two elements:
- 1.The payoff as a function f that maps D real numbers to one real number, i.e. a function f: RD -> R.
- 2.The rules that govern the generation of the simulated scenarios (values) of the D terminal underlying factors. These rules include the specification of the number N of scenarios, the method by which the random numbers are generated, but also the number of parallel threads on which the simulation runs in a multithreading environment.
The Excel Deriscope add-in contains spreadsheet formulas that produce the above two elements in the form of memory-lived persistent objects.
These objects are accessed by the user through corresponding spreadsheet-displayed text labels that serve as the objects' unique handles.
While this object-oriented approach is nothing new, Deriscope's integrated wizard is indeed novel, because it is capable of dynamically generating all spreadsheet formulas that are needed in any given context. Without the wizard, this job would require an enormous effort on the part of the user.
A detailed example of how one may use the wizard to build a spreadsheet that prices the Trigger Plus note issued by Morgan Stanley on April 1, 2019 can be found in my previous article.
There I applied two variants of the Monte Carlo technique that differ from each other on how the underlying "random" numbers are generated. In both cases the generated random numbers – also more properly referred as pseudo random numbers from the Greek word "ψεύδος" meaning "lie" - are deterministically produced by a certain algorithm so that they all lie in the interval (0,1).
Recommended for Deriscope starters: The Overview and Quick Guide pages.
Pseudo and Quasi Random Number Generators
- 1.Pseudo Random Number Generator: The collection of the generated "random" numbers closely resembles the dots produced by blindfold throwing a dart on a linear segment of unit length.
- QuantLib uses several different generators, but primarily the Mersenne Twister algorithm developed in 1997 by Makoto Matsumoto and Takuji Nishimura with the very long period of 219937-1, meaning that the current state of the generator is stored in an array of 19,937 bits.
- The main disadvantage of this generator is the lack of an easily implementable "jump ahead" method that would allow the simultaneous creation of independent random number subsequences, which are necessary for running a simulation on parallel threads. Due to this deficiency, Deriscope runs simulations based on Pseudo Random numbers always on a single thread.
- 2.Quasi Random Number Generator: The collection of the generated "random" numbers closely resembles the positions of particles constrained on a linear segment of unit length that are all positively charged and therefore repel each other.
- QuantLib uses primarily the Sobol implementation, named after its invention by the Russian mathematician Ilya Meyerovich Sobol in 1967.
- Multithreading is supported by the existence of a "jump ahead" method.
- Its main disadvantage is that the produced numbers are not conformant with what one would expect from a random variable, a fact that makes the direct computation of an estimated standard deviation impossible.
- Deriscope's workaround is based on the technique of randomization that consists of cloning a given grid of Sobol points to another independent one by applying a random digital shift on the involved integers. This special shift is implemented by XORing the Sobol integers with a Mersenne Twister random integer.
It is well known that for the same number of generated "random" numbers, the Quasi Monte Carlo technique – also known as Low Discrepancy Monte Carlo - leads to a better estimate of the true – but unknown – average of the financial quantity that depends on these numbers than the Pseudo Random Monte Carlo.
The inferiority of the Pseudo Random Monte Carlo technique is due to the non-uniform coverage of the unit interval with "random" numbers (more details below). Therefore, heavily congested segments of the unit interval contribute unfairly more to the average than other relatively vacant areas.
This is also certified by the numerical results in the article mentioned, where the Quasi Monte Carlo pricing converged much faster to the known exact result than the Pseudo Random Monte Carlo, as the following chart betrays:
It is anyway a mathematical fact that the Pseudo Random Monte Carlo convergence rate is of the order of O(N-½) versus the much faster Quasi Monte Carlo convergence rate of O((logN) ½/N) - at least under certain conditions -, where N is the number of scenarios produced by the simulation.
In layman terms, the above rates mean that the Pseudo Random Monte Carlo standard error is inversely proportional to √N, whereas the Quasi Monte Carlo standard error is almost – ignoring the slower (logN) ½ term - inversely proportional to N.
In even more layman terms, multiplying the simulation's number of scenarios by a factor of 100, for example changing an initial N = 100,000 (quite standard in certain actually used contexts) to N΄ = 10,000,000 (often considered very big and impractical), will result to only a 10 times smaller standard error under Pseudo Random Monte Carlo, but to an almost 100 times smaller (!!!) standard error under Quasi Monte Carlo!
What was not addressed in the previous article, was the practical estimation of the standard error associated with the Quasi Monte Carlo method.
While the standard error in the Pseudo Random Monte Carlo case can be readily estimated by calculating the average of the squared differences of the simulated values from their average, no similar methodology can be applied when the simulated values are the result of Quasi Monte Carlo. This is due to the fact that the "random" numbers generated in the course of a Quasi Monte Carlo simulation cannot be regarded as the results of i.i.d. (independent identically distributed) trials and therefore the whole statistical apparatus supporting the derivation of the standard error formula breaks down.
The purpose of this document is to show how one may still compute in Excel a standard error associated with a Quasi Monte Carlo simulation by using the well-known Randomized Quasi Monte Carlo technique.
It also demonstrates the incorporation of parallel processing, which brings the advantage of a considerable computation speed-up in a multithread environment, which is by the way standard nowadays.
But first, let us better understand the two random number generators by studying their produced "random" numbers.
Visualizing Pseudo Random and Sobol Number Sequences in One Dimension
One may think of using the built-in Excel RAND formula to display a series of random numbers between 0 and 1 in the spreadsheet.
Unfortunately, this would be a likely poor choice because the RAND formula evokes the Mersenne Twister random number generator with a clock-based seed. The seed is just an integer that affects the initial values of the Mersenne Twister parameters, which in turn generate the final series of random numbers.
It follows that two nearby cells containing the formula =RAND() will evoke two separate Mersenne Twister random number generators initialized with different seeds that will accordingly generate different sequences of random numbers. Each RAND formula outputs the first number of the respective sequence. Therefore, the two displayed numbers in the nearby cells are different, not because they are two separate elements of the same sequence of random numbers, but because they are the first elements of two unrelated sequences. So, these two numbers look random, but are not truly related to each other as two random numbers ought to!
It is likely that the newly introduced RANDARRAY formula returns a sequence of truly random numbers, but its seed is still clock-based, which makes replication of the results a bit impractical.
Another built-in alternative is to make use of the vba RND function that takes a seed input and returns a proper sequence.
Below, I am going to access the Mersenne Twister Random Number generator through a Deriscope formula, that allows specifying the seed and therefore produces the exact same sequence of numbers every time it runs.
I will depict the numbers as dots on a straight line of unit length and I want to see how these dots get allocated as their total number increases from 1 to 15.
Then I will do the same with numbers generated from a Sobol Quasi Random Number generator and compare the two sets of numbers.
The construction takes two steps.
The first step is the creation of an object of type Model<Simulation> and the next step is the invocation of that object's local function Random Generator Values.
The following short video shows the mouse clicks needed in the first step:
Below the created spreadsheet formula is shown, where I have added for readability purposes the two extra key-value pairs Handle= Random1D and Random Generator= Pseudo Random
The second step is to create the spreadsheet formula that outputs the random numbers generated by the above object.
The next video shows how I select the appropriate function from the Function Selector of the Deriscope wizard and have it pasted in the spreadsheet by clicking on the Go button:
As shown below, the wizard has pasted the array formula {=ds(B20:C22)} in the output range B10:B19, which consists of 10 rows because of the key-value pair Scenarios=10 that dictates the generation of a simulated sample consisting of 10 scenarios.
Each sample element, i.e. each scenario corresponds to one random number in the interval (0,1) because the dimensionality has been set to 1 by the key-value pair Dim=1. This is the reason why the output range consists of one column only.
I will examine higher dimensions afterwards.
Next, I want to see what values are generated by a Quasi Random Generator of the Sobol type.
While I could repeat the previous steps with the wizard, it is much more sensible to exploit Excel's unparallel capacity for manipulating formulas and data by simply copying and pasting everything shown to its immediate right, as this video shows:
Below is the state of my spreadsheet after the copy and paste operation and after I have manually entered the shown values in cells F5 and F6:
The difference between the two sequences of "random" numbers is striking!
The Sobol numbers seem to follow some ordering principle, which I will explain shortly.
The easiest way to understand the structure of the Sobol numbers is by plotting them as a function of the number of scenarios, i.e. the sample size.
It is very simple to set up the Deriscope spreadsheet formulas that are needed so that their output to be directly presentable as an Excel chart.
The produced charts for both the Random and Sobol cases are shown below.
You may find the source formulas as part of the downloadable spreadsheet at the end of this article.
The last chart explains the very simple algorithm that creates the Sobol numbers.
The idea is to bisect any existing empty intervals (i.e. intervals that contain no dots) by placing a new dot exactly at their middle point.
Initially, when no dots exist, there is only one empty interval, the (0,1).
Then the following steps are undertaken:
Step 1: Generating the first Sobol number 0.5:
The first Sobol number is chosen at exactly the middle of the initial empty interval (0,1), which is the number 0.5, shown as a red dot in the middle of the green line at the bottom.
Step 2: Generating the second Sobol number 0.75:
The insertion of the previous dot creates two new equal size empty intervals, the (0, ½) and (½,1).
According to the bisection rule, these two empty intervals must be bisected by placing new dots at their middle.
First the right empty interval (½,1) is bisected by placing a new dot at its middle at 0.75.
Step 3: Generating the third Sobol number 0.25:
Then the left empty interval (0, ½) is bisected by placing a new dot at its middle at 0.25.
Steps 4-7: Generating the 4th to 7th Sobol numbers:
By the end of Step 3, a total of 4 empty intervals exist.
Therefore, the next bisection round must halve all these 4 intervals by inserting 4 dots at their middle.
This takes 4 steps, by the end of which, the following new numbers have been added:
0.125
0.375
0.625
0.875
Important:
A sample consisting of 7 Sobol numbers is special in the sense that all these numbers are distributed uniformly across the unit interval.
As you see at the 7th green line from the bottom, the 7 red dots are separated from each other by the same distance of 0.125.
Steps 8-15: Generating the 8th to 15th Sobol numbers:
By the end of Step 7, a total of 8 empty intervals exist.
Therefore, the next bisection round must halve all these 8 intervals by inserting 8 dots at their middle.
This takes 8 steps, by the end of which, the following new numbers have been added:
0.0625
0.1875
0.3125
0.4375
0.5625
0.6875
0.8125
0.9375
Important:
A sample consisting of 15 Sobol numbers is special in the sense that all these numbers are distributed uniformly across the unit interval.
As you see at the top green line, the 15 red dots are separated from each other by the same distance of 0.0625.
The bisection rule is closely related with the usual implementation of the Sobol algorithm in terms of bitwise operations on the bit arrays that correspond to the base 2 representations of unsigned integers.
Note there also exist other Low Discrepancy algorithms that work with a base other than 2.
The important aspect of the Sobol algorithm is that the dots cover symmetrically the unit interval only when their number is one of 1, 3, 7, 15, … or 2n-1, where n is any positive integer.
This means that the sample size in the Sobol case ought to equal 2n-1, in order to obtain the most fairly distributed set of numbers in (0,1), in the sense that the chosen numbers form an equidistant grid with a constant gap of size 1/2n between any two consecutive points.
Note that the Pseudo Random Generator imposes no such restriction on the sample size.
By default, Deriscope issues a warning if one attempts to run a Sobol simulation with a sample size N that does not observe the above rule.
The power of Sobol over Pseudo Random is apparent by the following chart where the dots produced by the two Random Number Generators are placed side-by-side for a sample size of 127.
The dots in the Sobol case are much more uniformly distributed over (0,1) and therefore lead to a much fairer representation of that interval's numbers.
As mentioned above, they form an equidistant grid with a constant gap of 1/128.
So, 127 is the optimal sample size among numbers in the closed interval [64,254].
If we had used a lesser sample size, such as 126, then the minimum gap between consecutive points would still be 1/128, but there would also be occurrences of gaps wider than that.
Increasing the sample size by 1, i.e. setting it equal to 128, introduces a new smaller gap between two of the points, the width of which is half the 1/128, i.e. a gap of 1/256. This smaller gap will apply to all points only when the sample size becomes much higher and equal to 255.
Visualizing Pseudo Random and Sobol Number Sequences in Two Dimensions
Generating samples containing two dimensional vectors of random numbers is required when one needs to simulate the evolution of two – perhaps correlated – factors.
This need arises, for example, in the Monte Carlo pricing of rainbow options, where the option's payoff depends on the terminal prices of two underlying stocks.
In order to see the respective random arrays in the spreadsheet, all I have to do is change the value for the key Dim from 1 to 2, as shown below:
As you see, the array spreadsheet formula {=ds(E20:F22)} now returns a two-column range of numbers. The two numbers in each row represent the coordinates of the corresponding generated Sobol point in a two-dimensional unit square.
Similar holds for the array spreadsheet formula on the left with regard to the Pseudo Random Generator.
As I did in the one-dimensional case, the easiest way to understand how the Sobol pairs evolve is by displaying the 2D charts that correspond to the various sample sizes. Below you see the produced charts. I have not plotted the Pseudo Random case, since it poses no challenges.
The important – and quite surprising – lesson from these charts is that the two-dimensional Sobol dots follow the same periodicity as the one-dimensional, in the following sense:
A similar rule regarding the bisection of empty intervals in the two coordinate axes seems to apply, but not all coordinate combinations are considered when it comes to form the geometrical points.
For example, going from Sample Size = 1 to Sample Size = 2 and 3, the two new points possess horizontal and vertical coordinates that involve the numbers 0.25 and 0.75, which are the middle points of the two intervals (0, ½) and (½,1) on both axes.
But only the two points along the downward sloping diagonal line with coordinates (0.75, 0.25) and (0.25,0.75) are considered.
Interestingly enough, the remaining two points along the upward sloping diagonal line with coordinates (0.75, 0.75) and (0.25,0.25) are left out!
This procedure does not treat the two axes in a symmetrical fashion, but it has the advantage that each new round of bisection – and therefore finer spacing of dots – follows the same pattern as that in the one-dimensional case, i.e. occurs when the sample size exceeds the number of the form 1, 3, 7, 15, … or 2n-1, where n is any positive integer.
This fact turns out to be very important in higher dimensions.
For example, in a space of D dimensions, where D is an even number greater than 6, the first point will be at the center of the D-dimensional unit hypercube having the D coordinates (0.5, 0.5, …, 0.5).
The second point turns out to be at the middle of one of the diagonal lines with coordinates (0.75, 0.25, 0.75, 0.25, …, 0.75, 0.25) that alternate between 0.75 and 0.25.
The third point will be the reflection of the second point through the center of the hypercube, located at (0.25, 0.75, 0.25, 0.75, …, 0.25, 0.75).
Just like in the two-dimensional case, no other points will be created using a different sequence consisting only of 0.25 and 0.75.
This is a good thing!
Imagine instead that a Sobol procedure treated all D axes the same.
Then all possible sequences would have a right to exist, which would also make sense from a pure geometric view in terms of dispersing the points in the most homogeneous fashion.
The number of all possible sequences with D elements being either 0.25 or 0.75 equals 2D or – approximately – 10D/3!
The above number is already huge when D is as low as 18. Then the first 1018/3 or 106 or 1,000,000 Sobol points would all fall on the surface of a sphere centered at (0.5, 0.5, …, 0.5) and having a radius of 0.25.
A simulation based on such Sobol points would hardly give any meaningful results, since all the simulated factors would only assume the two possible numbers corresponding to the Sobol quasi random coordinates 0.25 and 0.75.
The two charts below show the dots produced by the two Random Number Generators for a sample size of 127.
Like in the one-dimensional case, the dots in the Sobol case are much more uniformly distributed over the unit square.
An interesting observation regarding the Sobol grid of points is the following:
No two points are on the same vertical or horizontal line.
Therefore, the x coordinates of all the 127 points are distinct numbers and turn out to be the same with the 127 numbers we encountered in the 1D case.
The same applies for the y coordinates, from which follows that the 127 x coordinates are exactly the same with the 127 y coordinates!
This is hard to infer from a casual look at the below image, but it is a fact that applies to any sample size and any dimension!
Even if you built 127 Sobol points in 1,000 dimensions, you would only need 127 distinct numbers. The 127 coordinates of all these points on any of of the 1,000 axes will form the same constant set of numbers, regardless of the chosen axis.
Visualizing Pseudo Random and Sobol Number Sequences in Three Dimensions
Getting from 2 to 3 dimensions is achieved by changing the value for the key Dim from 1 to 2, as shown below.
The recent Excel support for "dynamic arrays" that are formulas entered in one cell but "spill" their output automatically over the so called "spill range" leads to the generation of 3-dim Sobol numbers without the need of cumbersome manual adjustments of the involved array formulas.
The array spreadsheet formula { =ds(F20:G22)} now returns a three-column range of numbers. The three numbers in each row represent the coordinates of the corresponding generated Sobol point in the 3-dimensional unit cube.
The array formula on the left does the same with regard to the Pseudo Random Generator.
It is still possible to visualize how the Sobol pairs evolve in 3-dimensional space by displaying the 3D charts that correspond to the various sample sizes. Below you see the produced charts as part of a video for better visual resolution. You may pause the video to study any specific frame.
The analysis presented before in the 2-dimensional case is carried here unchanged.
Even though the dimensionality now is higher, the optimal sample sizes are still 1, 3, 7, 15, …, 2n-1.
Below is the plot of the first 127 3D Pseudo Random points in a 3D cube of unit size:
Compare the above with the plot of the first 127 3D Sobol points shown below:
Pricing a Stock Option using One-Dimensional Pseudo Random Simulation
We can study the one-dimensional Pseudo Random Simulation by pricing a plain vanilla European call option on a stock using Monte Carlo, because then the exact price is known from the Black Scholes formula.
The latter – in the case of a European call - is a function ƒ of 6 variables:
Call Price = ƒ(S, K, σ, r, δ, t) = e⁻δt S N(d₁) - e⁻rt KN(d₂)
where
S = spot stock price
K = agreed strike
σ = assumed (implied) volatility of the stock price
r = continuously compounded annualized interest rate for the time t
δ = expected continuously compounded annualized dividend yield until the time t
t = time to the option's expiry in annual units
d₁ = ln(e(r⁻δ)t S/K)/(σt½) + ½σt½
d₂ = d₁ - σt½
While one may use built-in Excel functions to calculate ƒ in the spreadsheet, it is easier to use the wizard to paste the formula that creates an object of type BS Fn, which encapsulates the function ƒ. This short video shows how I do just that:
Here is how the created Black Scholes function object (object of type BS Fn) looks like on my spreadsheet:
Even though the formula =ds(B3:C4) in cell B2 takes no input other than the Type and Function keys, the contents of the created object displayed in the wizard indicate that Payoff Type= Vanilla and Direction= Call.
This is so because these two key-value pairs are optional and assume the shown default values when they are not part of the input.
The two keys shown at the bottom, _Domain Dim= 6 and _Range Dim= 1, indicate what has already been mentioned, namely that the Black Scholes function ƒ is a function of 6 variables, i.e. it maps R6 to R.
I can now evaluate the Black Scholes function for a given set of its input variables by calling its local function Evaluate. As usually, I let the wizard paste the formula for me:
The result of 7.0829 is shown below. Note that the shown numbers 95, 100, 0.2, 0.04, 0, 1 are parsed using the fixed order S, K, σ, r, δ, t.
My next task is to calculate the price of the same European call option using Monte Carlo, so that I can see if and how the result is going to converge to 7.0829 as I increase the sample size (number of scenarios).
Before I can price the mentioned option using Monte Carlo, I first need to create the corresponding Deriscope object of type Structured Note.
Since it doesn't matter what the option's underlying is, I will an underlying of type Stock. The following video shows how I do that using the wizard.
Below you see that the wizard has pasted two formulas in the spreadsheet.
The first formula is =ds(J11:K14) in cell J10 that creates the object &Linear1D.1 of type Linear Fn.
This object is then used as input to the main formula =ds(J3:K8) in cell J2 that creates my final object &Tradable1D.1 of type Structured Note.
The wizard has chosen an object of type Linear Fn by default to represent the required payoff function expected by the key Payoff Fn in cell J7 because this type is one of the simplest types that represent payoff functions.
Due to its coefficient of 1 in cell K13 the created object represents the identity function that maps the variable x to itself.
In other words, this structured note is designed to pay x, if the price of the stock underlying %RDSB.L|GB on the maturity date of 07 Aug 2020 happens to be x.
If I had tried to price this note using Monte Carlo, the result would converge to the spot price of 95, provided that zero dividends were assumed.
But I am after the pricing of a European call option, the payoff function of which is more interesting than the current identity payoff function, in the sense that the option's price is also sensitive to the assumed values for the Eval Date in cell K8 and the discounting interest rate.
I want therefore to replace the object &Linear1D:1.1 with another one that represents the function ƒ(x) = max(x-K,0), which is the payoff function of the European call at expiry, where x is the realized stock price at expiry.
Omitting the video with the intermediate wizard steps, the final generated formulas look so:
Selected cell above is the cell J10 that contains the formula =ds(J11:K14) and returns the object &CallFn.1 of type Payoff Fn, of which the contents are visible in the taskpane on the right.
This object represents a function that maps one real variable x to another real variable according to the formula associated with the Payoff object supplied in cell K14.
The latter is created below in cell J16 and has obviously the desired payoff function x -> max(x-K,0), with K being the Strike, which is set to 1 in cell K22.
In order to make my structured note identical with a European call option with strike 100 and expiring in one year from today, I should manually change the Eval Date in cell K3 from 07 Aug 2020 to 06 Aug 2020, so that I get exactly 365 days when counted from today (07 Aug 2019).
I should also change the strike in cell K22 from 1 to 100.
After these changes, I ask the wizard to paste the pricing formulas, as shown in the short video below:
Below you see the main formula pasted by the wizard in cell M2.
It could have returned the price as a number, but instead it returns the object &Variant_M2:1.1.
This is due to my choice Output= Full in cell N7.
The advantage of this choice is that the result of the Monte Carlo calculation includes a few more data of interest, which are visible in the taskpane on the right.
The final price is the Value= 7.529.
Very important is also the Std Error = 2.253, which is huge due to the fact that the default sample size chosen by the wizard in cell N18 is only 10.
The objects containing the market data are located in lower rows, as seen below.
With green color are the values that I have already linked to the corresponding cells used in the evaluation of the Black Scholes formula.
If I click on the lens sign next to the Sim Report, I see a few more details about the simulation, as shown below:
This tells me, for example, that the non-discounted average of the simulated terminal structured note values is 7.836, their standard deviation is 7.415, the estimated standard deviation of their average is 2.345 and that the simulation ran in a non-parallel fashion thus utilizing only one thread.
The entries Randomized Runs and Randomized Averages apply only when the simulation is of the Sobol type.
The cell next to Simulated Values appears empty because I have not explicitly requested that the terminal stock prices and their respective structured note payoff values are collected and reported across all scenarios.
I can set up such a request as follows:
Currently the specs for my simulation are defined by the object &SimMdl_M14:1.1 created in cell M14, as shown below.
Since the cell M14 is selected, the wizard displays the contents of the respective object.
The dimmed keys correspond to optional keys that retain the shown value when they are not explicitly supplied on the spreadsheet.
The one at the bottom is called Simulated Values and currently has the boolean FALSE value, as indicated by the unchecked box on its right.
All I need to do, is manually insert the key-value pair Simulated Values= TRUE in the spreadsheet, within the input range of the formula that creates the &SimMdl_M14:1.1 object.
If I do that and go back to my previous screen for the Sim Report, I will see the following:
I may now click on the lens sign to get the following screen:
The second column contains the terminal stock prices x at expiry as they are produced by the respective 10 scenarios during the course of the simulation.
The third column contains the corresponding evaluated payoff function max(x-100,0).
The price of the structured note is simply the average of the shown prices on the right column, multiplied by the discount factor as of the expiry date.
For efficiency reason, it is better to have Simulated Values= FALSE when the number of scenarios is very big.
There are two more adjustments I need to make, in order for the Monte Carlo price to converge to the exact Black Scholes result as the number of scenarios increases.
The first is inserting the optional key-value pair Vol Time DC= %ACT/365F in the input range of the formula that creates the &SimMdl_M14:1.1 object in cell M14.
The second is inserting the optional key-value pair TS Daycount= %ACT/365F in the input range of the formula that creates the yield curve object in cell M39.
These adjustments are needed so that the time interval in annual units between today (07 Aug 2019) and expiry (06 Aug 2020) is evaluated as being exactly equal to 1, since this is the time interval used in the Black Scholes formula that calculates the exact price.
Now I am ready to set up my final table, which should report how well the simulation results approach the known exact Black Scholes price of 7.0829.
Since I want to also compare the results against those from a Sobol simulation, I am going to set up the sample sizes so that they follow the sequence 1, 3, 7, 15, …, 2n-1, as has already been explained.
Accordingly, the left two columns will look like below, where the sample size N goes from 1 to 1,048,575 = 220-1:
Although straightforward in principle, the remaining steps could be complex, due to the fact that each sample size N should be first used to create a corresponding object of type Model[Simulation]. Then the latter object should be used to create a corresponding object of type Model[Structured Note]. Finally, the pricing formula for each N should be called in a way that it references this last object.
All these intermediate objects can be easily created by using the special Deriscope Clone function, which replicates any given object while simultaneously changing one or more of the data this object hold.
But in the current case, I can make use of the key-value pair Edit Input= TRUE that may be optionally supplied to the Price function.
This allows me to add the following 3 key-value pairs:
- 1.Key Ref Objects that expects one or more objects of any type that are involved in the pricing.
- 2.Key Ref Keys that expects the keys of the objects specified above, of which the associated values I wish to modify.
- 3.Key Ref Values that expects the new values that should replace the currently existing values that are associated with the keys specified above.
In effect, I can compute in a single step the price that results from a change anywhere in the input data!
The image below shows the application of this technique in calling the Price function with a modified value of 100 for the key Scenarios of the reference object &SimMdl_M14:1.1:
All I need, is enter formulas in the cells of column R that reference the range shown above, except the bottom-right cell, which must be replaced with a link to the corresponding cell of column Q.
So, the formula in the top cell R3 should look as follows:
This formula may now be copied all the way down.
The remaining 3 columns are supposed to display the Value, Standard Error and Processing Time that are contained in the respective result objects of column R.
The easiest way is to use the wizard to paste the first formula in cell S3, as this short video shows:
Then the formula may be pasted downwards and to the right to cover the whole range to get the following final result:
Pricing a Stock Option using One-Dimensional Sobol Simulation
Next, I want to build a similar table using a Quasi Random Number Generator of the Sobol type.
As I have already mentioned, the Sobol algorithm supports multithreading, which I will obviously – and ruthlessly - exploit, since my pc has 8 logical processors.
My first step consists of creating a new object of type Model[Simulation] that involves the Sobol generator and the multithreading requirement.
Below you see that object created in cell W13, with the new key-value pairs Random Generator= Low Discrepancy and Exec Policy= &Parallel.1.
Without further ado, following similar steps as before, the table with the price dependence on sample size under Sobol simulation is shown below:
Comparing the Pseudo Random and Sobol Simulation Results in One Dimension
A first observation is that the processing times for the Sobol Monte Carlo are much shorter than those for the Pseudo Random Monte Carlo, due to parallel processing supported by the former but not by the latter.
The second observation is that the Sobol-generated prices seem to converge much faster to the exact price of 7.0829.
The following chart plots the two sets of prices against the sample size N, side by side:
This chart is effectively in logarithmic scale on the x axis, since the depicted variable is the logarithm base 2 of N+1.
Regarding a price of 0.01 as one basis point (bp), we observe that under Pseudo Random Monte Carlo we need well over 1,000,000 scenarios to reach a standard error that approaches 1bp.
Unfortunately, no standard error can be computed under Sobol Monte Carlo.
Nevertheless, the above table indicates that even the price 7.0713 reached with a sample size of 4,095 is almost within 1bp from the known exact price of 7.0829.
One may thus wonder if the Sobol sample size of 4,095 is sort of equivalent to a Pseudo Random sample size of 1,048,575.
In fact, it turns out that it is even better!
It is possible to estimate the standard error associated with any given Sobol sample size N by running a so called Randomized Sobol Monte Carlo.
This consists of running R separate Sobol sub-simulations, each of sample size N, whereby the generated Sobol numbers between 0 and 1 in each sub-simulation are "shifted" in a special fashion by some random amount.
Each such sub-simulation leads to a corresponding price for our tradable.
The end effect is that the collection of these R prices can be regarded as a sample retrieved through R i.i.d. trials of some unknown random variable.
This assumption makes it legible for us to use the usual formulas for the sample-implied estimated standard deviation of that unknown random variable.
A typical choice for R is a number between 50 and 100.
Since the R sub-simulations are distributed in 8 parallel threads – in my pc at least – it makes sense to set R to a multiple of 8 so that each thread assumes the same load.
I will thus use R = 64.
In the image below, I use the Clone function to modify the existing &Sob1D.1 object by setting new values for its keys Scenarios and Randomized Runs.
Proceeding like before, I use the above object to set up the table shown below, where the last column contains the standard deviation associated with the 64 tradable prices generated by each Randomized multi-simulation against the sample size N.
The Total N column contains the total number of scenarios generated by each multi-simulation.
The result supports my earlier suspicion that the Sobol N = 4,095 is indeed "equivalent" with – and actually better than – the Pseudo Random N = 1,048,575.
Indeed, the highlighted row above shows that the Randomized Sobol simulation leads to an estimated standard error of 0.0085 for N = 4,095, whereas the corresponding standard error in the Pseudo Random case was found to be much higher at 0.0118.
In other words, only around 4,000 Sobol scenarios suffice to achieve a quality similar to that of 1,000,000 Pseudo Random scenarios!
The following chart plots the two sets of estimated standard errors against the sample size N, side by side:
Due to Sobol's support for multithreading the convergence properties of Sobol simulations appear more pronounced when the processing time is plotted against the achieved standard error:
The Two-Dimensional Case
It is interesting to run the same comparison tests in the case of a financial product, the payoff of which relies on two random factors.
Then a Monte Carlo simulation with N scenarios will produce N pairs of random numbers between 0 and 1, which will lead to N values of the two random factors at expiry.
As a concrete two-factor instrument, I will choose the Trigger Plus note covered in my previous article, so that the current analysis will also serve as an enhancement to my earlier analysis of that product.
The diagram below explains the note's payoff at maturity:
The image below shows the three spreadsheet formulas – the red cells from bottom to top – that a) create the market data object, b) create the instrument object and c) compute the exact, analytically derived price of 996.953.
I want to see how well the two Monte Carlo methods are going to converge to that price.
First, I must recast the instrument created in cell B7 as an object of type Structured Note, in order to be able to price it using Monte Carlo.
This has been already done in my previous post and the result is shown below.
The difficult past was the construction of the object in cell K2 representing the payoff function. Then the Structured Note object is easily constructed in cell K14.
The remaining steps are straightforward and are the same as those in the one-dimensional case.
I simply use the wizard to paste the pricing formula incorporating the Pseudo Random Monte Carlo model and then I set up a table with rows corresponding to different sample sizes and one column where the pricing formula is repeated referencing these sample sizes, as shown below:
Then I produce a similar table with the only difference being the replacement of the Pseudo Random Monte Carlo model with the Sobol Monte Carlo model:
As in the one-dimensional case, application of the Randomized Sobol simulation technique
Since the note's notional is 1,000, it is more appropriate to regard a price of 0.1 as representing a basis point in this case.
The interesting observation on the above table is that the standard error falls below 1 bp (0.0657) when N = 16,383, shown above at the highlighted row.
The equivalent sample size in the one-dim case was only 4,095.
Why is a bigger sample size required in the two-dim case in order to reach 1 bp accuracy?
The reason relates to the fact that a two-dimensional space contains more volume and therefore requires a greater number of sample points to be satisfactorily filled.
But correlation plays an important role too.
Changing the current correlation from 46% to 0% changes the sample size threshold back to the one-dimensional case of 4,095.
This is achieved because the Sobol points are so constructed that their projections along all axes are exactly the same numbers as those produced by a pure one-dimensional Sobol simulation.
This means that the N Sobol points, although they are distributed over a much bigger two-dimensional area, their projections on both axes carry no loss over a one-dimensional simulation with also N points.
If the two axes are uncorrelated, it follows that both random variables (along the two axes) are represented with the same fidelity as when they are separately simulated on a stand-alone one-dimensional basis.
When the correlation moves away from 0, increasingly more Sobol points are mapped through the usual inverse diagonal method to final correlated values that are close to each other. Such points thus partially replicate each other, with the consequence being that a bigger sample size is required to sufficiently represent the likely combinations of the two random variables.
It is obvious that such Sobol points correspond to events that carry a very low probability and therefore are wasted, in the sense that their existence brings little weight to the computation of the average.
The conclusion is that higher absolute values of correlation affect negatively the simulation, in the sense that a bigger sample size is required to achieve a certain standard error threshold.
Below are the three charts presented above for the one-dimensional case, as they apply in the current two-dimensional problem:
Click on RandomizedSobol.xlsx to download the spreadsheet produced with the above steps.
Feel free to contact me if you want to share any thoughts with regard to this product or if you want to request any particular features. Contact info and social media links are available at my web site https://www.deriscope.com