Talk:Importance sampling

On January 21st, 2006, this article was expanded by Izar (199.111.224.202; I was logged out during editing). I am a fourth-year undergraduate student majoring in electrical engineering, with a concentration in communications and signal processing. As an undergraduate researcher, I am interested in information theory and coding theory.

Before I expanded this article, it was a mathematics stub, so I added some sections. However, it can still be expanded in many respects. For example, there are many other biasing methods, such as 'exponential twisting', which could be explained briefly or in some detail. Also, some applications of the importance sampling technique could be discussed in a separate section. Izar 05:06, 21 January 2006 (UTC)


Introduction

I don't see why importance sampling is a variance reduction technique in MC estimation. It is a technique for estimating E[X|A] when all you have is E[X|B]. If I remember correctly, it is the 'weighted importance sampling' technique that has reduced variance compared to the standard 'importance sampling' technique, at the expense of becoming biased. --Olethros 16:15, 9 February 2006 (UTC)
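
For concreteness, here is a minimal Octave/Matlab sketch of the distinction mentioned above, using a made-up toy setup (target and proposal distributions are my own choices, not from the article): the ordinary importance sampling estimator averages φ(x)·w(x), while the weighted (self-normalized) variant divides by the sum of the weights, which introduces bias but often reduces variance.

 % Toy setup: target X ~ Exp(1), proposal X* ~ Exp(2); estimate E[phi(X)]
 % with phi(x) = x^2 (the exact value is 2). Hypothetical example only.
 N = 10000;
 lam_f = 1; lam_g = 2;                                % target and proposal rates
 x = -log(rand(N,1)) / lam_g;                         % draws from Exp(lam_g)
 w = (lam_f*exp(-lam_f*x)) ./ (lam_g*exp(-lam_g*x));  % weights f(x)/g(x)
 phi = x.^2;
 is_est  = mean(phi .* w);                            % ordinary IS: unbiased
 wis_est = sum(phi .* w) / sum(w);                    % weighted (self-normalized) IS: biased
 fprintf('ordinary IS: %g, weighted IS: %g\n', is_est, wis_est);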

Mathematical approach

Why is this talking about the binomial distribution and event (X>t)? Just talking about the expectation of a general random variable would have been much simpler and much more general. The X>t treatment is confusing and overlong.--Olethros 16:15, 9 February 2006 (UTC)
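
(To spell out the point: for a general function g one has E[g(X)] = \int g(x) f_X(x)\, dx, and choosing g(x) = [x > t], the indicator of the event, recovers E[g(X)] = P(X > t) as a special case; the X > t treatment needs no separate machinery.)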

I totally agree! The edits by 199.111.224.202 [1] made the article very difficult to understand. Someone should really perform some cleanup here. --Fredrik Orderud 17:51, 9 February 2006 (UTC)

Rewrite

In my opinion, the article here is fundamentally flawed. If X is the outcome of some experiment, I don't have f(x) or F(x), so I can't possibly calculate W(x). I only have f(x) for the input parameters of the simulation. So a distinction has to be made between the distributions of the input parameters and the distribution of the simulation result. Furthermore, as was already pointed out above, the restriction to E[X > t] is unnecessary; it is true that there is a binomial distribution for this event, but that is completely off topic. What I would propose for a rewrite is along the following lines:

Let S(Z) be some simulation depending on input parameters Z, where Z itself is a random variable with a known (and preferably analytic) distribution f_Z(z). The problem is to estimate the expected value of a function of the solution, e.g. E[\phi(S)], where \phi could be something like \phi(S)=[\alpha\leq S] (which would correspond to the current version of the article, i.e. E[S > t]). Now we have

E[\phi(S)]=\int \phi(s) dF_S(s)=\int \phi(s) f_S(s) ds
=\int \phi(S(z)) f_Z(z) dz=E[\phi(S(Z))]
\approx \frac{1}{N}\sum_{i=1}^N  \phi(S(z_i))

where the z_i are i.i.d. according to f_Z.
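
In Octave/Matlab terms this plain Monte Carlo estimator might look like the following sketch (the simulation S and the function \phi are placeholder choices of mine, matching the example further down):

 % Plain MC estimate of E[phi(S(Z))] with z_i i.i.d. ~ f_Z
 N   = 10000;
 S   = @(z) z.^2;            % placeholder "simulation"
 phi = @(s) s >= 0.7;        % placeholder function of interest
 z   = rand(N,1);            % z_i ~ f_Z = U[0,1]
 est = mean(phi(S(z)))       % (1/N) * sum_i phi(S(z_i))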

Now we rewrite this to

E[\phi(S)]=\int \phi(S(z^*)) \frac{f_Z(z^*)}{f_{Z^*}(z^*)} f_{Z^*}(z^*)dz^*=E[\phi(S(Z^*))W(Z^*)]
\approx \frac{1}{N} \sum_{i=1}^N  \phi(S(z^*_i)) W(z^*_i)

with W(z^*)=\frac{f_Z(z^*)}{f_{Z^*}(z^*)} and the z_i^* i.i.d. according to f_{Z^*}.
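
As a generic sketch of this reweighted estimator (again with placeholder choices of f_Z, f_{Z^*}, S and \phi that are my own; the tuned example below does the same thing more carefully):

 % IS estimate of E[phi(S(Z))]: draw z* ~ f_{Z*}, reweight by W = f_Z/f_{Z*}
 N   = 10000;
 S   = @(z) z.^2;                              % same placeholder simulation
 phi = @(s) s >= 0.7;
 mu = 0.9; sig = 0.1;                          % hypothetical proposal parameters
 zs  = mu + sig*randn(N,1);                    % z*_i ~ f_{Z*} = N(mu, sig^2)
 fZ  = @(z) (0 <= z & z <= 1);                 % density of U[0,1]
 fZs = @(z) exp(-0.5*((z-mu)/sig).^2) / (sqrt(2*pi)*sig);
 W   = fZ(zs) ./ fZs(zs);                      % importance weights
 est = mean(phi(S(zs)) .* W)                   % (1/N) * sum_i phi(S(z*_i)) W(z*_i)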

Example: Let S(Z) = Z^2 be the simulation outcome (OK, a bit simple for a simulation, but it's just an example, right?), let the distribution of input values be uniform on [0,1] (i.e. Z ~ U[0,1]), and let the function of interest be \phi_\alpha(S)=\chi_{[\alpha,\infty)}(S). Then the normal estimator for E[\phi_\alpha(S)] would be

\hat E[\phi_\alpha(S)]= \frac{1}{N}\sum_{i=1}^N  [z_i^2\geq \alpha]

where the z_i are U[0,1] distributed and [b] is 1 if b is true and 0 otherwise (see Knuth: A short note on notation). Taking for Z^* a normal distribution N(\mu,\sigma^2) (with \mu=(1+\sqrt{\alpha})/2 and \sigma=0.3\sqrt{1-\sqrt{\alpha}}, which gives quite good results) we get

\hat E[\phi_\alpha(S)]= \frac{1}{N}\sum_{i=1}^N  [n_i^2\geq \alpha] \frac{[0\leq n_i \leq 1]}{\frac1{\sqrt{2\pi}\sigma}e^{-(n_i-\mu)^2/2\sigma^2}}

and the n_i are N(\mu,\sigma^2) distributed.

Some Matlab code illustrating the example (it should also run under Octave):

 N = 10000;                     % samples per estimate
 m = 100;                       % number of repeated estimates
 alpha = 0.7;
 phi1 = zeros(1,m);             % plain MC estimates
 phi2 = zeros(1,m);             % importance sampling estimates
 for i = 1:m
     % Plain Monte Carlo: u_i ~ U[0,1]
     u_i = rand(N,1);
     phi1(i) = 1/N * sum(u_i.^2 >= alpha);

     % Importance sampling: n_i ~ N(mu, sig^2)
     mu  = (1 + sqrt(alpha))/2;
     sig = 0.3*sqrt(1 - sqrt(alpha));
     n_i = sig*randn(N,1) + mu;
     % Weight W = f_Z/f_{Z*} = [0 <= n_i <= 1] / (normal pdf); the
     % exponential below is 1/pdf, hence the positive exponent.
     phi2(i) = 1/N * sum( (n_i.^2 >= alpha) .* (0 <= n_i) .* (n_i <= 1) ...
               .* (sqrt(2*pi)*sig*exp( (((n_i-mu)/sig).^2)/2 )) );
 end
 fprintf( '\n\n\n------------\n' );
 fprintf( '%10.6f\n', 1 - sqrt(alpha) );             % exact value 1 - sqrt(alpha)
 fprintf( '%10.6f+-%g\n', mean(phi1), std(phi1) );   % plain MC
 fprintf( '%10.6f+-%g\n', mean(phi2), std(phi2) );   % importance sampling

Prints:

 0.163340
 0.163480+-0.00397507
 0.163354+-0.0015864

134.169.77.186 14:46, 8 March 2007 (UTC) (ezander)