My textbook states the theorem that if a sequence of random variables converges almost surely, it also converges in probability, but that the converse does not necessarily hold. It also states that if a sequence of random variables converges in probability, it nevertheless has a subsequence that converges almost surely.
Annoyingly, the book doesn't give proofs or explanations.
To me it seems that the two concepts are really the same thing, though convergence in probability is defined sequentially on individual elements of the sequence, while almost sure convergence is defined on an entire sequence at once.
Why aren't these definitions equivalent?
2 Answers
Convergence in probability: $X_n \xrightarrow{p} X$ if for all $\epsilon > 0$, $\Pr[|X_n - X| > \epsilon] \to 0$ as $n \to \infty$.
Convergence almost surely: $X_n \xrightarrow{a.s.} X$ if $\Pr[X_n(w) \nrightarrow X(w)] = 0$.
Almost sure convergence implies convergence in probability. The converse is false, however; the most popular counterexample is the following.
Let the probability space be $[0,1]$ with Lebesgue measure, so that the probability of an interval is its length. We will define our sequence with double indices, $X_{nk}$, where $0 \leq k < n$ and $n \geq 1$. Reindexing into a single sequence (row by row in $n$) can be done later.
Let $X_{nk} = 1_{[\frac kn, \frac{k+1}{n}]}$, and let $X \equiv 0$. I claim that $X_{nk} \xrightarrow{p} X$. Indeed, for each $n$ and $k$, $X_{nk}$ is non-zero only on an interval of length $\frac 1n$, so for any $0 < \epsilon < 1$ we have $\Pr[|X_{nk} - X| > \epsilon] = \frac 1n$ (and this probability is $0$ when $\epsilon \geq 1$). Given any $\delta > 0$, choosing $n > \frac 1\delta$ makes this probability less than $\delta$ for every $k$. Since $n \to \infty$ along the reindexed sequence, the probability tends to $0$, as desired.
However, for each $w \in [0,1]$ and $n \in \mathbb N$, there exists $k$ such that $w \in [\frac kn, \frac{k+1}{n}]$, so $X_{nk}(w) = 1$ for at least one $k$ in every row $n$. On the other hand, for $n \geq 3$ at most two of the $n$ intervals contain $w$, so $X_{nk}(w) = 0$ for some $k$ as well. Therefore, the sequence $\{X_{nk}(w)\}$ consists of infinitely many zeros and infinitely many ones, and so does not converge for any $w \in [0,1]$. Since $[0,1]$ has non-zero probability (in fact, probability one), we conclude that $X_{nk} \nrightarrow X$ almost surely.
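To make the two behaviours concrete, here is a minimal Python sketch of the double-indexed sequence. The helper names `x_nk` and `row_values` and the sample point $w = 1/3$ are my own choices for illustration; exact rationals are used so that interval endpoints are compared without rounding.

```python
from fractions import Fraction

def x_nk(n, k, w):
    """Value at w of the indicator of the closed interval [k/n, (k+1)/n]."""
    return 1 if Fraction(k, n) <= w <= Fraction(k + 1, n) else 0

def row_values(n, w):
    """Values X_{n,0}(w), ..., X_{n,n-1}(w) for a fixed row n."""
    return [x_nk(n, k, w) for k in range(n)]

if __name__ == "__main__":
    w = Fraction(1, 3)  # hypothetical sample point
    for n in range(1, 8):
        # Pr[X_{nk} != 0] is the interval length 1/n, which shrinks with n,
        # yet every row still contains a 1 at the fixed point w.
        print(n, Fraction(1, n), row_values(n, w))
```

The printed interval length $1/n$ shrinks, which is the convergence in probability, while each row from $n = 3$ onwards contains both a $0$ and a $1$ at the fixed point $w$, which is the failure of pointwise convergence.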
Consider the Lebesgue measure $P$ on $[0,1]$ and define a sequence of random variables $(X_n)$ by letting $X_n$ be the characteristic function of the interval $$ \left[\frac{j}{2^k},\frac{j+1}{2^k}\right], $$ where $k:=\lfloor\log_2(n)\rfloor$ and $j\in\{0,\ldots,2^k-1\}$ satisfies $n=2^k+j$. This sequence converges in probability to zero but does not converge almost surely to zero.
We have $$ X_1=1_{[0,1]},\, X_2=1_{[0,1/2]},\, X_3=1_{[1/2,1]},\, X_4=1_{[0,1/4]},\, X_5=1_{[1/4,1/2]},\, X_6=1_{[1/2,3/4]},\ldots. $$ Since $X_n$ is the indicator of an interval of length $2^{-k} < 2/n$, for every $0<\varepsilon<1$ we have $$ P(|X_n|>\varepsilon) = 2^{-k} \to 0, $$ and the probability is zero for $\varepsilon \geq 1$. Thus $(X_n)$ converges to zero in probability. However, given any $\omega\in[0,1]$, each block $2^k \leq n < 2^{k+1}$ contains at least one $n$ whose interval covers $\omega$, so $X_n(\omega)=1$ for infinitely many $n$. Therefore $(X_n)$ does not converge to zero almost surely.
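This example is commonly called the typewriter sequence. Note that there are many subsequences of $(X_n)$ which converge almost surely to zero. For instance, $X_{2^k} = 1_{[0,2^{-k}]}$ converges to zero at every $\omega \neq 0$, hence almost surely.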
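The typewriter sequence is easy to experiment with numerically. The following is a small Python sketch, not part of the original argument; the function names `typewriter_interval` and `x` and the sample point $\omega = 1/3$ are assumptions made for illustration. It lists the indices $n \leq 64$ at which $X_n(\omega) = 1$ and evaluates the subsequence $X_{2^k}$ at the same point.

```python
from fractions import Fraction

def typewriter_interval(n):
    """Endpoints of the interval [j/2^k, (j+1)/2^k] with n = 2^k + j."""
    k = n.bit_length() - 1  # equals floor(log2(n)) for n >= 1
    j = n - 2**k
    return Fraction(j, 2**k), Fraction(j + 1, 2**k)

def x(n, omega):
    """Value of X_n at omega (indicator of the closed interval)."""
    lo, hi = typewriter_interval(n)
    return 1 if lo <= omega <= hi else 0

if __name__ == "__main__":
    omega = Fraction(1, 3)  # hypothetical sample point
    # Indices where the sliding interval covers omega: one or two per dyadic block.
    print([n for n in range(1, 65) if x(n, omega) == 1])
    # Along n = 2^k the interval is [0, 2^-k], which eventually misses omega.
    print([x(2**k, omega) for k in range(7)])
```

The first list keeps growing as the range is extended, since every dyadic block contributes a new index, while the second list is eventually all zeros. That is the difference between the full sequence and an almost surely convergent subsequence.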