Let $ a = [a_1, a_2, \dots, a_n] $ be an array of $ n $ positive integers.
Let $ L, R $ be independent random variables uniformly distributed over $ \{1, 2, \dots, n\} $.
Define $ \ell = \min(L, R) $, $ r = \max(L, R) $.
Let $ S = \{a_i \mid \ell \leq i \leq r\} $ be the set of distinct elements in the segment $ [\ell, r] $.
We are to compute:
$$
\mathbb{E}[|S|]
$$
By linearity of expectation:
$$
\mathbb{E}[|S|] = \sum_{j=1}^{n} \mathbb{P}(a_j \text{ is the first occurrence of its value in } [\ell, r])
$$
Alternatively, equivalently:
$$
\mathbb{E}[|S|] = \sum_{k=1}^{n} \mathbb{P}(\text{the value } a_k \text{ appears in } [\ell, r] \text{ and no earlier occurrence of } a_k \text{ is in } [\ell, r])
$$
But a more direct and computationally tractable formulation is:
$$
\mathbb{E}[|S|] = \sum_{i=1}^{n} \mathbb{P}(a_i \text{ contributes to the count of unique elements in } [\ell, r])
$$
An element $ a_i $ contributes if and only if it is the **leftmost occurrence** of its value within the interval $ [\ell, r] $. That is, if no $ j < i $ with $ a_j = a_i $ lies in $ [\ell, r] $.
Let $ \text{prev}(i) $ be the largest index $ j < i $ such that $ a_j = a_i $, or $ 0 $ if no such index exists.
Then $ a_i $ contributes to the unique count in $ [\ell, r] $ if and only if:
$$
\ell \leq i \leq r \quad \text{and} \quad \ell > \text{prev}(i)
$$
Thus,
$$
\mathbb{E}[|S|] = \sum_{i=1}^{n} \mathbb{P}(\ell \leq i \leq r \text{ and } \ell > \text{prev}(i))
$$
Since $ \ell = \min(L,R) $, $ r = \max(L,R) $, and $ L, R \sim \text{Uniform}\{1,\dots,n\} $ independently, the pair $ (\ell, r) $ is uniformly distributed over all unordered pairs $ \{i,j\} $ with $ i \leq j $, with multiplicities:
- For $ i = j $: probability $ \frac{1}{n^2} $
- For $ i < j $: probability $ \frac{2}{n^2} $
There are $ \binom{n}{2} + n = \frac{n(n+1)}{2} $ distinct intervals $ [\ell, r] $, each occurring with probability:
- $ \frac{1}{n^2} $ if $ \ell = r $
- $ \frac{2}{n^2} $ if $ \ell < r $
Thus, we can compute:
$$
\mathbb{E}[|S|] = \sum_{1 \leq \ell \leq r \leq n} \frac{w_{\ell,r}}{n^2} \cdot |\{a_i : \ell \leq i \leq r \text{ and } \text{prev}(i) < \ell\}|
$$
where $ w_{\ell,r} = 1 $ if $ \ell = r $, and $ w_{\ell,r} = 2 $ if $ \ell < r $.
But for efficient computation, we use the linearity approach:
For each position $ i \in [1, n] $, define:
$$
p_i = \mathbb{P}(\ell \leq i \leq r \text{ and } \ell > \text{prev}(i))
$$
Then:
$$
\boxed{\mathbb{E}[|S|] = \sum_{i=1}^{n} p_i}
$$
Now compute $ p_i $:
Let $ p = \text{prev}(i) $. We require:
- $ \ell \leq i \leq r $
- $ \ell > p $
Since $ \ell = \min(L,R) $, $ r = \max(L,R) $, the condition $ \ell \leq i \leq r $ and $ \ell > p $ is equivalent to:
- Both $ L $ and $ R $ are in $ [p+1, n] $
- At least one of $ L, R $ is $ \leq i $
- At least one of $ L, R $ is $ \geq i $
Equivalently: the pair $ (L,R) $ must lie in the rectangle $ [p+1, n] \times [p+1, n] $, and the interval $ [\min(L,R), \max(L,R)] $ must contain $ i $.
Which means:
- $ \min(L,R) \leq i \leq \max(L,R) $
- $ \min(L,R) > p $
So:
Let $ A = [p+1, n] $, the set of valid values for $ \ell $.
We need the probability that $ \min(L,R) \in [p+1, i] $ and $ \max(L,R) \geq i $, with both $ L,R \in A $.
We can compute:
Number of favorable $ (L,R) $ pairs:
- Total pairs: $ n^2 $
- Favorable: number of $ (L,R) \in [1,n]^2 $ such that $ \min(L,R) \in [p+1, i] $, $ \max(L,R) \geq i $
Break into cases:
Case 1: $ L = R = i $ → 1 pair
Case 2: One of $ L,R $ is $ i $, the other is in $ [p+1, i-1] $ → $ 2(i - 1 - p) $ pairs
Case 3: One is in $ [p+1, i-1] $, the other in $ [i+1, n] $ → $ 2(i - 1 - p)(n - i) $ pairs
Case 4: Both in $ [i+1, n] $, but then $ \min(L,R) > i $ → violates $ \min(L,R) \leq i $ → invalid
Wait — better:
We require:
- $ \min(L,R) \in [p+1, i] $
- $ \max(L,R) \geq i $
So:
Let $ x = \min(L,R) $, $ y = \max(L,R) $, $ x \in [p+1, i] $, $ y \in [i, n] $, and $ x \leq y $
Number of ordered pairs $ (L,R) $ such that $ \min(L,R) = x $, $ \max(L,R) = y $:
- If $ x = y $: 1 pair
- If $ x < y $: 2 pairs
So total favorable ordered pairs:
$$
\sum_{x=p+1}^{i} \sum_{y=i}^{n} \mathbf{1}_{x \leq y} \cdot \begin{cases} 1 & x = y \\ 2 & x < y \end{cases}
$$
Split:
- When $ x = y $ and $ x \in [p+1, i] \cap [i, n] = \{i\} $ (if $ i \geq p+1 $): 1 pair
- When $ x < y $, $ x \in [p+1, i-1] $, $ y \in [i+1, n] $: number of such $ (x,y) $ is $ (i - 1 - p) \cdot (n - i) $, each contributes 2 → $ 2(i - 1 - p)(n - i) $
- When $ x \in [p+1, i-1] $, $ y = i $: number of such $ x $ is $ i - 1 - p $, each gives 2 pairs (since $ x < i $) → $ 2(i - 1 - p) $
So total favorable ordered pairs:
$$
1 + 2(i - 1 - p)(n - i) + 2(i - 1 - p)
= 1 + 2(i - 1 - p)(n - i + 1)
$$
Therefore:
$$
p_i = \frac{1 + 2(i - 1 - p)(n - i + 1)}{n^2}
$$
But only if $ p < i $. If $ p = 0 $ (no previous occurrence), then $ p_i = \mathbb{P}(\ell \leq i \leq r) $
Which is:
Number of $ (L,R) $ such that $ \min(L,R) \leq i \leq \max(L,R) $
= total pairs minus pairs where both $ < i $ or both $ > i $
= $ n^2 - (i-1)^2 - (n-i)^2 $
= $ n^2 - (i^2 - 2i + 1) - (n^2 - 2ni + i^2) $
= $ n^2 - i^2 + 2i - 1 - n^2 + 2ni - i^2 $
= $ 2ni - 2i^2 + 2i - 1 $
But this contradicts our earlier formula.
Actually, simpler:
The probability that interval $ [\ell, r] $ contains $ i $ is:
$$
\mathbb{P}(\ell \leq i \leq r) = 1 - \mathbb{P}(r < i) - \mathbb{P}(\ell > i)
$$
$$
= 1 - \left(\frac{i-1}{n}\right)^2 - \left(\frac{n-i}{n}\right)^2
$$
So:
$$
\mathbb{P}(\ell \leq i \leq r) = \frac{2i(n - i + 1) - 1}{n^2}
$$
Wait — standard result:
Number of ordered pairs $ (L,R) $ such that $ \min(L,R) \leq i \leq \max(L,R) $:
Total: $ n^2 $
Minus: both $ < i $: $ (i-1)^2 $
Minus: both $ > i $: $ (n - i)^2 $
So:
$$
\mathbb{P}(\ell \leq i \leq r) = \frac{n^2 - (i-1)^2 - (n-i)^2}{n^2}
= \frac{2i(n - i + 1) - 1}{n^2}
$$
Yes.
Now, when there is a previous occurrence at $ p $, we require $ \ell > p $, so:
$$
p_i = \mathbb{P}(\ell \leq i \leq r \text{ and } \ell > p) = \mathbb{P}(\ell \in [p+1, i], r \geq i)
$$
Which is:
Number of $ (L,R) $ with $ \min(L,R) \in [p+1, i] $, $ \max(L,R) \geq i $
As above:
= $ n^2 - \text{pairs where } \min(L,R) \leq p \text{ or } \max(L,R) < i $
But easier:
= $ \mathbb{P}(\ell \leq i \leq r) - \mathbb{P}(\ell \leq p \text{ and } \ell \leq i \leq r) $
= $ \mathbb{P}(\ell \leq i \leq r) - \mathbb{P}(\ell \leq p, r \geq i) $
Now:
$ \mathbb{P}(\ell \leq p, r \geq i) = \mathbb{P}(\min(L,R) \leq p, \max(L,R) \geq i) $
= number of pairs where one $ \leq p $, one $ \geq i $
= $ 2 \cdot p \cdot (n - i + 1) - \mathbb{P}(L=R \text{ and } L \in [p+1, i-1]) $? No.
Actually:
Number of $ (L,R) $ with $ \min \leq p $, $ \max \geq i $:
= total pairs minus (both > p) minus (both < i)
Wait — inclusion:
Let $ A = \{ (L,R) : \min \leq p, \max \geq i \} $
This is equivalent to: at least one $ \leq p $, and at least one $ \geq i $
So:
= total - (both > p) - (both < i) + (both > p and both < i)
But both > p and both < i → both in $ [p+1, i-1] $
So:
|A| = $ n^2 - (n - p)^2 - (i - 1)^2 + (i - 1 - p)^2 $
Thus:
$$
\mathbb{P}(\ell \leq p, r \geq i) = \frac{n^2 - (n - p)^2 - (i - 1)^2 + (i - 1 - p)^2}{n^2}
$$
Then:
$$
p_i = \mathbb{P}(\ell \leq i \leq r) - \mathbb{P}(\ell \leq p, r \geq i)
$$
But this is messy.
Better: direct count for $ p_i $:
We want number of $ (L,R) $ such that:
- $ \min(L,R) \in [p+1, i] $
- $ \max(L,R) \geq i $
Case 1: $ \min = \max = k \in [p+1, i] $ and $ k \geq i $ → only $ k = i $: 1 pair
Case 2: $ \min = x \in [p+1, i-1] $, $ \max = y \in [i, n] $, $ x < y $: number of such unordered pairs: $ (i - 1 - p) \cdot (n - i + 1) $, each gives 2 ordered pairs → $ 2(i - 1 - p)(n - i + 1) $
So total favorable ordered pairs:
$$
1 + 2(i - 1 - p)(n - i + 1)
$$
Thus:
$$
\boxed{p_i = \frac{1 + 2(i - 1 - \text{prev}(i)) \cdot (n - i + 1)}{n^2}}
$$
If $ \text{prev}(i) = 0 $, then $ p_i = \frac{1 + 2(i - 1)(n - i + 1)}{n^2} $
But note: when $ \text{prev}(i) = 0 $, the formula becomes:
$$
p_i = \frac{1 + 2(i-1)(n - i + 1)}{n^2}
$$
Which matches the probability that $ [\ell, r] $ contains $ i $, since if no previous occurrence, then whenever $ i $ is in the interval, it contributes.
And indeed, $ \mathbb{P}(\ell \leq i \leq r) = \frac{2i(n - i + 1) - 1}{n^2} $
Wait: $ 2(i-1)(n - i + 1) + 1 = 2(i-1)(n-i+1) + 1 $
Expand: $ 2(i-1)(n-i+1) = 2[(i-1)(n+1) - (i-1)i] $
But numerically: for $ i=1 $, prev=0:
p_i = [1 + 2(0)(n)] / n^2 = 1/n^2
But $ \mathbb{P}(\ell \leq 1 \leq r) = \mathbb{P}(1 \in [\ell,r]) = \mathbb{P}(L=1 \text{ or } R=1) = 1 - \mathbb{P}(L \geq 2, R \geq 2) = 1 - ((n-1)/n)^2 = \frac{2n-1}{n^2} $
But our formula gives 1/n^2 — contradiction.
So our formula is wrong.
Let’s recast.
We want: $ \mathbb{P}(\ell \leq i \leq r \text{ and } \ell > p) $
This is equal to: $ \mathbb{P}( \min(L,R) > p \text{ and } \min(L,R) \leq i \leq \max(L,R) ) $
= $ \mathbb{P}( \min(L,R) \in [p+1, i] \text{ and } \max(L,R) \geq i ) $
Let’s compute the number of ordered pairs $ (L,R) $ satisfying this.
Let $ A = [p+1, i] $, $ B = [i, n] $
We need: $ \min(L,R) \in A $, $ \max(L,R) \in B $, and $ \min \leq \max $ — always true since $ A \subseteq [1,i] $, $ B \subseteq [i,n] $, so $ A \cap B = \{i\} $ if $ i \geq p+1 $
So:
We count ordered pairs $ (L,R) $ such that:
- $ \min(L,R) \in [p+1, i] $
- $ \max(L,R) \geq i $
This is equivalent to:
- At least one of $ L,R $ is $ \geq i $
- Both $ L,R $ are $ > p $
- At least one of $ L,R $ is $ \leq i $
But since $ \min \leq i $ and $ \max \geq i $, and both > p, then:
The set of possible values for $ L,R $ is $ [p+1, n] $
We need that the interval $ [\min, \max] $ contains $ i $, i.e., $ \min \leq i \leq \max $
So, among all pairs $ (L,R) \in [p+1, n]^2 $, how many satisfy $ \min \leq i \leq \max $?
Total pairs in $ [p+1, n]^2 $: $ (n - p)^2 $
Number of pairs where both < i: $ (i - 1 - p)^2 $
Number of pairs where both > i: $ (n - i)^2 $
So number of pairs where $ \min \leq i \leq \max $ and both in $ [p+1, n] $ is:
$$
(n - p)^2 - (i - 1 - p)^2 - (n - i)^2
$$
Therefore:
$$
p_i = \frac{(n - p)^2 - (i - 1 - p)^2 - (n - i)^2}{n^2}
$$
Simplify numerator:
Let $ a = n - p $, $ b = i - 1 - p $, $ c = n - i $
Then: $ a^2 - b^2 - c^2 $
But $ a = (n - i) + (i - p) = c + (i - p) $
And $ b = i - 1 - p = (i - p) - 1 $
Let $ d = i - p $
Then:
Numerator = $ (c + d)^2 - (d - 1)^2 - c^2 $
= $ c^2 + 2cd + d^2 - (d^2 - 2d + 1) - c^2 $
= $ 2cd + 2d - 1 $
= $ 2d(c + 1) - 1 $
But $ d = i - p $, $ c = n - i $, so $ c + 1 = n - i + 1 $
Thus:
Numerator = $ 2(i - p)(n - i + 1) - 1 $
Therefore:
$$
\boxed{p_i = \frac{2(i - \text{prev}(i)) \cdot (n - i + 1) - 1}{n^2}}
$$
And if $ \text{prev}(i) = 0 $, then $ p_i = \frac{2i(n - i + 1) - 1}{n^2} $, which matches the known probability that $ i \in [\ell, r] $.
Perfect.
So final answer:
$$
\boxed{\mathbb{E}[|S|] = \sum_{i=1}^{n} \frac{2(i - \text{prev}(i)) \cdot (n - i + 1) - 1}{n^2}}
$$
Where $ \text{prev}(i) $ is the largest index $ j < i $ such that $ a_j = a_i $, or 0 if none exists.
This can be computed in $ O(n) $ time by precomputing $ \text{prev}(i) $ using a hash map.