.. raw:: html

   <!--
   # Distributions
   -->

.. _sec_distributions:

Distributions
=============


.. raw:: html

   <!--
   Now that we have learned how to work with probability in both the discrete and the continuous setting, let us get to know some of the common distributions encountered.
   Depending on the area of machine learning, we may need to be familiar with vastly more of these, or for some areas of deep learning potentially none at all.
   This is, however, a good basic list to be familiar with.
   Let us first import some common libraries.
   -->

Now that we have learned how to work with probability in both the
discrete and the continuous setting, let us get to know some of the
commonly encountered distributions. Depending on the area of machine
learning, we may need to be familiar with many more of these, or, for
some areas of deep learning, potentially none at all. This is, however,
a good basic list to know. Let us first import some common libraries.

.. code:: python

    %matplotlib inline
    from d2l import mxnet as d2l
    from IPython import display
    from math import erf, factorial
    import numpy as np

.. raw:: html

   <!--
   ## Bernoulli
   -->

Bernoulli
---------

.. raw:: html

   <!--
   This is the simplest random variable usually encountered.
   This random variable encodes a coin flip which comes up $1$ with probability $p$ and $0$ with probability $1-p$.
   If we have a random variable $X$ with this distribution, we will write
   -->

This is the simplest random variable usually encountered. It encodes a
coin flip that comes up :math:`1` (heads) with probability :math:`p` and
:math:`0` (tails) with probability :math:`1-p`. If a random variable
:math:`X` has this distribution, we write:

.. math::


   X \sim \mathrm{Bernoulli}(p).

.. raw:: html

   <!--
   The cumulative distribution function is
   -->

The cumulative distribution function is:

.. math:: F(x) = \begin{cases} 0 & x < 0, \\ 1-p & 0 \le x < 1, \\ 1 & x \ge 1 . \end{cases}
   :label: eq_bernoulli_cdf

.. raw:: html

   <!--
   The probability mass function is plotted below.
   -->

The probability mass function is plotted below.

.. code:: python

    p = 0.3
    
    d2l.set_figsize()
    d2l.plt.stem([0, 1], [1 - p, p], use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()


.. figure:: output_distributions_vn_9596cb_3_0.svg


.. raw:: html

   <!--
   Now, let us plot the cumulative distribution function :eqref:`eq_bernoulli_cdf`.
   -->

Now, let us plot the cumulative distribution function
:eq:`eq_bernoulli_cdf`.

.. code:: python

    x = np.arange(-1, 2, 0.01)
    
    def F(x):
        return 0 if x < 0 else 1 if x >= 1 else 1 - p
    
    d2l.plot(x, np.array([F(y) for y in x]), 'x', 'c.d.f.')


.. figure:: output_distributions_vn_9596cb_5_0.svg


.. raw:: html

   <!--
   If $X \sim \mathrm{Bernoulli}(p)$, then:
   -->

If :math:`X \sim \mathrm{Bernoulli}(p)`, then:

-  :math:`\mu_X = p`,
-  :math:`\sigma_X^2 = p(1-p)`.
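
Both formulas follow directly from the probability mass function. The
sketch below is only an illustrative check; the names ``p_demo``,
``pmf_demo``, ``mean_demo``, and ``var_demo`` are placeholders
introduced here and are not used elsewhere in this section.

.. code:: python

    # Illustrative check (placeholder names): moments of Bernoulli(p)
    p_demo = 0.3
    pmf_demo = {0: 1 - p_demo, 1: p_demo}  # P(X = 0) and P(X = 1)

    # E[X] = sum over k of k * P(X = k)
    mean_demo = sum(k * q for k, q in pmf_demo.items())
    # Var[X] = sum over k of (k - E[X])^2 * P(X = k)
    var_demo = sum((k - mean_demo) ** 2 * q for k, q in pmf_demo.items())

    print(mean_demo, p_demo)                 # mean next to p
    print(var_demo, p_demo * (1 - p_demo))   # variance next to p(1 - p)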

.. raw:: html

   <!--
   We can sample an array of arbitrary shape from a Bernoulli random variable as follows.
   -->

We can sample an array of arbitrary shape from a Bernoulli random
variable as follows.

.. code:: python

    1*(np.random.rand(10, 10) < p)


.. parsed-literal::
    :class: output

    array([[0, 1, 1, 0, 0, 1, 1, 0, 0, 1],
           [1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
           [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
           [0, 0, 0, 1, 0, 1, 0, 1, 1, 0],
           [0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
           [0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
           [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
           [1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
           [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0, 0, 1, 1, 0]])


.. raw:: html

   <!--
   ## Discrete Uniform
   -->

Discrete Uniform
----------------

.. raw:: html

   <!--
   The next commonly encountered random variable is a discrete uniform.
   For our discussion here, we will assume that it is supported on the integers $\{1, 2, \ldots, n\}$, however any other set of values can be freely chosen.
   The meaning of the word *uniform* in this context is that every possible value is equally likely.
   The probability for each value $i \in \{1, 2, 3, \ldots, n\}$ is $p_i = \frac{1}{n}$.
   We will denote a random variable $X$ with this distribution as
   -->

The next commonly encountered random variable is a discrete uniform. For
our discussion here, we will assume that it is supported on the integers
:math:`\{1, 2, \ldots, n\}`, although any other set of values can be
chosen freely. The word *uniform* in this context means that every
possible value is equally likely. The probability for each value
:math:`i \in \{1, 2, 3, \ldots, n\}` is :math:`p_i = \frac{1}{n}`. We
will denote a random variable :math:`X` with this distribution as:

.. math::


   X \sim U(n).

.. raw:: html

   <!--
   The cumulative distribution function is 
   -->

The cumulative distribution function is:

.. math:: F(x) = \begin{cases} 0 & x < 1, \\ \frac{k}{n} & k \le x < k+1 \text{ with } 1 \le k < n, \\ 1 & x \ge n . \end{cases}
   :label: eq_discrete_uniform_cdf

.. raw:: html

   <!--
   Let us first plot the probability mass function.
   -->

Let us first plot the probability mass function.

.. code:: python

    n = 5
    
    d2l.plt.stem([i+1 for i in range(n)], n*[1 / n], use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()


.. figure:: output_distributions_vn_9596cb_9_0.svg


.. raw:: html

   <!--
   Now, let us plot the cumulative distribution function :eqref:`eq_discrete_uniform_cdf`.
   -->

Now, let us plot the cumulative distribution function
:eq:`eq_discrete_uniform_cdf`.

.. code:: python

    x = np.arange(-1, 6, 0.01)
    
    def F(x):
        return 0 if x < 1 else 1 if x > n else np.floor(x) / n
    
    d2l.plot(x, np.array([F(y) for y in x]), 'x', 'c.d.f.')


.. figure:: output_distributions_vn_9596cb_11_0.svg


.. raw:: html

   <!--
   If $X \sim U(n)$, then:
   -->

If :math:`X \sim U(n)`, then:

-  :math:`\mu_X = \frac{1+n}{2}`,
-  :math:`\sigma_X^2 = \frac{n^2-1}{12}`.
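
These closed forms can be confirmed with exact rational arithmetic. The
following is a minimal sketch using Python's standard ``fractions``
module; the helper ``uniform_moments`` and the variable ``n_vals`` are
hypothetical names chosen for this illustration.

.. code:: python

    from fractions import Fraction

    def uniform_moments(n_vals):
        """Exact mean and variance of the uniform law on {1, ..., n_vals}."""
        weight = Fraction(1, n_vals)  # every value has probability 1/n
        mean = sum(weight * k for k in range(1, n_vals + 1))
        var = sum(weight * (k - mean) ** 2 for k in range(1, n_vals + 1))
        return mean, var

    for n_vals in (2, 5, 10):
        mean, var = uniform_moments(n_vals)
        # Compare with the closed forms (1 + n)/2 and (n^2 - 1)/12
        print(n_vals,
              mean == Fraction(1 + n_vals, 2),
              var == Fraction(n_vals**2 - 1, 12))

Because ``Fraction`` avoids floating-point rounding, the comparisons are
exact rather than approximate.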

.. raw:: html

   <!--
   We can sample an array of arbitrary shape from a discrete uniform random variable as follows.
   -->

We can sample an array of arbitrary shape from a discrete uniform random
variable as follows.

.. code:: python

    # The upper bound of randint is exclusive, so pass n + 1 to include n
    np.random.randint(1, n + 1, size=(10, 10))


.. parsed-literal::
    :class: output

    array([[3, 1, 3, 2, 1, 3, 2, 3, 1, 2],
           [4, 2, 1, 2, 1, 2, 3, 3, 2, 2],
           [1, 4, 4, 2, 4, 4, 3, 3, 1, 4],
           [2, 4, 3, 1, 3, 1, 1, 4, 2, 1],
           [3, 3, 4, 2, 1, 1, 3, 4, 2, 4],
           [4, 2, 3, 2, 2, 1, 4, 2, 4, 4],
           [4, 3, 4, 3, 2, 4, 2, 3, 3, 2],
           [1, 4, 3, 3, 2, 4, 2, 3, 3, 3],
           [4, 2, 3, 3, 1, 4, 1, 1, 2, 3],
           [4, 2, 1, 2, 1, 4, 1, 3, 2, 1]])


.. raw:: html

   <!--
   ## Continuous Uniform
   -->

Continuous Uniform
------------------

.. raw:: html

   <!--
   Next, let us discuss the continuous uniform distribution.
   The idea behind this random variable is that if we increase the $n$ in the discrete uniform distribution, 
   and then scale it to fit within the interval $[a, b]$, we will approach a continuous random variable that just picks an arbitrary value in $[a, b]$ all with equal probability.
   We will denote this distribution as
   -->

Next, let us discuss the continuous uniform distribution. The idea is
that if we increase :math:`n` in the discrete uniform distribution and
then rescale it to fit within the interval :math:`[a, b]`, we approach a
continuous random variable that picks a value in :math:`[a, b]` with no
point favored over any other, that is, with constant density. We will
denote this distribution as:

.. math::


   X \sim U(a, b).

.. raw:: html

   <!--
   The probability density function is
   -->

The probability density function is:

.. math:: p(x) = \begin{cases} \frac{1}{b-a} & x \in [a, b], \\ 0 & x \not\in [a, b].\end{cases}
   :label: eq_cont_uniform_pdf

.. raw:: html

   <!--
   The cumulative distribution function is
   -->

The cumulative distribution function is:

.. math:: F(x) = \begin{cases} 0 & x < a, \\ \frac{x-a}{b-a} & x \in [a, b], \\ 1 & x > b . \end{cases}
   :label: eq_cont_uniform_cdf

.. raw:: html

   <!--
   Let us first plot the probability density function :eqref:`eq_cont_uniform_pdf`.
   -->

Let us first plot the probability density function
:eq:`eq_cont_uniform_pdf`.

.. code:: python

    a, b = 1, 3
    
    x = np.arange(0, 4, 0.01)
    p = (x > a)*(x < b)/(b - a)
    
    d2l.plot(x, p, 'x', 'p.d.f.')


.. figure:: output_distributions_vn_9596cb_15_0.svg


.. raw:: html

   <!--
   Now, let us plot the cumulative distribution function :eqref:`eq_cont_uniform_cdf`.
   -->

Now, let us plot the cumulative distribution function
:eq:`eq_cont_uniform_cdf`.

.. code:: python

    def F(x):
        return 0 if x < a else 1 if x > b else (x - a) / (b - a)
    
    d2l.plot(x, np.array([F(y) for y in x]), 'x', 'c.d.f.')


.. figure:: output_distributions_vn_9596cb_17_0.svg


.. raw:: html

   <!--
   If $X \sim U(a, b)$, then:
   -->

If :math:`X \sim U(a, b)`, then:

-  :math:`\mu_X = \frac{a+b}{2}`,
-  :math:`\sigma_X^2 = \frac{(b-a)^2}{12}`.
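
A simulation gives a rough check of these two values. The sketch below
uses only the standard ``random`` module and rescales draws from
:math:`U(0, 1)` via :math:`X = (b - a)U + a`; the endpoints ``lo`` and
``hi``, the seed, and the sample size are arbitrary choices made for
this illustration.

.. code:: python

    import random

    rng = random.Random(0)       # fixed seed so the run is repeatable
    lo, hi = 1.0, 3.0            # placeholder endpoints a and b
    num_samples = 100_000

    # Rescale U(0, 1) draws to the interval [lo, hi]: X = (b - a) * U + a
    samples = [(hi - lo) * rng.random() + lo for _ in range(num_samples)]

    sample_mean = sum(samples) / num_samples
    sample_var = sum((s - sample_mean) ** 2 for s in samples) / num_samples

    print(sample_mean, (lo + hi) / 2)        # estimate next to (a + b) / 2
    print(sample_var, (hi - lo) ** 2 / 12)   # estimate next to (b - a)^2 / 12

The estimates only approximate the exact values, with an error that
shrinks as the number of samples grows.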

.. raw:: html

   <!--
   We can sample an array of arbitrary shape from a uniform random variable as follows.
   Note that it by default samples from a $U(0,1)$, so if we want a different range we need to scale it.
   -->

We can sample an array of arbitrary shape from a uniform random variable
as follows. Note that ``np.random.rand`` samples from :math:`U(0,1)` by
default, so to obtain a different range we need to scale and shift the
result.

.. code:: python

    (b - a) * np.random.rand(10, 10) + a


.. parsed-literal::
    :class: output

    array([[2.69057975, 2.29690089, 1.50863864, 1.6343662 , 1.88538755,
            2.77304872, 1.3649084 , 1.31180089, 2.40505407, 1.24139511],
           [2.15299209, 2.55721279, 1.18634969, 2.10162235, 2.43895021,
            2.09316829, 1.94466588, 2.8578004 , 2.53009842, 2.75041885],
           [1.37840926, 2.1666867 , 2.56482327, 2.98477868, 2.02588574,
            1.71842979, 1.01852972, 1.06072152, 2.50127805, 2.09318544],
           [2.99853678, 1.25134433, 1.57059949, 1.04546376, 1.03073306,
            1.45960725, 1.81771119, 1.07829288, 1.35959324, 2.12460743],
           [1.19599128, 1.23302249, 1.09473705, 1.53522627, 1.65225586,
            1.0978017 , 2.01576246, 2.03040818, 1.45980927, 2.08867867],
           [1.05928233, 1.55143864, 1.28093637, 2.94528546, 1.37913065,
            1.09141046, 2.46408584, 1.35821491, 2.92074037, 2.83196815],
           [2.249139  , 2.04845964, 2.7267775 , 2.4802625 , 1.0888408 ,
            1.40759851, 2.43916163, 1.07247824, 2.14481375, 1.51968138],
           [2.47335302, 2.99643275, 2.64752494, 1.27123227, 2.80993433,
            1.13978233, 2.89729352, 1.12339001, 2.4632283 , 1.52516024],
           [2.81143795, 1.02502167, 2.82248089, 2.61512926, 2.32013184,
            2.08291516, 2.12004269, 1.52414066, 1.89313726, 2.10573931],
           [1.90694126, 1.48359657, 2.43981393, 1.88938097, 1.88726055,
            1.24278983, 1.61357543, 1.71761149, 2.25460371, 2.48881168]])


.. raw:: html

   <!--
   ## Binomial
   -->

Binomial
--------

.. raw:: html

   <!--
   Let us make things a little more complex and examine the *binomial* random variable.
   This random variable originates from performing a sequence of $n$ independent experiments, 
   each of which has probability $p$ of succeeding, and asking how many successes we expect to see.
   -->

Let us make things a little more complex and examine the *binomial*
random variable. It arises from performing a sequence of :math:`n`
independent experiments, each of which succeeds with probability
:math:`p`, and counting how many successes occur.

.. raw:: html

   <!--
   Let us express this mathematically.
   Each experiment is an independent random variable $X_i$ where we will use $1$ to encode success, and $0$ to encode failure.
   Since each is an independent coin flip which is successful with probability $p$, we can say that $X_i \sim \mathrm{Bernoulli}(p)$.
   Then, the binomial random variable is
   -->

Let us express this mathematically. Each experiment is an independent
random variable :math:`X_i`, where :math:`1` encodes success and
:math:`0` encodes failure. Since each is an independent coin flip that
succeeds with probability :math:`p`, we can say that
:math:`X_i \sim \mathrm{Bernoulli}(p)`. The binomial random variable is
then:

.. math::


   X = \sum_{i=1}^n X_i.

.. raw:: html

   <!--
   In this case, we will write
   -->

In this case, we write:

.. math::


   X \sim \mathrm{Binomial}(n, p).

.. raw:: html

   <!--
   To get the cumulative distribution function, we need to notice that getting exactly $k$ successes can occur 
   in $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ ways each of which has a probability of $p^k(1-p)^{n-k}$ of occurring.
   Thus the cumulative distribution function is
   -->

To get the cumulative distribution function, notice that exactly
:math:`k` successes can occur in
:math:`\binom{n}{k} = \frac{n!}{k!(n-k)!}` ways, each of which has
probability :math:`p^k(1-p)^{n-k}`. Thus the cumulative distribution
function is:

.. math:: F(x) = \begin{cases} 0 & x < 0, \\ \sum_{m = 0}^{k} \binom{n}{m} p^m(1-p)^{n-m}  & k \le x < k+1 \text{ with } 0 \le k < n, \\ 1 & x \ge n . \end{cases}
   :label: eq_binomial_cdf
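
The counting argument can be turned into a few lines of code. This is a
minimal sketch, assuming Python 3.8 or later for ``math.comb``; the
helper names ``binomial_pmf`` and ``binomial_cdf`` and the parameter
names are hypothetical and chosen only for this example.

.. code:: python

    from math import comb

    def binomial_pmf(k, n_trials, p_success):
        """P(X = k) for X ~ Binomial(n_trials, p_success)."""
        return (comb(n_trials, k) * p_success**k
                * (1 - p_success) ** (n_trials - k))

    def binomial_cdf(x, n_trials, p_success):
        """F(x) = P(X <= x), summing the p.m.f. over integers m <= x."""
        if x < 0:
            return 0.0
        upper = min(int(x), n_trials)  # floor(x) for x >= 0, capped at n
        return sum(binomial_pmf(m, n_trials, p_success)
                   for m in range(upper + 1))

    n_trials, p_success = 10, 0.2
    total = sum(binomial_pmf(k, n_trials, p_success)
                for k in range(n_trials + 1))
    print(total)                                    # sums to 1 up to rounding
    print(binomial_cdf(2.5, n_trials, p_success))   # equals P(X <= 2)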

.. raw:: html

   <!--
   Let us first plot the probability mass function.
   -->

Let us first plot the probability mass function.

.. code:: python

    n, p = 10, 0.2
    
    # Compute binomial coefficient
    def binom(n, k):
        comb = 1
        for i in range(min(k, n - k)):
            comb = comb * (n - i) // (i + 1)
        return comb
    
    pmf = np.array([p**i * (1-p)**(n - i) * binom(n, i) for i in range(n + 1)])
    
    d2l.plt.stem([i for i in range(n + 1)], pmf, use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()


.. figure:: output_distributions_vn_9596cb_21_0.svg


.. raw:: html

   <!--
   Now, let us plot the cumulative distribution function :eqref:`eq_binomial_cdf`.
   -->

Now, let us plot the cumulative distribution function
:eq:`eq_binomial_cdf`.

.. code:: python

    x = np.arange(-1, 11, 0.01)
    cmf = np.cumsum(pmf)
    
    def F(x):
        return 0 if x < 0 else 1 if x > n else cmf[int(x)]
    
    d2l.plot(x, np.array([F(y) for y in x.tolist()]), 'x', 'c.d.f.')


.. figure:: output_distributions_vn_9596cb_23_0.svg


.. raw:: html

   <!--
   While this result is not simple, the means and variances are.
   If $X \sim \mathrm{Binomial}(n, p)$, then:
   -->

While the cumulative distribution function is not simple, the mean and
variance are. If :math:`X \sim \mathrm{Binomial}(n, p)`, then:

-  :math:`\mu_X = np`,
-  :math:`\sigma_X^2 = np(1-p)`.
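
Since a binomial variable is a sum of independent Bernoulli variables,
these moments can be estimated by simulating that sum directly. The
sketch below uses only the standard ``random`` module; the seed, the
number of runs, and the helper ``one_binomial_draw`` are assumptions
made for this illustration.

.. code:: python

    import random

    rng = random.Random(0)        # fixed seed for repeatability
    n_trials, p_success = 10, 0.2
    num_runs = 50_000

    def one_binomial_draw():
        """Sum of n independent Bernoulli(p) indicators."""
        return sum(1 for _ in range(n_trials) if rng.random() < p_success)

    draws = [one_binomial_draw() for _ in range(num_runs)]
    emp_mean = sum(draws) / num_runs
    emp_var = sum((d - emp_mean) ** 2 for d in draws) / num_runs

    print(emp_mean, n_trials * p_success)                    # next to np
    print(emp_var, n_trials * p_success * (1 - p_success))   # next to np(1-p)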

.. raw:: html

   <!--
   This can be sampled as follows.
   -->

This can be sampled as follows.

.. code:: python

    np.random.binomial(n, p, size=(10, 10))


.. parsed-literal::
    :class: output

    array([[1, 3, 2, 5, 4, 5, 3, 0, 2, 2],
           [2, 2, 2, 1, 1, 4, 3, 1, 2, 1],
           [2, 2, 5, 2, 3, 2, 0, 2, 4, 1],
           [1, 1, 4, 2, 2, 2, 2, 1, 3, 1],
           [2, 2, 4, 2, 2, 3, 4, 2, 3, 2],
           [2, 0, 2, 2, 3, 2, 1, 1, 3, 2],
           [1, 0, 1, 2, 3, 1, 1, 3, 2, 1],
           [2, 2, 1, 1, 2, 1, 1, 5, 1, 0],
           [1, 2, 3, 2, 2, 3, 2, 1, 3, 1],
           [2, 0, 1, 1, 0, 4, 2, 3, 0, 1]])


.. raw:: html

   <!--
   ## Poisson
   -->

Poisson
-------

.. raw:: html

   <!--
   Let us now perform a thought experiment.
   We are standing at a bus stop and we want to know how many buses will arrive in the next minute.
   Let us start by considering $X^{(1)} \sim \mathrm{Bernoulli}(p)$ which is simply the probability that a bus arrives in the one minute window.
   For bus stops far from an urban center, this might be a pretty good approximation.
   We may never see more than one bus in a minute.
   -->

Let us now perform a thought experiment. We are standing at a bus stop
and we want to know how many buses will arrive in the next minute. Let
us start by considering :math:`X^{(1)} \sim \mathrm{Bernoulli}(p)`, an
indicator of whether a bus arrives in the one-minute window, where
:math:`p` is the probability of an arrival. For bus stops far from an
urban center, this might be a pretty good approximation, since we may
never see more than one bus in a minute.

.. raw:: html

   <!--
   However, if we are in a busy area, it is possible or even likely that two buses will arrive.
   We can model this by splitting our random variable into two parts for the first 30 seconds, or the second 30 seconds.
   In this case we can write
   -->

However, if we are in a busy area, it is possible or even likely that
two buses will arrive. We can model this by splitting our random
variable into two parts, one for the first 30 seconds and one for the
second 30 seconds. In this case we can write:

.. math::


   X^{(2)} \sim X^{(2)}_1 + X^{(2)}_2,

.. raw:: html

   <!--
   where $X^{(2)}$ is the total sum, and $X^{(2)}_i \sim \mathrm{Bernoulli}(p/2)$.
   The total distribution is then $X^{(2)} \sim \mathrm{Binomial}(2, p/2)$.
   -->

where :math:`X^{(2)}` is the total sum, and
:math:`X^{(2)}_i \sim \mathrm{Bernoulli}(p/2)`. The total distribution
is then :math:`X^{(2)} \sim \mathrm{Binomial}(2, p/2)`.

.. raw:: html

   <!--
   Why stop here?  Let us continue to split that minute into $n$ parts.
   By the same reasoning as above, we see that
   -->

Why stop here? Let us continue to split that minute into :math:`n`
parts. By the same reasoning as above, we see that:

.. math:: X^{(n)} \sim \mathrm{Binomial}(n, p/n).
   :label: eq_eq_poisson_approx

.. raw:: html

   <!--
   Consider these random variables.
   By the previous section, we know that :eqref:`eq_eq_poisson_approx` has mean $\mu_{X^{(n)}} = n(p/n) = p$, and variance $\sigma_{X^{(n)}}^2 = n(p/n)(1-(p/n)) = p(1-p/n)$.
   If we take $n \rightarrow \infty$, we can see that these numbers stabilize to $\mu_{X^{(\infty)}} = p$, and variance $\sigma_{X^{(\infty)}}^2 = p$.
   This indicates that there *could be* some random variable we can define in this infinite subdivision limit.
   -->

Consider these random variables. By the previous section, we know that
:eq:`eq_eq_poisson_approx` has mean
:math:`\mu_{X^{(n)}} = n(p/n) = p` and variance
:math:`\sigma_{X^{(n)}}^2 = n(p/n)(1-(p/n)) = p(1-p/n)`. If we take
:math:`n \rightarrow \infty`, these numbers stabilize to
:math:`\mu_{X^{(\infty)}} = p` and
:math:`\sigma_{X^{(\infty)}}^2 = p`. This indicates that there *could
be* some random variable we can define in this infinite subdivision
limit.

.. raw:: html

   <!--
   This should not come as too much of a surprise, since in the real world we can just count the number of bus arrivals,
   however it is nice to see that our mathematical model is well defined.
   This discussion can be made formal as the *law of rare events*.
   -->

This should not come as too much of a surprise, since in the real world
we can simply count the number of bus arrivals. Still, it is nice to see
that our mathematical model is well defined. This discussion can be made
formal as the *law of rare events*.

.. raw:: html

   <!--
   Following through this reasoning carefully, we can arrive at the following model.
   We will say that $X \sim \mathrm{Poisson}(\lambda)$ if it is a random variable which takes the values $\{0,1,2, \ldots\}$ with probability
   -->

Following through this reasoning carefully, we can arrive at the
following model. We say that :math:`X \sim \mathrm{Poisson}(\lambda)` if
it is a random variable that takes the values :math:`\{0,1,2, \ldots\}`
with probability:

.. math:: p_k = \frac{\lambda^ke^{-\lambda}}{k!}.
   :label: eq_poisson_mass

.. raw:: html

   <!--
   The value $\lambda > 0$ is known as the *rate* (or the *shape* parameter), and denotes the average number of arrivals we expect in one unit of time.
   -->

The value :math:`\lambda > 0` is known as the *rate* (or the *shape*
parameter), and denotes the average number of arrivals we expect in one
unit of time.
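
The limiting argument can be watched numerically: the p.m.f. of
:math:`\mathrm{Binomial}(n, \lambda/n)` should approach the Poisson
p.m.f. as :math:`n` grows. The following is a minimal sketch, assuming
Python 3.8 or later for ``math.comb``; the helper names, the rate value,
and the range of :math:`k` compared are choices made for this
illustration.

.. code:: python

    from math import comb, exp, factorial

    def binomial_pmf(k, n_parts, p_each):
        """P(X = k) for X ~ Binomial(n_parts, p_each)."""
        return comb(n_parts, k) * p_each**k * (1 - p_each) ** (n_parts - k)

    def poisson_pmf(k, rate):
        """P(X = k) for X ~ Poisson(rate)."""
        return rate**k * exp(-rate) / factorial(k)

    rate = 5.0               # plays the role of lambda
    ks = range(20)           # compare the first 20 probabilities
    max_gaps = []
    for n_parts in (10, 100, 1000, 10000):
        gap = max(abs(binomial_pmf(k, n_parts, rate / n_parts)
                      - poisson_pmf(k, rate)) for k in ks)
        max_gaps.append(gap)
        print(n_parts, gap)  # largest pointwise difference for this n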

.. raw:: html

   <!--
   We may sum this probability mass function to get the cumulative distribution function.
   -->

We may sum this probability mass function to get the cumulative
distribution function.

.. math:: F(x) = \begin{cases} 0 & x < 0, \\ e^{-\lambda}\sum_{m = 0}^k \frac{\lambda^m}{m!} & k \le x < k+1 \text{ with } 0 \le k. \end{cases}
   :label: eq_poisson_cdf

.. raw:: html

   <!--
   Let us first plot the probability mass function :eqref:`eq_poisson_mass`.
   -->

Let us first plot the probability mass function :eq:`eq_poisson_mass`.

.. code:: python

    lam = 5.0
    
    xs = [i for i in range(20)]
    pmf = np.array([np.exp(-lam) * lam**k / factorial(k) for k in xs])
    
    d2l.plt.stem(xs, pmf, use_line_collection=True)
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.show()


.. figure:: output_distributions_vn_9596cb_27_0.svg


.. raw:: html

   <!--
   Now, let us plot the cumulative distribution function :eqref:`eq_poisson_cdf`.
   -->

Now, let us plot the cumulative distribution function
:eq:`eq_poisson_cdf`.

.. code:: python

    x = np.arange(-1, 21, 0.01)
    cmf = np.cumsum(pmf)
    def F(x):
        # cmf only covers k = 0, ..., len(cmf) - 1, so clamp the index there
        return 0 if x < 0 else cmf[min(int(x), len(cmf) - 1)]
    
    d2l.plot(x, np.array([F(y) for y in x.tolist()]), 'x', 'c.d.f.')


.. figure:: output_distributions_vn_9596cb_29_0.svg


.. raw:: html

   <!--
   As we saw above, the means and variances are particularly concise.
   If $X \sim \mathrm{Poisson}(\lambda)$, then:
   -->

As we saw above, the mean and variance are particularly concise. If
:math:`X \sim \mathrm{Poisson}(\lambda)`, then:

-  :math:`\mu_X = \lambda`,
-  :math:`\sigma_X^2 = \lambda`.

.. raw:: html

   <!--
   This can be sampled as follows.
   -->

This can be sampled as follows.

.. code:: python

    np.random.poisson(lam, size=(10, 10))


.. parsed-literal::
    :class: output

    array([[ 2,  6,  5,  5,  3,  5,  4,  7,  1,  4],
           [ 3,  5,  4,  4, 14,  5,  9,  5,  4,  3],
           [ 1,  2,  3,  5,  6,  9,  1,  7,  3,  4],
           [ 4,  3,  5,  7,  6,  5,  7,  6,  8,  9],
           [ 3,  4,  7,  3,  5,  5,  5,  5,  5,  3],
           [ 6,  9,  7,  5,  3,  6,  4,  5,  5,  8],
           [ 4,  4,  7,  3,  2,  8,  6,  8,  3,  5],
           [ 5,  4,  5,  2,  5,  5,  6,  4,  3,  5],
           [ 7,  4,  2,  5,  3,  5,  5,  7,  3,  3],
           [ 2,  5,  1,  3,  6,  3,  3,  7,  7,  3]])


.. raw:: html

   <!--
   ## Gaussian
   -->

Gaussian
--------

.. raw:: html

   <!--
   Now Let us try a different, but related experiment.
   Let us say we again are performing $n$ independent $\mathrm{Bernoulli}(p)$ measurements $X_i$.
   The distribution of the sum of these is $X^{(n)} \sim \mathrm{Binomial}(n, p)$.
   Rather than taking a limit as $n$ increases and $p$ decreases, Let us fix $p$, and then send $n \rightarrow \infty$.
   In this case $\mu_{X^{(n)}} = np \rightarrow \infty$ and $\sigma_{X^{(n)}}^2 = np(1-p) \rightarrow \infty$, 
   so there is no reason to think this limit should be well defined.
   -->

Now let us try a different, but related experiment. Say we are again
performing :math:`n` independent :math:`\mathrm{Bernoulli}(p)`
measurements :math:`X_i`. The distribution of their sum is
:math:`X^{(n)} \sim \mathrm{Binomial}(n, p)`. Rather than taking a limit
as :math:`n` increases and :math:`p` decreases, let us fix :math:`p` and
then send :math:`n \rightarrow \infty`. In this case
:math:`\mu_{X^{(n)}} = np \rightarrow \infty` and
:math:`\sigma_{X^{(n)}}^2 = np(1-p) \rightarrow \infty`, so there is no
reason to think this limit should be well defined.

.. raw:: html

   <!--
   However, not all hope is lost!
   Let us just make the mean and variance be well behaved by defining
   -->

However, not all hope is lost! Let us make the mean and variance well
behaved by defining:

.. math::


   Y^{(n)} = \frac{X^{(n)} - \mu_{X^{(n)}}}{\sigma_{X^{(n)}}}.

.. raw:: html

   <!--
   This can be seen to have mean zero and variance one, and so it is plausible to believe that it will converge to some limiting distribution.
   If we plot what these distributions look like, we will become even more convinced that it will work.
   -->

This can be seen to have mean zero and variance one, so it is plausible
that it converges to some limiting distribution. If we plot what these
distributions look like, we will become even more convinced that this
works.

.. code:: python

    p = 0.2
    ns = [1, 10, 100, 1000]
    d2l.plt.figure(figsize=(10, 3))
    for i, n in enumerate(ns):
        # Use k for the outcome so it does not shadow the subplot index i
        pmf = np.array([p**k * (1-p)**(n-k) * binom(n, k) for k in range(n + 1)])
        d2l.plt.subplot(1, 4, i + 1)
        d2l.plt.stem([(k - n*p)/np.sqrt(n*p*(1 - p)) for k in range(n + 1)], pmf,
                     use_line_collection=True)
        d2l.plt.xlim([-4, 4])
        d2l.plt.xlabel('x')
        d2l.plt.ylabel('p.m.f.')
        d2l.plt.title("n = {}".format(n))
    d2l.plt.show()


.. figure:: output_distributions_vn_9596cb_33_0.svg


.. raw:: html

   <!--
   One thing to note: compared to the Poisson case, we are now dividing by the standard deviation which means that we are squeezing the possible outcomes into smaller and smaller areas.
   This is an indication that our limit will no longer be discrete, but rather a continuous.
   -->

One thing to note: compared to the Poisson case, we are now dividing by
the standard deviation, which means that we are squeezing the possible
outcomes into smaller and smaller areas. This is an indication that our
limit will no longer be discrete, but rather continuous.
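
The squeezing can be quantified. Because :math:`X^{(n)}` moves in
integer steps, adjacent values of :math:`Y^{(n)}` are separated by
:math:`1/\sigma_{X^{(n)}}`. The short sketch below prints that spacing
for several :math:`n`; the names ``p_fixed``, ``sigma_n``, and
``spacings`` are placeholders introduced only for this illustration.

.. code:: python

    from math import sqrt

    p_fixed = 0.2      # success probability, held fixed as n grows
    spacings = {}
    for n_trials in (1, 10, 100, 1000):
        # Standard deviation of X^(n) ~ Binomial(n, p)
        sigma_n = sqrt(n_trials * p_fixed * (1 - p_fixed))
        # X^(n) moves in integer steps, so Y^(n) moves in steps of 1 / sigma_n
        spacings[n_trials] = 1 / sigma_n
        print(n_trials, spacings[n_trials])

The spacing decays like :math:`1/\sqrt{n}`, so the grid of possible
values becomes finer as :math:`n` grows.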

.. raw:: html

   <!--
   A derivation of what occurs is beyond the scope of this document, but the *central limit theorem* states that as $n \rightarrow \infty$, 
   this will yield the Gaussian Distribution (or sometimes normal distribution).
   More explicitly, for any $a, b$:
   -->

A derivation of what occurs is beyond the scope of this document, but
the *central limit theorem* states that as :math:`n \rightarrow \infty`,
this will yield the Gaussian distribution (also called the normal
distribution). More explicitly, for any :math:`a, b`:

.. math::


   \lim_{n \rightarrow \infty} P(Y^{(n)} \in [a, b]) = P(\mathcal{N}(0,1) \in [a, b]),

.. raw:: html

   <!--
   where we say a random variable is normally distributed with given mean $\mu$ and variance $\sigma^2$, written $X \sim \mathcal{N}(\mu, \sigma^2)$ if $X$ has density
   -->

where we say a random variable :math:`X` is normally distributed with
given mean :math:`\mu` and variance :math:`\sigma^2`, written
:math:`X \sim \mathcal{N}(\mu, \sigma^2)`, if it has density:

.. math:: p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}.
   :label: eq_gaussian_pdf
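
The limit statement can be probed numerically by comparing
:math:`P(Y^{(n)} \in [a, b])`, computed from the binomial p.m.f., with
the Gaussian probability obtained from ``erf``. This is a sketch under
stated assumptions: the helper names are hypothetical, the interval
:math:`[-1, 1]` and :math:`p = 0.2` are arbitrary choices, and the
p.m.f. is evaluated in log space with ``lgamma`` to avoid overflow for
large :math:`n`.

.. code:: python

    from math import erf, exp, lgamma, log, sqrt

    def binomial_pmf(k, n_trials, p_success):
        """Binomial p.m.f. computed in log space for numerical stability."""
        log_pmf = (lgamma(n_trials + 1) - lgamma(k + 1)
                   - lgamma(n_trials - k + 1)
                   + k * log(p_success)
                   + (n_trials - k) * log(1 - p_success))
        return exp(log_pmf)

    def standard_normal_cdf(z):
        return (1.0 + erf(z / sqrt(2))) / 2.0

    def prob_standardized_in(a_lo, b_hi, n_trials, p_success):
        """P(a <= Y^(n) <= b) for the standardized Binomial(n, p)."""
        mean = n_trials * p_success
        std = sqrt(n_trials * p_success * (1 - p_success))
        return sum(binomial_pmf(k, n_trials, p_success)
                   for k in range(n_trials + 1)
                   if a_lo <= (k - mean) / std <= b_hi)

    a_lo, b_hi = -1.0, 1.0
    gauss_prob = standard_normal_cdf(b_hi) - standard_normal_cdf(a_lo)
    for n_trials in (10, 100, 1000, 10000):
        print(n_trials, prob_standardized_in(a_lo, b_hi, n_trials, 0.2),
              gauss_prob)

Because :math:`Y^{(n)}` lives on a discrete grid, the agreement need not
improve smoothly at every step, but the two columns should draw together
for large :math:`n`.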

.. raw:: html

   <!--
   Let us first plot the probability density function :eqref:`eq_gaussian_pdf`.
   -->

Let us first plot the probability density function
:eq:`eq_gaussian_pdf`.

.. code:: python

    mu, sigma = 0, 1
    
    x = np.arange(-3, 3, 0.01)
    p = 1 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-(x - mu)**2 / (2 * sigma**2))
    
    d2l.plot(x, p, 'x', 'p.d.f.')


.. figure:: output_distributions_vn_9596cb_35_0.svg


.. raw:: html

   <!--
   Now, let us plot the cumulative distribution function.
   It is beyond the scope of this appendix, but the Gaussian c.d.f. does not have a closed-form formula in terms of more elementary functions.
   We will use `erf` which provides a way to compute this integral numerically.
   -->

Now, let us plot the cumulative distribution function. It is beyond the
scope of this appendix, but the Gaussian c.d.f. does not have a
closed-form formula in terms of more elementary functions. We will use
``erf``, which provides a way to compute this integral numerically.

.. code:: python

    def phi(x):
        return (1.0 + erf((x - mu) / (sigma * np.sqrt(2)))) / 2.0
    
    d2l.plot(x, np.array([phi(y) for y in x.tolist()]), 'x', 'c.d.f.')


.. figure:: output_distributions_vn_9596cb_37_0.svg


.. raw:: html

   <!--
   Keen-eyed readers will recognize some of these terms.
   Indeed, we encountered this integral in :numref:`sec_integral_calculus`.
   Indeed we need exactly that computation to see that this $p_X(x)$ has total area one and is thus a valid density.
   -->

Keen-eyed readers will recognize some of these terms. Indeed, we
encountered this integral in :numref:`sec_integral_calculus`, and we
need exactly that computation to see that this :math:`p_X(x)` has total
area one and is thus a valid density.
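
Short of redoing the integral by hand, the unit-area claim can be
checked numerically. The sketch below integrates the density with a
composite trapezoidal rule and compares a partial integral with the
``erf``-based c.d.f.; the helper names, the truncation of the real line
to :math:`[-10, 10]`, and the step count are assumptions made for this
illustration.

.. code:: python

    from math import erf, exp, pi, sqrt

    def gaussian_pdf(x, mean=0.0, std=1.0):
        return exp(-(x - mean) ** 2 / (2 * std**2)) / sqrt(2 * pi * std**2)

    def gaussian_cdf(x, mean=0.0, std=1.0):
        return (1.0 + erf((x - mean) / (std * sqrt(2)))) / 2.0

    def trapezoid(func, left, right, num_steps=20_000):
        """Composite trapezoidal rule for the integral over [left, right]."""
        step = (right - left) / num_steps
        inner = sum(func(left + i * step) for i in range(1, num_steps))
        return step * (0.5 * func(left) + inner + 0.5 * func(right))

    # Tails beyond 10 standard deviations contribute a negligible amount
    total_area = trapezoid(gaussian_pdf, -10.0, 10.0)
    partial_area = trapezoid(gaussian_pdf, -10.0, 1.0)

    print(total_area)                         # close to 1
    print(partial_area, gaussian_cdf(1.0))    # integral next to erf-based value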

.. raw:: html

   <!--
   Our choice of working with coin flips made computations shorter, but nothing about that choice was fundamental.
   Indeed, if we take any collection of independent identically distributed random variables $X_i$, and form
   -->

Our choice of working with coin flips made computations shorter, but
nothing about that choice was fundamental. Indeed, if we take any
collection of independent identically distributed random variables
:math:`X_i`, and form:

.. math::


   X^{(N)} = \sum_{i=1}^N X_i.

.. raw:: html

   <!--
   Then
   -->

Then

.. math::


   \frac{X^{(N)} - \mu_{X^{(N)}}}{\sigma_{X^{(N)}}}

.. raw:: html

   <!--
   will be approximately Gaussian.
   There are additional requirements needed to make it work, most commonly $E[X^4] < \infty$, but the philosophy is clear.
   -->

will be approximately Gaussian. Additional requirements are needed to
make this work, most commonly :math:`E[X^4] < \infty`, but the
philosophy is clear.
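
To see this with something other than coin flips, the sketch below sums
independent exponential variables, standardizes the sum, and measures
how often it lands in :math:`[-1, 1]`. The choice of the exponential
distribution, the number of summands, the seed, and the number of trials
are all assumptions made for this illustration.

.. code:: python

    import random
    from math import sqrt

    rng = random.Random(0)        # fixed seed for repeatability
    num_terms = 30                # N, the number of summands
    num_trials = 20_000

    # Exponential(1) has mean 1 and variance 1, so the sum of N terms
    # has mean N and variance N.
    sum_mean, sum_std = num_terms * 1.0, sqrt(num_terms * 1.0)

    inside = 0
    for _ in range(num_trials):
        total = sum(rng.expovariate(1.0) for _ in range(num_terms))
        standardized = (total - sum_mean) / sum_std
        if -1.0 <= standardized <= 1.0:
            inside += 1

    frac_inside = inside / num_trials
    # A standard normal puts about 0.683 of its mass in [-1, 1]
    print(frac_inside)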

.. raw:: html

   <!--
   The central limit theorem is the reason that the Gaussian is fundamental to probability, statistics, and machine learning.
   Whenever we can say that something we measured is a sum of many small independent contributions, we can assume that the thing being measured will be close to Gaussian.  
   -->

The central limit theorem is the reason that the Gaussian is fundamental
to probability, statistics, and machine learning. Whenever we can say
that something we measured is a sum of many small independent
contributions, we can assume that the thing being measured will be close
to Gaussian.

.. raw:: html

   <!--
   There are many more fascinating properties of Gaussians, and we would like to discuss one more here.
   The Gaussian is what is known as a *maximum entropy distribution*.
   We will get into entropy more deeply in :numref:`sec_information_theory`, however all we need to know at this point is that it is a measure of randomness.
   In a rigorous mathematical sense, we can think of the Gaussian as the *most* random choice of random variable with fixed mean and variance.
   Thus, if we know that our random variable has some mean and variance, the Gaussian is in a sense the most conservative choice of distribution we can make.
   -->

There are many more fascinating properties of Gaussians, and we would
like to discuss one more here. The Gaussian is what is known as a
*maximum entropy distribution*. We will get into entropy more deeply in
:numref:`sec_information_theory`; all we need to know at this point is
that it is a measure of randomness. In a rigorous mathematical sense, we
can think of the Gaussian as the *most* random choice of random variable
with fixed mean and variance. Thus, if we know that our random variable
has some mean and variance, the Gaussian is in a sense the most
conservative choice of distribution we can make.
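
As a small illustration of the maximum entropy property, the sketch
below compares the differential entropy of a Gaussian with that of two
other distributions sharing the same variance. It relies on standard
closed-form entropy expressions that this section does not derive, and
the Laplace distribution appears here only as a comparison case; both
are assumptions brought in for this example.

.. code:: python

    from math import e, log, pi, sqrt

    std = 1.0  # common standard deviation for all three distributions

    # Differential entropies (in nats) at variance std**2
    entropy_gaussian = 0.5 * log(2 * pi * e * std**2)
    # Uniform of width std * sqrt(12) has variance std**2
    entropy_uniform = log(std * sqrt(12))
    # Laplace with scale std / sqrt(2) has variance std**2
    entropy_laplace = 1 + log(2 * std / sqrt(2))

    print(entropy_gaussian, entropy_laplace, entropy_uniform)

Among these three, the Gaussian has the largest entropy, in line with
the claim in the text.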

.. raw:: html

   <!--
   To close the section, Let us recall that if $X \sim \mathcal{N}(\mu, \sigma^2)$, then:
   -->

To close the section, let us recall that if
:math:`X \sim \mathcal{N}(\mu, \sigma^2)`, then:

-  :math:`\mu_X = \mu`,
-  :math:`\sigma_X^2 = \sigma^2`.

.. raw:: html

   <!--
   We can sample from the Gaussian (or standard normal) distribution as shown below.
   -->

We can sample from the Gaussian distribution (here the standard normal,
since :math:`\mu = 0` and :math:`\sigma = 1`) as shown below.

.. code:: python

    np.random.normal(mu, sigma, size=(10, 10))


.. parsed-literal::
    :class: output

    array([[-4.70073572e-01,  1.53740330e-02, -1.25708405e+00,
             3.93032822e-01,  1.26170220e+00, -1.07841950e-01,
             8.14249971e-01,  5.33059547e-01, -5.37575777e-01,
            -1.39088319e+00],
           [-2.13897397e+00,  1.85436907e-01, -4.67020502e-01,
             3.95948161e-01,  3.82780791e-01,  8.52570730e-02,
             7.62757679e-01,  1.50908643e-01,  1.51440474e+00,
             1.23058716e+00],
           [ 5.88099577e-02,  2.59987625e+00,  8.25207214e-01,
             5.79111086e-01, -8.75180632e-01, -3.98806916e-03,
            -6.46533260e-01, -5.62996271e-01,  1.67630658e-01,
            -2.86060108e-01],
           [-2.16287890e+00, -1.74374740e+00,  4.93586519e-01,
             1.58769300e+00,  2.34537913e+00, -8.92978615e-01,
             8.21263763e-01,  7.55799486e-01, -1.63660331e-01,
             1.20444663e+00],
           [ 1.29949143e+00,  1.82309481e+00,  1.92677610e+00,
            -1.14405268e+00, -2.15467805e+00,  8.01106346e-01,
            -1.54416553e+00, -1.27111963e+00,  3.51530694e-01,
             1.03928776e+00],
           [-1.02241933e+00,  1.59877436e+00, -7.85346861e-01,
            -5.47695717e-01,  1.63452678e-03, -7.00418413e-01,
            -9.20152917e-02,  1.96000701e-01,  3.48288675e-01,
             9.79944049e-01],
           [ 1.22130815e+00,  1.80495024e-01,  2.27069952e+00,
             7.72245447e-01, -1.23556208e+00, -9.92150936e-02,
             3.10177852e-01, -2.60076054e-01, -1.36973192e-01,
             8.99212867e-01],
           [-9.05487820e-01,  4.96714082e-01, -7.98568789e-01,
            -5.70583495e-01,  1.04523284e+00,  6.79362108e-01,
            -4.44150812e-03,  1.12181808e+00,  1.14770056e+00,
            -8.75516294e-01],
           [-5.39966074e-01,  1.67431358e+00, -8.78798025e-01,
            -5.38699655e-01,  8.00758385e-01, -1.01580356e+00,
            -1.30615795e-01,  2.91012408e+00,  1.69557646e+00,
             2.14090220e-01],
           [ 6.29097502e-01, -1.17999245e-01,  1.59981036e-01,
             2.02849683e+00,  6.17987179e-02, -4.04266375e-01,
             1.39027385e+00, -8.50843221e-01,  1.68852330e-01,
             2.61518476e-01]])
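
As a quick sanity check of the mean and variance stated earlier, the
sketch below draws a large Gaussian sample and compares the empirical
mean and variance with the parameters. It uses NumPy's ``Generator``
interface rather than ``np.random.normal``; the names ``mean_demo`` and
``std_demo``, their values, and the seed are hypothetical.

.. code:: python

    import numpy as np

    rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
    mean_demo, std_demo = 2.0, 3.0  # hypothetical parameters

    # Draw many samples so the empirical moments are close to the true ones.
    samples = rng.normal(mean_demo, std_demo, size=100000)

    # Expect values near mean_demo and std_demo**2 respectively.
    print(samples.mean(), samples.var())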


.. raw:: html

   <!--
   ## Exponential Family
   -->

.. _subsec_exponential_family:

Exponential Family
------------------


.. raw:: html

   <!--
   One shared property for all the distributions listed above is that they all belong to which is known as the *exponential family*.
   The exponential family is a set of distributions whose density can be expressed in the following form:
   -->

One property shared by all the distributions listed above is that they
belong to what is known as the *exponential family*. The exponential
family is the set of distributions whose density can be expressed in the
following form:

.. math:: p(\mathbf{x} | \mathbf{\eta}) = h(\mathbf{x}) \cdot \mathrm{exp} \big{(} \mathbf{\eta}^{\top} \cdot T(\mathbf{x}) - A(\mathbf{\eta}) \big{)}
   :label: eq_exp_pdf

.. raw:: html

   <!--
   As this definition can be a little subtle, let us examine it closely.  
   -->

As this definition can be a little subtle, let us examine it closely.

.. raw:: html

   <!--
   First, $h(\mathbf{x})$ is known as the *underlying measure* or the *base measure*.
   This can be viewed as an original choice of measure we are modifying with our exponential weight.  
   -->

First, :math:`h(\mathbf{x})` is known as the *underlying measure* or the
*base measure*. It can be viewed as the original choice of measure that
we modify with our exponential weight.

.. raw:: html

   <!--
   Second, we have the vector $\mathbf{\eta} = (\eta_1, \eta_2, ..., \eta_l) \in \mathbb{R}^l$ called the *natural parameters* or *canonical parameters*.
   These define how the base measure will be modified.
   The natural parameters enter into the new measure by taking the dot product of these parameters against some function 
   $T(\cdot)$ of $\mathbf{x}= (x_1, x_2, ..., x_n) \in \mathbb{R}^n$ and exponentiated.
   $T(\mathbf{x})= (T_1(\mathbf{x}), T_2(\mathbf{x}), ..., T_l(\mathbf{x}))$ is called the *sufficient statistics* for $\eta$.
   This name is used since the information represented by $T(\mathbf{x})$ is sufficient to calculate the 
   probability density and no other information from the sample $\mathbf{x}$'s are required.
   -->

Second, we have the vector
:math:`\mathbf{\eta} = (\eta_1, \eta_2, ..., \eta_l) \in \mathbb{R}^l`
called the *natural parameters* or *canonical parameters*. These define
how the base measure will be modified. The natural parameters enter the
new measure through their dot product with some function
:math:`T(\cdot)` of
:math:`\mathbf{x}= (x_1, x_2, ..., x_n) \in \mathbb{R}^n`, which is then
exponentiated.
:math:`T(\mathbf{x})= (T_1(\mathbf{x}), T_2(\mathbf{x}), ..., T_l(\mathbf{x}))`
is called the *sufficient statistics* for :math:`\mathbf{\eta}`. The
name is used because the information carried by :math:`T(\mathbf{x})` is
sufficient to compute the part of the probability density that depends
on :math:`\mathbf{\eta}`; no other information from the sample
:math:`\mathbf{x}` is needed for it.

.. raw:: html

   <!--
   Third, we have $A(\mathbf{\eta})$, which is referred to as the *cumulant function*,
   which ensures that the above distribution :eqref:`eq_exp_pdf` integrates to one, i.e.,
   -->

Third, we have :math:`A(\mathbf{\eta})`, referred to as the *cumulant
function*, which ensures that the above distribution
:eq:`eq_exp_pdf` integrates to one, i.e.,

.. math::

   A(\mathbf{\eta}) = \log \left[\int h(\mathbf{x}) \cdot \mathrm{exp}
   \big{(}\mathbf{\eta}^{\top} \cdot T(\mathbf{x}) \big{)} d\mathbf{x} \right].
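
To see the normalizing role of the cumulant function in the simplest
possible case, here is a sketch (not part of the original text) for the
Bernoulli distribution. It assumes the standard parametrization
:math:`h(x) = 1`, :math:`T(x) = x`, and
:math:`\eta = \log\frac{p}{1-p}`; since :math:`x \in \{0, 1\}` is
discrete, the integral becomes a sum. The name ``p_demo`` and its value
are hypothetical.

.. code:: python

    import numpy as np

    p_demo = 0.3  # hypothetical success probability
    eta = np.log(p_demo / (1 - p_demo))  # natural parameter (log-odds)

    # Cumulant function: log of the sum over x in {0, 1} of h(x) * exp(eta * T(x)),
    # with h(x) = 1 and T(x) = x.
    A = np.log(sum(np.exp(eta * x) for x in (0, 1)))

    # Subtracting A inside the exponent yields a properly normalized pmf.
    pmf = [float(np.exp(eta * x - A)) for x in (0, 1)]
    print(pmf, sum(pmf))

The two values recover :math:`1-p` and :math:`p`, and they sum to one
precisely because :math:`A(\eta)` was subtracted in the exponent.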

.. raw:: html

   <!--
   To be concrete, let us consider the Gaussian.
   Assuming that $\mathbf{x}$ is an univariate variable, we saw that it had a density of
   -->

To be concrete, let us consider the Gaussian. Assuming that
:math:`\mathbf{x}` is a univariate variable, we saw that it has the
density

.. math::


   \begin{aligned}
   p(x | \mu, \sigma) &= \frac{1}{\sqrt{2 \pi \sigma^2}} \mathrm{exp} 
   \Big{\{} \frac{-(x-\mu)^2}{2 \sigma^2} \Big{\}} \\
   &= \frac{1}{\sqrt{2 \pi}} \cdot \mathrm{exp} \Big{\{} \frac{\mu}{\sigma^2}x 
   - \frac{1}{2 \sigma^2} x^2 - \big{(} \frac{1}{2 \sigma^2} \mu^2 
   + \log(\sigma) \big{)} \Big{\}} .
   \end{aligned}

.. raw:: html

   <!--
   This matches the definition of the exponential family with:
   -->

This matches the definition of the exponential family with:

.. raw:: html

   <!--
   * *underlying measure*: $h(x) = \frac{1}{\sqrt{2 \pi}}$,
   * *natural parameters*: $\eta = \begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix} = \begin{bmatrix} \frac{\mu}{\sigma^2} \\ \frac{1}{2 \sigma^2}  \end{bmatrix}$,
   * *sufficient statistics*: $T(x) = \begin{bmatrix}x\\-x^2\end{bmatrix}$, and
   * *cumulant function*: $A(\eta) = \frac{1}{2 \sigma^2} \mu^2 + \log(\sigma) = \frac{\eta_1^2}{4 \eta_2} - \frac{1}{2}\log(2 \eta_2)$.
   -->

-  *underlying measure*: :math:`h(x) = \frac{1}{\sqrt{2 \pi}}`,
-  *natural parameters*:
   :math:`\eta = \begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix} = \begin{bmatrix} \frac{\mu}{\sigma^2} \\ \frac{1}{2 \sigma^2} \end{bmatrix}`,
-  *sufficient statistics*:
   :math:`T(x) = \begin{bmatrix}x\\-x^2\end{bmatrix}`, and
-  *cumulant function*:
   :math:`A(\eta) = \frac{1}{2 \sigma^2} \mu^2 + \log(\sigma) = \frac{\eta_1^2}{4 \eta_2} - \frac{1}{2}\log(2 \eta_2)`.
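
The four ingredients listed above can be verified numerically. The
sketch below (not part of the original text) evaluates the density in
exponential family form, compares it with the usual Gaussian density,
and checks that it integrates to approximately one. The helper names
``suff_stats`` and ``cumulant``, the parameters ``mean_demo`` and
``std_demo``, and the integration grid are hypothetical choices for this
illustration.

.. code:: python

    import numpy as np

    mean_demo, std_demo = 1.5, 0.8  # hypothetical Gaussian parameters

    # Natural parameters: eta_1 = mu / sigma^2, eta_2 = 1 / (2 sigma^2).
    eta = np.array([mean_demo / std_demo**2, 1 / (2 * std_demo**2)])

    def suff_stats(x):
        """Sufficient statistics T(x) = (x, -x^2), stacked as rows."""
        return np.stack([x, -x**2])

    def cumulant(eta):
        """A(eta) = eta_1^2 / (4 eta_2) - (1/2) log(2 eta_2)."""
        return eta[0]**2 / (4 * eta[1]) - 0.5 * np.log(2 * eta[1])

    x = np.linspace(-4.0, 7.0, 2001)
    h = 1 / np.sqrt(2 * np.pi)  # base measure, constant in x

    # Density written in exponential family form.
    density_expfam = h * np.exp(eta @ suff_stats(x) - cumulant(eta))
    # The familiar Gaussian density, for comparison.
    density_direct = (np.exp(-(x - mean_demo)**2 / (2 * std_demo**2))
                      / np.sqrt(2 * np.pi * std_demo**2))

    max_diff = np.max(np.abs(density_expfam - density_direct))
    # Riemann sum on the grid as a rough check of normalization.
    total_mass = np.sum(density_expfam) * (x[1] - x[0])
    print(max_diff, total_mass)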

.. raw:: html

   <!--
   It is worth noting that the exact choice of each of above terms is somewhat arbitrary.
   Indeed, the important feature is that the distribution can be expressed in this form, not the exact form itself.
   -->

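It is worth noting that the exact choice of each of the above terms is
somewhat arbitrary. For example, we could equally take
:math:`T(x) = \begin{bmatrix}x\\x^2\end{bmatrix}` together with
:math:`\eta_2 = -\frac{1}{2 \sigma^2}`. Indeed, the important feature is
that the distribution can be expressed in this form, not the exact form
itself.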

.. raw:: html

   <!--
   As we allude to in :numref:`subsec_softmax_and_derivatives`, a widely used technique is to assume that the final output $\mathbf{y}$ follows an exponential family distribution.
   The exponential family is a common and powerful family of distributions encountered frequently in machine learning.
   -->

As we allude to in :numref:`subsec_softmax_and_derivatives`, a widely
used technique is to assume that the final output :math:`\mathbf{y}`
follows an exponential family distribution. The exponential family is a
common and powerful family of distributions encountered frequently in
machine learning.

Summary
-------

.. raw:: html

   <!--
   * Bernoulli random variables can be used to model events with a yes/no outcome.
   * Discrete uniform distributions model selects from a finite set of possibilities.
   * Continuous uniform distributions select from an interval.
   * Binomial distributions model a series of Bernoulli random variables, and count the number of successes.
   * Poisson random variables model the arrival of rare events.
   * Gaussian random variables model the result of adding a large number of independent random variables together.
   * All the above distributions belong to exponential family.
   -->

-  Bernoulli random variables can be used to model events with a yes/no
   outcome.
-  Discrete uniform distributions model selecting from a finite set of
   possibilities.
-  Continuous uniform distributions select from an interval.
-  Binomial distributions model a series of Bernoulli random variables
   and count the number of successes.
-  Poisson random variables model the arrival of rare events.
-  Gaussian random variables model the result of adding a large number
   of independent random variables together.
-  All the above distributions belong to the exponential family.

Exercises
---------

.. raw:: html

   <!--
   1. What is the standard deviation of a random variable that is the difference $X-Y$ of two independent binomial random variables $X, Y \sim \mathrm{Binomial}(16, 1/2)$.
   2. If we take a Poisson random variable $X \sim \mathrm{Poisson}(\lambda)$ and consider $(X - \lambda)/\sqrt{\lambda}$ as $\lambda \rightarrow \infty$, 
   we can show that this becomes approximately Gaussian. Why does this make sense?
   3. What is the probability mass function for a sum of two discrete uniform random variables on $n$ elements?
   -->

1. What is the standard deviation of a random variable that is the
   difference :math:`X-Y` of two independent binomial random variables
   :math:`X, Y \sim \mathrm{Binomial}(16, 1/2)`?
2. If we take a Poisson random variable
   :math:`X \sim \mathrm{Poisson}(\lambda)` and consider
   :math:`(X - \lambda)/\sqrt{\lambda}` as
   :math:`\lambda \rightarrow \infty`, we can show that this becomes
   approximately Gaussian. Why does this make sense?
3. What is the probability mass function for a sum of two discrete
   uniform random variables on :math:`n` elements?

Discussions
-----------

-  English: `MXNet <https://discuss.d2l.ai/t/417>`__
-  Vietnamese: `Diễn đàn Machine Learning Cơ
   Bản <https://forum.machinelearningcoban.com/c/d2l>`__

Contributors
------------

The Vietnamese translation of this page was carried out by:

-  Đoàn Võ Duy Thanh
-  Nguyễn Mai Hoàng Long
-  Lê Khắc Hồng Phúc
-  Phạm Minh Đức
-  Phạm Hồng Vinh
-  Đỗ Trường Giang
-  Nguyễn Văn Cường