2020-04-30

Mathematical Learning HW1

文章目录

1. Question 1
2. Question 2
3. Question 3
4. Question 4

Homework for Course MSBD5004 Mathematical Methods for Data Analysis, BDT, HKUST

Question 1

Consider the vector space $R^{n}$

(a)

Check that $|x|_{\infty}=\max_{1 \leq i \leq n} |x_{1}|$ is indeed a norm on $\mathbb{R}^n$

According to the definition of norm , we check

(1)
$\begin{aligned} | x|_{\infty} =\max _{1 \leq i \leq n}\left|x_{i}\right| \geq 0, \forall x \in \mathbb{R}^{n}\end{aligned} $

and $| x|_{\infty} = 0 \Leftrightarrow |x_i|=0 \Leftrightarrow x=\overrightarrow{0}$

(2)
$\begin{aligned} |\alpha x|_{\infty}=\max _{1 \leq i \leq n}|a x_{i}|=|\alpha| \cdot \max _{1 \leq i \leq n}\left|x_{i}\right|=|\alpha| \cdot | x|_{\infty}, \forall x \in \mathbb{R}, \alpha \in \mathbb{R}\end{aligned}$

(3)
$\begin{aligned}|x+y|_{\infty}=\max _{1 \leq i \leq n}\left|x_{i}+y_{i}\right| =\max_{1 \leq i \leq n} \left| x_i \right| +\max_{1 \leq i \leq n}\left| y_{i}\right|=|x|_{\infty}+|y|_{\infty}, \quad \forall x, y \in \mathbb{R}^{n}\end{aligned}$

Therefore, $|x|_{\infty}=\max _{1 \leq i \leq n}\left|x_{i}\right|$ is a norm on $\mathbb{R}^{n}$

(b)

Prove that: for any $x \in \mathbb{R}^{n}, \quad|x|_{\infty}= lim_{p \rightarrow \infty} |x|_p$

Prove:

$\begin{aligned} &\|x\|_{p}=\left(\sum_{i=1}^{n}\left|x_{i}\right|^{p}\right)^{1 / p} \\ &=\left(\left|x_{1}\right|^{p}+\left|x_{x}\right|^{p}+\cdots+\left|x_{n}\right|^{p}\right)^{1 / p} \end{aligned}$

Let

$\left|x_{m}\right|=\max \left\{\left|x_{1}\right|,\left|x_{2}\right|, \cdots,\left|x_{n}\right|\right\}$

Because

$\frac{\left|x_{1} \right|}{\left|x_{m} \right|}, \frac{\left|x_{2} \right|}{\left|x_{m}\right|}, \ldots, \frac{\left|x_{n}\right|}{\left|x_{m}\right|}<1$

Then we can get this inequality

$1 \cdot |x_m| \leq \|x\|_{p}=\left[\left(\frac{x_1}{x_{m}}\right)^{p}+\left(\frac{x_2}{x_{m}}\right)^{p}+ \cdots + \left(\frac{x_n}{x_{m}}\right)^{p} \right]^{\frac{1}{p}} \cdot |x_m| \leq (n)^{\frac{1}{p}} \cdot |x_m|$

when $p \rightarrow \infty, (n)^{\frac{1}{p}} \rightarrow 1$ so $| x |_p \rightarrow |x_m|$

Finally
$\lim _{p \rightarrow \infty}|x|_{p}=\left|x_{m}\right|=\max _{1 \leq i \leq n}\left|x_{i}\right| = |x|_\infty$. Hence proved.

(c)

Prove the equivalence

$\|x\|_{\infty} \leqslant \| x\left\|_{1} \leqslant n\right\|x \|_{\infty}, \forall x \in \mathbb{R}^{n}$

Prove: According to the definition

Obviously, $\max_{1 \leq i \leq n} \left|x_{i}\right|$ is taken from$\left\{\left|x_{i}\right|,\left|x_{2}\right|, \cdots,\left|x_{n}\right|\right\}$

So, $| x |_{1}=\sum_{i=1}^{n}\left|x_{1}\right| \geqslant \max _{1 \leq i \leq n}\left|x_{i}\right|= |x|_\infty$

Then

$\left|x_{1}\right| \leq \max_{1 \leq i \leq n} |x_i|\\ \left|x_{2}\right| \leq \max_{1 \leq i \leq n} |x_i|\\ \vdots \\ \left|x_{n}\right| \leq \max_{1 \leq i \leq n}|x_i|$

Therefore $\sum_{i=1}^{n}\left|x_{i}\right|=n \cdot \max_{1 \leq i \leq n} |x_i|=n \cdot|x|_{\infty}$

Altogether, $|x|_{\infty} \leqslant|x|_{1} \leqslant n \cdot|x|_{\infty}$

Question 2

For any $A \in \mathbb{R}^{m \times n}$, we have defined

$\|A\|_{2}=\sup _{x \in \mathbb{R}^{n}, x \neq 0} \quad \frac{\|A x\|_{2}}{\|x\|_{2}}$

(a)

Prove that

$\|A\|_{2}=\max _{x \in \mathbb{R}^n, \|x\|_2=1}\|A x\|_{2}$

Prove :

$\begin{aligned} \|A\|_2 &= \sup _{x \in \mathbb{R}^{n}, x \neq 0}\frac{\|A x\|_{2}}{\|x\|_{2}}\\ &= \sup _{x \in \mathbb{R}^{n}, x \neq 0}\left\|\frac{A x}{\|x\|_{2}}\right\|_{2} \\ &= \sup _{x \in \mathbb{R}^{n}, x \neq 0}\left\|A \cdot \frac{x}{\|x\|_{2}}\right\|_{2} \end{aligned}$

Note that $\frac{x}{|x|_2}$ has unit length. So we can express

$\|A\|_2=\sup _{x \in \mathbb{R}^{n},\|x\|_{2}=1} \|A x \|_{2}$

However, $|A|_2$ is a continuous function of $x$ and the unit ball $|x|_{2}=1$ is closed and bounded. Now, on a closed and bounded set, a continuous function always achieves its maximum and minimum values.

Hence, the ‘sup’ can be replaced by ‘max’. i.e.

$\|A\|_{2}=\max _{x \in R^{n},\|x\|_{2}=1}\|A x\|_{2}$

(b)

Prove that $|\cdot|_2$ is a norm on $\mathbb{R}^{m \times n}$

Prove: According to the definition of norm we check

(1) $\forall A \in \mathbb{R}^{m \times n}$

$\begin{aligned} \|A\|_{2}=\max _{x \in \mathbb{R}^n, \|x\|_2=1}\|A x\|_{2} \geqslant 0 \end{aligned}$

when $\begin{aligned}\max _{x \in \mathbb{R}^n, |x|_2=1}|A x|_{2} = 0\end{aligned}$ with $|A x|_{2} \geq 0$, we have

$\|A x\|_{2} = 0, s.t. \|x\|_2=1, x \in \mathbb{R}^n \Rightarrow A=O$

Thus, $|A x|_{2} = 0 \Leftrightarrow A=O$

(2) $\forall \alpha \in \mathbb{R}, A \in \mathbb{R}^{m \times n}$

$\begin{aligned} \|\alpha A\|_{2} &= \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|\alpha A x\|_{2} \\ &= |\alpha| \cdot \max _{x \in \mathbb{R}^n, \|x\|_2=1} \|A x\|_{2}\\ &= |\alpha| \cdot \|A\|_2 \end{aligned}$

(3) $\forall A,B \in \mathbb{R}^{m \times n}$

$\begin{aligned} \|A + B\|_{2} &= \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|(A+B)x\|_{2} \\ &= \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|Ax+Bx\|_{2} \\ &\leq \max _{x \in \mathbb{R}^n, \|x\|_2=1} \left(\|Ax\|_{2} + \|Bx\|_{2}\right) \\ &= \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|Ax\|_{2} + \max _{x \in \mathbb{R}^n, \|x\|_2=1}\|Bx\|_{2}\\ &= \|A\|_2 + \|B\|_2 \end{aligned}$

Altogether, $|\cdot|_{2}$ is a norm on $\mathbb{R}^{m \times n}$

(c)

Prove that $|A x|_{2} \leqslant|A|_{2}|x|_{2}$ for any $A \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^{n}$

Prove: Consider the definition of matrix 2-norm, $\forall A \in \mathbb{R}^{m \times n}$

$\|A\|_{2}=\sup _{x \in \mathbb{R}^{n}, x \neq 0} \quad \frac{\|A x\|_{2}}{\|x\|_{2}}$

It reveals that

$\|A x\|_{2} \leqslant \|A\|_2 \cdot \|x\|_{2}$

Hence proved

(d)

Prove that $|A B|_{2} \leqslant|A|_{2} \cdot|B|_{2}$ for all $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$

Prove: In Question (c) we have proved $|A x|_{2} \leqslant|A|_{2}|x|_{2}$, $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^{n}$

use this inequality twice. Let $B \in \mathbb{R}^{n \times p}, y \in \mathbb{R}^p$

$\begin{aligned} \|A B y\|_{2} &=\|A(B y)\|_{2} \\ &\leqslant\|A\|_{2} \cdot\|B y\|_{2} \\ &\leqslant\|A\|_{2} \cdot \|B\|_{2} \cdot\|y\|_{2} \end{aligned}$

Then for $y \neq 0$

$\begin{array}{l} \frac{\|A B y\|_{2}}{\|y\|_{2}} \leqslant\|A\|_{2} \cdot \|B\|_2 \\ \left\|A B \cdot \frac{y}{\|y\|_{2}}\right\|_{2} \leqslant\|A\|_{2} \cdot\|B\|_{2} \end{array}$

Hence, $|A B|_{2} \leqslant|A|_{2} \cdot | B |_{2},$ proved

Question 3

Let $a_{1}, a_{2}, \cdots, a_{m}$ be the $m$ given real numbers. Prove that a median of $a_{1}, a_{2}, \dots, a_{m}$ minimizes

$|a_{1} - b|+\left|a_{2}-b\right|+\cdots+\left|a_{m}-b\right|$

over all $b \in R$

Prove: To prove a median can minimize $\left|a_{1}-b\right|+\left|a_{2}-b\right|+\dots+\left|a_{m}-b\right|$ is equivalent to prove a median minimizes

$\text { Let } f(b)=\left(a_{1}-b\right)^{2}+\left(a_{2}-b\right)^{2}+\cdots+\left(a_{n}-b\right)^{2}$ $\begin{aligned} f^{\prime}(b)&=-2\left[\left(a_{1}-b\right)+\left(a_{2}-b\right)+\cdots+\left(a_{m-b}\right)\right] \\ &=-2\left[\left(a_{1}+a_{2}+\dots+a_{m}\right)-m b\right] \end{aligned}$

Make $f^{\prime}(b)=0, \text { get } b=\frac{1}{m}\left(a_{1}+a_{2}+\dots+a_{m}\right)$

Due to $f^{\prime \prime}(t)=2 m>0$

Therefore when $b=\frac{1}{m}\left(a_{1}+a_{2}+\dots+a_{m}\right), f(b)$ achieves its minimum
i.e. the median minimizes $\left|a_{1}-b\right|+\left|a_{2}-b\right|+\dots+\left|a_{m}-b\right|$

Question 4

Suppose that the vectors $x_{1}, \ldots . x_{N}$ in $\mathbb{R}^{n}$ are clustered using the K-means algorithm, with group representative $z_1, \cdots, z_{k}$

(a) Suppose the original vectors $x_i$ are nonnegative. i.e. their entries are nonnegative. Explain why the representatives $z_j$ output by the k-means algorithm are also nonnegtive.

Answer: we see the algorithm step 2

Fix group $G_{1}, G_{2}, \cdots, G_{k}$ and solve

$\min _{z_1, \cdots, z_k} \sum_{j=1}^{k}\left(\sum_{i \in G_j}\left\|x_{i}-z_{j}\right\|_{2}^{2}\right)$ $\text { that } \quad z_{j}=\frac{1}{\left|G_{j}\right|}\left(\sum_{i \in G_j} x_{i}\right)$

Because $x_i$ are nonnegative, according to the equation, $z_j$ is also nonnegative.

(b) Suppse the original vector $x_i$ represent proportions, i.e. their entries are nonnegative and sum to one. Explain why the representatives
$z_j$ output by k-means algorithms are also represent proportions.

Answer: Still consider the equation above, rewrite it to vector form

$z_{i}=\frac{1}{\left|G_{i}\right|} \sum_{i \in G_{j}}\left(\begin{array}{l} x_{i1} \\ x_{i2} \\ \vdots \\ x_{in} \end{array}\right)$ $=\frac{1}{\left|G_{i}\right|} \left(\begin{array}{c} \sum_{i \in G_{j}} x_{i1} \\ \sum_{i \in G_{j}} x_{i2} \\ \vdots\\ \sum_{i \in G_{j}} x_{in} \end{array}\right)$

For $x_{i}+x_{i 2}+\cdots+x_{i n}=1, \quad \sum_{i \in G_{j}} x_{i1}+\sum_{i \in G_{j}} x_{i2}+\cdots+ \sum_{i \in G_{j}} x_{in} = |G_j|$

We can get

$\frac{1}{\left|G_{i}\right|} \left(\sum_{i \in G_{j}} x_{i1}+\sum_{i \in G_{j}} x_{i2}+\cdots+ \sum_{i \in G_{j}} x_{in}\right)=1$

and $z_j$ nonnegstive is obvious. Hence proved.

(c) Suppose the original vector $x_i$ are Boolean. i.e. their entries are either 0 or 1. Give an interpretation of $(z_j)_i$, the i-th entry of the $j$ group representative.

Answer:

If $(z_j)_i = 0$, it means in group $j$, the number of vectors with i-th entry ZERO is more than those with i-th entry ONE
If $(z_j)_i = 1$ it means in group $j$, the number of vectors with i-th entry ONE is more that those with i-th entry ZERO