{"id":792,"date":"2022-05-22T13:08:13","date_gmt":"2022-05-22T11:08:13","guid":{"rendered":"https:\/\/lorentzen.ch\/?p=792"},"modified":"2022-05-22T13:08:13","modified_gmt":"2022-05-22T11:08:13","slug":"from-least-squares-benchmarks-to-the-marchenko-pastur-distribution","status":"publish","type":"post","link":"https:\/\/lorentzen.ch\/index.php\/2022\/05\/22\/from-least-squares-benchmarks-to-the-marchenko-pastur-distribution\/","title":{"rendered":"From Least Squares Benchmarks to the Marchenko\u2013Pastur Distribution"},"content":{"rendered":"\n<p>In this blog post, I tell the story how I learned about a theorem for random matrices of the two Ukrainian\ud83c\uddfa\ud83c\udde6 mathematicians Vladimir Marchenko and Leonid Pastur. It all started with benchmarking least squares solvers in scipy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting the Stage for Least Squares Solvers<\/h2>\n\n\n\n<p>Least squares starts with a matrix <code><span class=\"katex-eq\" data-katex-display=\"false\">A \\in \\mathbb{R}^{n,p}<\/span><\/code> and a vector <code><span class=\"katex-eq\" data-katex-display=\"false\">b \\in \\mathbb{R}^{n}<\/span><\/code> and one is interested in solution vectors <code><span class=\"katex-eq\" data-katex-display=\"false\">x \\in \\mathbb{R}^{p}<\/span><\/code> fulfilling<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>x^\\star = \\argmin_x ||Ax-b||_2^2 \\,.<\/pre><\/div>\n\n\n\n<p>You can read more about least squares in our earlier post <a href=\"https:\/\/lorentzen.ch\/index.php\/2021\/05\/02\/least-squares-minimal-norm-solution\/\" data-type=\"post\" data-id=\"355\">Least Squares Minimal Norm Solution<\/a>. There are many possible ways to tackle this problem and many algorithms are available. 
One standard solver is <a href=\"https:\/\/web.stanford.edu\/group\/SOL\/software\/lsqr\/\">LSQR<\/a> with the following properties:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Iterative solver, which terminates when some stopping criterion falls below a user-specified tolerance.<\/li><li>It only uses matrix-vector products. Therefore, it is suitable for large and sparse matrices <span class=\"katex-eq\" data-katex-display=\"false\">A<\/span>.<\/li><li>It effectively solves the normal equations <span class=\"katex-eq\" data-katex-display=\"false\">A^T A x = A^Tb<\/span> based on a bidiagonalization procedure of Golub and Kahan (so never really calculating <span class=\"katex-eq\" data-katex-display=\"false\">A^T A<\/span>).<\/li><li>It is, unfortunately, susceptible to ill-conditioned matrices <span class=\"katex-eq\" data-katex-display=\"false\">A<\/span>, which we demonstrated in our earlier post.<\/li><\/ul>\n\n\n\n<p>Wait, what is an ill-conditioned matrix? This is most easily explained with the help of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Singular_value_decomposition\">singular value decomposition<\/a> (SVD). Any real-valued matrix permits a decomposition into three parts:<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>A = U S V^T \\,.<\/pre><\/div>\n\n\n\n<p><span class=\"katex-eq\" data-katex-display=\"false\">U<\/span> and <span class=\"katex-eq\" data-katex-display=\"false\">V<\/span> are orthogonal matrices, but not of further interest to us. The matrix <span class=\"katex-eq\" data-katex-display=\"false\">S<\/span>, on the other hand, is (rectangular) diagonal with only non-negative entries <span class=\"katex-eq\" data-katex-display=\"false\">s_i = S_{ii} \\geq 0<\/span>. 
A matrix <span class=\"katex-eq\" data-katex-display=\"false\">A<\/span> is said to be ill-conditioned if it has a large condition number, which can be defined as the ratio of the largest to the smallest singular value, <span class=\"katex-eq\" data-katex-display=\"false\">\\mathrm{cond}(A) = \\frac{\\max_i s_i}{\\min_i s_i} = \\frac{s_{\\mathrm{max}}}{s_{\\mathrm{min}}}<\/span>. Very often, large condition numbers make numerical computations difficult or less precise.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Benchmarking LSQR<\/h2>\n\n\n\n<p>One day, I decided to benchmark the computation time of least squares solvers provided by <a href=\"https:\/\/scipy.org\/\">scipy<\/a>, in particular LSQR. I wanted results for different matrix dimensions <span class=\"katex-eq\" data-katex-display=\"false\">n, p<\/span>. So I needed to somehow generate different matrices <span class=\"katex-eq\" data-katex-display=\"false\">A<\/span>. There are myriad ways to do that. Naive as I was, I did the simplest thing and used standard Normal (Gaussian) distributed random matrices <span class=\"katex-eq\" data-katex-display=\"false\">A_{ij} \\sim \\mathcal{N}(0, 1)<\/span> and ran benchmarks on those. 
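As a quick aside (a hypothetical snippet, not the benchmark code itself), generating such a Gaussian random matrix and computing its condition number from the singular values can be sketched as follows:

```python
import numpy as np

# Illustrative dimensions, not the ones used in the benchmark below.
rng = np.random.default_rng(0)
n, p = 200, 100
A = rng.standard_normal((n, p))

# Condition number as the ratio of largest to smallest singular value.
s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
cond_manual = s.max() / s.min()

# np.linalg.cond computes the same 2-norm condition number by default.
print(np.isclose(cond_manual, np.linalg.cond(A)))  # True
```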
Let&#8217;s see how that looks in Python.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">from collections import OrderedDict\nfrom functools import partial\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom scipy.linalg import lstsq\nfrom scipy.sparse.linalg import lsqr\nimport seaborn as sns\nfrom neurtu import Benchmark, delayed\n\nplt.ion()\n\np_list = [100, 500]\nrng = np.random.default_rng(42)\nX = rng.standard_normal(max(p_list) ** 2 * 2)\ny = rng.standard_normal(max(p_list) * 2)\n\ndef bench_cases():    \n    for p in p_list:\n        for n in np.arange(0.1, 2.1, 0.1):\n            n = int(p*n)\n            A = X[:n*p].reshape(n, p)\n            b = y[:n]\n            for solver in ['lstsq', 'lsqr']:\n                tags = OrderedDict(n=n, p=p, solver=solver)\n                if solver == 'lstsq':\n                    solve = delayed(lstsq, tags=tags)\n                elif solver == 'lsqr':\n                    solve = delayed(\n                        partial(\n                            lsqr, atol=1e-10, btol=1e-10, iter_lim=1e6\n                        ),\n                        tags=tags)\n\n                yield solve(A, b)\n\nbench = Benchmark(wall_time=True)\ndf = bench(bench_cases())\n\ng = sns.relplot(x='n', y='wall_time', hue='solver', col='p',\n            kind='line', facet_kws={'sharex': False, 'sharey': False},\n            
data=df.reset_index(), marker='o')\ng.set_titles(&quot;p = {col_name}&quot;)\ng.set_axis_labels(&quot;n&quot;, &quot;Wall time (s)&quot;)\ng.set(xscale=&quot;linear&quot;, yscale=&quot;log&quot;)\n\nplt.subplots_adjust(top=0.9)\ng.fig.suptitle('Benchmark LSQR vs LSTSQ')\nfor ax in g.axes.flatten():\n    ax.tick_params(labelbottom=True)<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"475\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/fig_benchmark-1024x475.png\" alt=\"\" class=\"wp-image-882\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/fig_benchmark-1024x475.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/fig_benchmark-300x139.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/fig_benchmark-768x356.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/fig_benchmark-1536x713.png 1536w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/fig_benchmark-2048x950.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Timing benchmark of the LSQR and standard LAPACK lstsq solvers for different sizes n on the x-axis, with p=100 (left) and p=500 (right).<\/figcaption><\/figure>\n\n\n\n<p>The left plot already looks a bit suspicious around <span class=\"katex-eq\" data-katex-display=\"false\">n=p<\/span>. But what is happening on the right side? Where does this spike of LSQR come from? And why does the standard least squares solver, SVD-based lstsq, not show this spike?<\/p>\n\n\n\n<p>When I saw these results, I thought something might be wrong with LSQR and opened an issue on the scipy GitHub repository, see <a href=\"https:\/\/github.com\/scipy\/scipy\/issues\/11204\">https:\/\/github.com\/scipy\/scipy\/issues\/11204<\/a>. The community there is really fantastic. 
Brett Naul pointed me to &#8230;.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Marchenko\u2013Pastur Distribution<\/h2>\n\n\n\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Marchenko%E2%80%93Pastur_distribution\">Marchenko\u2013Pastur distribution<\/a> is the distribution of the eigenvalues (singular values of square matrices) of certain random matrices in the large sample limit. Given a random matrix <span class=\"katex-eq\" data-katex-display=\"false\">A \\in \\mathbb{R}^{n,p}<\/span> with i.i.d. entries <span class=\"katex-eq\" data-katex-display=\"false\">A_{ij}<\/span> having zero mean, <span class=\"katex-eq\" data-katex-display=\"false\">\\mathbb{E}[A_{ij}] = 0<\/span>, and finite variance, <span class=\"katex-eq\" data-katex-display=\"false\">\\mathrm{Var}[A_{ij}] = \\sigma^2 &lt; \\infty<\/span>, we define the matrix <span class=\"katex-eq\" data-katex-display=\"false\">Y_n = \\frac{1}{n}A^T A \\in \\mathbb{R}^{p,p}<\/span>. As a square and symmetric matrix, <span class=\"katex-eq\" data-katex-display=\"false\">Y_n<\/span> has a simpler SVD, namely <span class=\"katex-eq\" data-katex-display=\"false\">Y_n = V \\Sigma V^T<\/span>. One can in fact show that <span class=\"katex-eq\" data-katex-display=\"false\">V<\/span> is the same as in the SVD of <span class=\"katex-eq\" data-katex-display=\"false\">A<\/span> and that the diagonal matrix <span class=\"katex-eq\" data-katex-display=\"false\">\\Sigma = \\frac{1}{n}S^TS<\/span> contains the squared singular values of <span class=\"katex-eq\" data-katex-display=\"false\">A<\/span>, divided by <span class=\"katex-eq\" data-katex-display=\"false\">n<\/span>, and <span class=\"katex-eq\" data-katex-display=\"false\">\\max(0, p-n)<\/span> extra zero values. The (diagonal) values of <span class=\"katex-eq\" data-katex-display=\"false\">\\Sigma<\/span> are called eigenvalues <span class=\"katex-eq\" data-katex-display=\"false\">\\lambda_1, \\ldots, \\lambda_p<\/span> of <span class=\"katex-eq\" data-katex-display=\"false\">Y_n<\/span>. 
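This relation between the eigenvalues of Y_n and the singular values of A can be checked numerically; the following is a small sketch with illustrative dimensions:

```python
import numpy as np

# Hypothetical check: the eigenvalues of Y_n = (1/n) A^T A equal
# the squared singular values of A divided by n.
rng = np.random.default_rng(0)
n, p = 300, 100
A = rng.standard_normal((n, p))

Y = A.T @ A / n
eigvals = np.sort(np.linalg.eigvalsh(Y))  # Y is symmetric
s = np.linalg.svd(A, compute_uv=False)
eig_from_svd = np.sort(s**2 / n)

print(np.allclose(eigvals, eig_from_svd))  # True
```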
Note that the eigenvalues are themselves random variables, and we are interested in their probability distribution or probability measure. We define the (random) measure <span class=\"katex-eq\" data-katex-display=\"false\">\\mu_p(B) = \\frac{1}{p} \\#\\{\\lambda_j \\in B\\} <\/span> for all intervals <span class=\"katex-eq\" data-katex-display=\"false\">B \\subset \\mathbb{R}<\/span>.<\/p>\n\n\n\n<p>The theorem of Marchenko and Pastur then states that for <span class=\"katex-eq\" data-katex-display=\"false\">n, p \\rightarrow \\infty<\/span> with <span class=\"katex-eq\" data-katex-display=\"false\">\\frac{p}{n} \\rightarrow \\rho <\/span>, we have <span class=\"katex-eq\" data-katex-display=\"false\">\\mu_p \\rightarrow \\mu<\/span>, where<\/p>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>\\mu(B) = \\begin{cases} (1-\\frac{1}{\\rho})\\mathbb{1}_{0\\in B} + \\nu(B),\\quad &amp;\\rho &gt; 1 \\\\ \\nu(B), \\quad &amp; 0\\leq\\rho\\leq 1 \\end{cases} \\,,<\/pre><\/div>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>\\nu(x) = \\frac{1}{2\\pi\\sigma^2} \\frac{\\sqrt{(\\rho_+ - x)(x - \\rho_-)}}{\\rho x} \\mathbb{1}_{x \\in [\\rho_-, \\rho_+]} dx\\,,<\/pre><\/div>\n\n\n\n<div class=\"wp-block-katex-display-block katex-eq\" data-katex-display=\"true\"><pre>\\rho_{\\pm} = \\sigma^2 (1\\pm\\sqrt{\\rho})^2 \\,.<\/pre><\/div>\n\n\n\n<p>We can at least derive the point mass at zero for <span class=\"katex-eq\" data-katex-display=\"false\">\\rho&gt;1 \\Leftrightarrow p&gt;n<\/span>: We said above that <span class=\"katex-eq\" data-katex-display=\"false\">\\Sigma<\/span> contains <span class=\"katex-eq\" data-katex-display=\"false\">p-n<\/span> extra zeros and those correspond to a point mass of <span class=\"katex-eq\" data-katex-display=\"false\">\\frac{p-n}{p}=1 - \\frac{1}{\\rho}<\/span> at 
zero.<\/p>\n\n\n\n<p>A lot of math so far. Just note that the assumptions on <span class=\"katex-eq\" data-katex-display=\"false\">A<\/span> are exactly met by the matrices in our benchmark above. Also note that the normal equations can be expressed in terms of <span class=\"katex-eq\" data-katex-display=\"false\">Y_n<\/span> as <span class=\"katex-eq\" data-katex-display=\"false\">n Y_n x = A^Tb<\/span>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Empirical Confirmation of the Marchenko\u2013Pastur Distribution<\/h2>\n\n\n\n<p>Before we come back to the spikes in our benchmark, let us see how well the Marchenko\u2013Pastur distribution is approximated at finite sample size. We choose n=1000, p=500, which gives <span class=\"katex-eq\" data-katex-display=\"false\">\\rho=\\frac{1}{2}<\/span>. We plot a histogram of the eigenvalues next to the Marchenko\u2013Pastur distribution.<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">def marchenko_pastur_mu(x, rho, sigma2=1):\n    x = np.atleast_1d(x).astype(float)\n    rho_p = sigma2 * (1 + np.sqrt(rho)) ** 2\n    rho_m = sigma2 * (1 - np.sqrt(rho)) ** 2\n    mu = np.zeros_like(x)\n    is_nonzero = (rho_m &lt; x) &amp; (x &lt; rho_p)\n    x_valid = x[is_nonzero]\n    factor = 1 \/ (2 * np.pi * sigma2 * rho)\n    mu[is_nonzero] = factor \/ x_valid\n    mu[is_nonzero] *= np.sqrt((rho_p - x_valid) * (x_valid - rho_m))\n    
if rho &gt; 1:\n        mu[x == 0] = 1 - 1 \/ rho\n    return mu\n\nfig, ax = plt.subplots()\n\nn, p = 1000, 500\nA = X.reshape(n, p)\nY = 1\/n * A.T @ A\neigenvals, _ = np.linalg.eig(Y)\nax.hist(eigenvals.real, bins=50, density=True, label=&quot;histogram&quot;)\nx = np.linspace(0, np.max(eigenvals.real), 100)\nax.plot(x, marchenko_pastur_mu(x, rho=p\/n), label=&quot;MP distribution&quot;)\nax.legend()\nax.set_xlabel(&quot;eigenvalue&quot;)\nax.set_ylabel(&quot;probability&quot;)\nax.set_title(&quot;Empirical evidence for n=1000, p=500, rho=0.5&quot;)<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/empirical_eigenvalue_distribution-1024x768.png\" alt=\"\" class=\"wp-image-884\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/empirical_eigenvalue_distribution-1024x768.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/empirical_eigenvalue_distribution-300x225.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/empirical_eigenvalue_distribution-768x576.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/empirical_eigenvalue_distribution-1536x1152.png 1536w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/empirical_eigenvalue_distribution.png 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Histogram of eigenvalues for n=1000, p=500 versus Marchenko\u2013Pastur distribution<\/figcaption><\/figure>\n\n\n\n<p>I have to say, I am very impressed by this good agreement for n=1000, which is far from being huge.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Let&#8217;s visualize the Marchenko\u2013Pastur distribution of <span class=\"katex-eq\" data-katex-display=\"false\">Y_n<\/span> for several ratios <span class=\"katex-eq\" data-katex-display=\"false\">\\rho<\/span> and fix <span class=\"katex-eq\" 
data-katex-display=\"false\">\\sigma=1<\/span>:<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">fig, ax = plt.subplots()\n\nrho_list = [0.5, 1, 1.5]\nx = np.linspace(0, 5, 1000)[1:] # exclude 0\nfor rho in rho_list:\n    y = marchenko_pastur_mu(x, rho)\n    line, = ax.plot(x, y, label=f&quot;rho={rho}&quot;)\n    # plot zero point mass\n    if rho &gt; 1:\n        ax.scatter(0, marchenko_pastur_mu(0, rho), color = line.get_color())\n\nax.set_ylim(None, 1.2)\nax.legend()\nax.set_title(&quot;Marchenko-Pastur Distribution&quot;)\nax.set_xlabel(&quot;x&quot;)\nax.set_ylabel(&quot;dmu\/dx&quot;)<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/marchenko_pastur_distribution-1024x768.png\" alt=\"\" class=\"wp-image-885\" srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/marchenko_pastur_distribution-1024x768.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/marchenko_pastur_distribution-300x225.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/marchenko_pastur_distribution-768x576.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/marchenko_pastur_distribution-1536x1152.png 1536w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/marchenko_pastur_distribution.png 1920w\" 
sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Marchenko-Pastur distribution for several ratios rho. The green dot is the point mass at zero.<\/figcaption><\/figure>\n\n\n\n<p>From this figure it becomes obvious that the closer the ratio <span class=\"katex-eq\" data-katex-display=\"false\">\\rho = 1<\/span>, the higher the probability for very tiny eigenvalues. This results in a high probability for an ill-conditioned matrix <span class=\"katex-eq\" data-katex-display=\"false\">A^TA<\/span> coming from an ill-conditioned matrix <span class=\"katex-eq\" data-katex-display=\"false\">A<\/span>. Let&#8217;s confirm that:<\/p>\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;showPanel&quot;:true,&quot;languageLabel&quot;:&quot;language&quot;,&quot;fullScreenButton&quot;:true,&quot;copyButton&quot;:true,&quot;mode&quot;:&quot;python&quot;,&quot;mime&quot;:&quot;text\/x-python&quot;,&quot;theme&quot;:&quot;material&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;fileName&quot;:&quot;&quot;,&quot;language&quot;:&quot;Python&quot;,&quot;maxHeight&quot;:&quot;400px&quot;,&quot;modeName&quot;:&quot;python&quot;}\">p = 500\nn_vec = []\nc_vec = []\nfor n in np.arange(0.1, 2.05, 0.05):\n    n = int(p*n)\n    A = X[:n*p].reshape(n, p)\n    n_vec.append(n)\n    c_vec.append(np.linalg.cond(A))\n\nfig, ax = plt.subplots()\nax.plot(n_vec, c_vec)\nax.set_xlabel(&quot;n&quot;)\nax.set_ylabel(&quot;condition number of A&quot;)\nax.set_title(&quot;Condition Number of A for p=500&quot;)<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/condition_number-1024x768.png\" alt=\"\" class=\"wp-image-886\" 
srcset=\"https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/condition_number-1024x768.png 1024w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/condition_number-300x225.png 300w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/condition_number-768x576.png 768w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/condition_number-1536x1152.png 1536w, https:\/\/lorentzen.ch\/wp-content\/uploads\/2022\/05\/condition_number.png 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Condition number of the random matrix A<\/figcaption><\/figure>\n\n\n\n<p>As a result of the ill-conditioned <span class=\"katex-eq\" data-katex-display=\"false\">A<\/span>, the LSQR solver has problems to achieve its tolerance criterion, needs more iterations, and takes longer time. This is exactly what we observed in the benchmark plots: the peak occurred around <span class=\"katex-eq\" data-katex-display=\"false\">n=p<\/span>. The SVD-based lstsq solver, on the other hand, does not use an iterative scheme and does not need more time for ill-conditioned matrices.<\/p>\n\n\n\n<p>You find the accompanying notebook here: <a href=\"https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2022-05-22%20from%20least%20squares%20benchmarks%20to%20the%20marchenko-pastur%20distribution.ipynb\">https:\/\/github.com\/lorentzenchr\/notebooks\/blob\/master\/blogposts\/2022-05-22%20from%20least%20squares%20benchmarks%20to%20the%20marchenko-pastur%20distribution.ipynb<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog post, I tell the story how I learned about a theorem for random matrices of the two Ukrainian\ud83c\uddfa\ud83c\udde6 mathematicians Vladimir Marchenko and Leonid Pastur. It all started with benchmarking least squares solvers in scipy. 
Setting the Stage for Least Squares Solvers Least squares starts with a matrix and a vector and one [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15,9],"tags":[6],"class_list":["post-792","post","type-post","status-publish","format-standard","hentry","category-linear-algebra","category-statistics","tag-python"],"featured_image_src":null,"author_info":{"display_name":"Christian Lorentzen","author_link":"https:\/\/lorentzen.ch\/index.php\/author\/christian\/"},"_links":{"self":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/792","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/comments?post=792"}],"version-history":[{"count":81,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/792\/revisions"}],"predecessor-version":[{"id":889,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/posts\/792\/revisions\/889"}],"wp:attachment":[{"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/media?parent=792"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/categories?post=792"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lorentzen.ch\/index.php\/wp-json\/wp\/v2\/tags?post=792"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}