To receive announcements by email, subscribe to the seminar mailing list. To schedule a talk for the seminar, contact the organizers directly. For more information about our events, please contact Argyris Mouzakis
Title: On Matrix Multiplication and Polynomial Identity Testing
Abstract: Determining the complexity of matrix multiplication is a fundamental problem of theoretical computer science. It is popularly conjectured that ω, the matrix multiplication exponent, equals 2. If true, this conjecture would yield fast algorithms for a wide array of problems in linear algebra and beyond. If instead ω > 2, can we leverage the hardness of matrix multiplication to design algorithms for other problems?
In this talk, I will describe how lower bounds on ω can be used to make progress on derandomizing polynomial identity testing.
Title: Isoperimetric Inequalities for Real-Valued Functions with Applications to Monotonicity Testing
Abstract: We study sublinear algorithms for monotonicity of real-valued functions. An algorithm for testing monotonicity queries a function at a few locations and distinguishes between when the function is monotone and when it is far from monotone. An algorithm for approximating the distance to monotonicity returns the distance to monotonicity of the function with some additive and multiplicative error. We improve previous algorithms for these two tasks by showing that several isoperimetric inequalities about Boolean functions on the hypercube can be generalized to real-valued functions
A celebrated line of work [Margulis ‘74, Talagrand ‘93, Chakrabarty and Seshadhri ‘13, Khot Minzer Safra ‘15] studied the size of the ‘‘boundary’’ between the points on which a Boolean function takes value 0 and the points on which it takes value 1. This boundary is defined in terms of the edges of the d-dimensional hypercube, where the edges can be directed or undirected. In particular, the inequality of Khot, Minzer, and Safra ‘15 implies all previous inequalities. It bounds the average square root of the number of decreasing edges incident on a vertex of the hypercube by the distance to monotonicity of the function. We generalize all the inequalities in this line of work to real-valued functions. Our main contribution is a Boolean decomposition technique that represents a real-valued function as a collection of Boolean functions that roughly capture the distance to monotonicity of the original function. In this talk, we prove the Boolean decomposition theorem and touch on the improved algorithms for monotonicity testing and distance approximation.
This is joint work with Hadley Black and Sofya Raskhodnikova.
Title: New Lower Bounds for Set Multilinear Formulas
Abstract: The recent exciting breakthrough by Limaye, Srinivasan, and Tavenas showing superpolynomial lower bounds for constant-depth algebraic circuits has underscored the importance of studying the complexity of set multilinear formulas. In this talk, along with discussing the impact of this work and several others in the literature, I will discuss some new results in this area, namely strong (and “sharp”) lower bounds for set-multilinear circuits/formulas in the constant (or low) depth setting.
Joint work with Deepanshu Kush.
Title: The Brascamp-Lieb polytope: matroid matching and rank of matrix spaces
Abstract: Given a linear subspace of matrices $\mathcal{A} := \langle A_{1}, …, A_{m} \rangle \subseteq \mathbb{F}^{n \times n}$, Edmond’s problem asks to decide whether there is a matrix in $\mathcal{A}$ of full rank. This can be rephrased as finding the rank of the matrix $A := \sum_{i=1}^{m} x_{i} A_{i}$ over the field of rational functions $\mathbb{F}(x_{1}, …, x_{m})$.
This linear algebraic problem turns out to connect to various problems in combinatorial optimization: maximum matching in graphs, linear matroid intersection, and their common generalization linear matroid matching [Lovasz, 1980].
By the Schwarz-Zippel lemma, there is a simple randomized algorithm to answer this question, but derandomizing this result is a major open question that would be an important breakthrough in complexity theory [KI04].
A related variant, known as the non-commutative Edmond’s problem, asks to compute the rank of $A$ over the “free skew field” in which the variables $x$ no longer commute. Surprisingly, recent work has shown that this seemingly more complicated problem can in fact be solved by a deterministic algorithm in polynomial time.
In this talk, we will discuss one of these algorithms and a recent result [Oki, Soma] relating this problem to the Brascamp-Lieb Theorem from functional analysis. In particular, they show that the algorithm for non-commutative rank can be used to optimize over the Brascamp-Lieb polytope for rank 2 instances, which gives the first truly polynomial time algorithm to maximize the fractional relaxation of linear matroid matching [GP08].
This problem can be considered as a natural “higher rank” generalization of optimization over linear matroid polytopes.
Title: Computability and Complexity in Analytic Combinatorics
Abstract: Enumerative combinatorics studies discrete objects by capturing aspects of their behaviour (such as the number of objects of a given size) using sequences. In this talk we explore how to combine pure mathematical tools with computational methods to answer questions about the computability and complexity of asymptotic behaviour for sequences under different encodings that arise frequently in combinatorics.
Applications discussed will include (time permitting) the analysis of classical algorithms, models predicting the shape of biomembranes, queuing theory, random walks, ratchet models for gene expression, transcendence of zeta values, restricted permutations, maximum likelihood degree in algebraic statistics, sampling algorithms for perfect matchings in bipartite graphs, and parallel synthesis for DNA storage.
Title: Almost linear time algorithms for max-flow and more
Abstract: We give the first almost-linear time algorithm for computing exact maximum flows and minimum-cost flows on directed graphs. By well known reductions, this implies almost-linear time algorithms for several problems including bipartite matching, optimal transport, and undirected vertex connectivity.
Our algorithm is designed using a new Interior Point Method (IPM) that builds the flow as a sequence of almost-linear number of approximate undirected minimum-ratio cycles, each of which is computed and processed very efficiently using a new dynamic data structure.
Our framework extends to give an almost-linear time algorithm for computing flows that minimize general edge-separable convex functions to high accuracy. This gives the first almost-linear time algorithm for several problems including entropy-regularized optimal transport, matrix scaling, p-norm flows, and Isotonic regression.
Joint work with Li Chen, Rasmus Kyng, Yang Liu, Richard Peng, and Maximilian Probst Gutenberg.
Title: A Tale of Two Reconstructions
Abstract: Arithmetic circuits are a natural model for computing polynomials using basic arithmetic operations like addition and multiplication. The problem of learning arithmetic circuits, a.k.a. reconstruction, is an important and well-studied problem. Here we are given a polynomial (either explicitly as a list of coefficients or using black-box access) and the goal is to find the smallest (or approximately smallest) arithmetic circuit computing it. In this talk, we will discuss two vastly different paradigms in learning arithmetic circuits, which are worst-case learning and average-case (non-degenerate) learning. We will elaborate on various techniques involved in these paradigms, their similarity, and their differences. Along the way, we will see a worst-case as well as a non-degenerate algorithm for decomposing tensors. Time permitting, we will also discuss some recent progress in non-degenerate learning of “generalized” depth 3 arithmetic circuits.
Title: Optimization Beyond Minimization
Abstract: Motivated by recent advances in both theoretical and applied aspects of multiplayer games, spanning from e-sports to multi-agent generative adversarial networks, a surge of different studies has been focused on the core problem of understanding the behavior of game dynamics in general N-player games. From the seminal settings of two competitive players and Min-Max Optimization to the complete understanding of how the day-to-day behavior of the dynamics correlates to the game’s different notion of equilibria is much more limited, and only partial results are known for certain classes of games (such as zero-sum or congestion games). In this talk, we study from two different perspectives arguably the most well-studied class of no-regret dynamics, “Follow-the-regularized-leader” (FTRL) and Discretizations of Gradient Flow (GDA/OGDA/EG), and we establish a sweeping negative result showing that the notion of mixed Nash equilibrium is antithetical to no-regret learning. Specifically, we show that any Nash equilibrium which is not strict (in that every player has a unique best response) cannot be stable and attracting under the dynamics of FTGL. This result has significant implications for predicting the outcome of a learning process as it shows unequivocally that only strict (and hence, pure) Nash equilibria can emerge as stable limit points thereof. For a final happy end story, we present either structural examples of families where convergence is possible providing the last-iterate convergence rates or even new methods inspired from other areas like control theory & planning.
Title: An optimal separation of randomized and quantum query complexity
Abstract: We prove that for every decision tree, the absolute values of the Fourier coefficients of given order $\ell\geq1$ sum to at most $c^{\ell}\sqrt{\binom{d}{\ell}(1+\log n)^{\ell-1}},$ where $n$ is the number of variables, $d$ is the tree depth, and $c>0$ is an absolute constant. This bound is essentially tight and settles a conjecture due to Tal (arxiv 2019; FOCS 2020). The bounds prior to our work degraded rapidly with $\ell,$ becoming trivial already at $\ell=\sqrt{d}.$ As an application, we obtain, for every integer $k\geq1,$ a partial Boolean function on $n$ bits that has bounded-error quantum query complexity at most $\lceil k/2 rceil$ and randomized query complexity $\tilde{\Omega}(n^{1-1/k}).$ This separation of bounded-error quantum versus randomized query complexity is best possible, by the results of Aaronson and Ambainis (STOC 2015). Prior to our work, the best known separation was polynomially weaker: $O(1)$ versus $\Omega(n^{2/3-\epsilon})$ for any $\epsilon>0$ (Tal, FOCS 2020). As another application, we obtain an essentially optimal separation of $O(\log n)$ versus $\Omega(n^{1-\epsilon})$ for bounded-error quantum versus randomized communication complexity, for any $\epsilon>0.$ The best previous separation was polynomially weaker: $O(\log n)$ versus $\Omega(n^{2/3-\epsilon})$ (implicit in Tal, FOCS 2020).
Title: Private Mean Estimation of Heavy-Tailed Distributions
Abstract: We give new upper and lower bounds on the minimax sample complexity of differentially private mean estimation of distributions with bounded k-th moments. Roughly speaking, in the univariate case, we show that n = \Theta(1/\alpha^2 + 1/\epsilon\alpha^{k/(k-1)}) samples are necessary and sufficient to estimate the mean to \alpha-accuracy under \epsilon-differential privacy, or any of its common relaxations. This result demonstrates a qualitatively different behavior compared to estimation absent privacy constraints, for which the sample complexity is identical for all k >= 2. We also give algorithms for the multivariate setting whose sample complexity is a factor of O(d) larger than the univariate case.
Title: Optimal Sub-Gaussian Mean Estimation in R
Abstract: We revisit and settle one of the most fundamental problems in statistics: given access to independent samples from a 1D random variable (with finite but unknown mean and variance), what is the best way to estimate the mean in the high probability regime,in terms of error convergence with respect to sample size? The conventional wisdom is to use the empirical mean as our estimate. However, it is known that the empirical mean can in fact have exponentially sub-optimal convergence for certain heavy-tailed distributions.On the other hand, the median-of-means estimator (invented and reinvented in various literature) does have sub-Gaussian convergence for all finite-variance distributions, albeit only in the big-O sense with a sub-optimal multiplicative constant. The natural remaining question then, is whether it is possible to bridge the gap, to have an estimator that has optimal sub-Gaussian concentration with an optimal constant, for all finite-variance distributions.
In this talk, we answer the question affirmatively by giving an estimator that converges with the optimal constant inside the big-O, up to a 1+o(1) multiplicative factor. Our estimator is furthermore computable in time linear in the sample size. The convergence analysis involves deriving tail bounds using linear and convex-concave programming dualities, which may be of independent interest.
This is joint work with Paul Valiant.
Title: Lazy Search Trees
Abstract: We introduce the lazy search tree data structure. The lazy search tree is a comparison-based data structure on the pointer machine that supports order-based operations such as rank, select, membership, predecessor, successor, minimum, and maximumwhile providing dynamic operations insert, delete, change-key, split, and merge. We analyze the performance of our data structure based on a partition of current elements into a set of gaps {Delta_i} based on rank. A query falls into a particular gap and splitsthe gap into two new gaps at a rank r associated with the query operation. If we define B = sum_i |Delta_i| log_2(n/|Delta_i|), our performance over a sequence of n insertions and q distinct queries is O(B + min(n log log n, n log q)). We show B is a lowerbound.
Effectively, we reduce the insertion time of binary search trees from Theta(log n) to O(min(log(n/|Delta_i|) + log log |Delta_i|, log q)), where Delta_i is the gap in which the inserted element falls. Over a sequence of n insertions and q queries, a time boundof O(n log q + q log n) holds; better bounds are possible when queries are non-uniformly distributed. As an extreme case of non-uniformity, if all queries are for the minimum element, the lazy search tree performs as a priority queue with O(log log n) timeinsert and decrease-key operations. The same data structure supports queries for any rank, interpolating between binary search trees and efficient priority queues.
Lazy search trees can be implemented to operate mostly on arrays, requiring only O(min(q, n)) pointers, suggesting smaller memory footprint, better constant factors, and better cache performance compared to many existing efficient priority queues or binarysearch trees. Via direct reduction, our data structure also supports the efficient access theorems of the splay tree, providing a powerful data structure for non-uniform element access, both when the number of accesses is small and large.
Joint work with Sebastian Wild. Accepted for publication in FOCS 2020; arXiv: https://arxiv.org/abs/2010.08840
Title: From high-dimensional random landscapes to statistical inference
Abstract: We study the algorithmic thresholds for principal component analysis of Gaussian k-tensors with a planted rank-one spike, via Langevin dynamics and gradient descent. In order to efficiently recover the spike from natural initializations, the signal-to-noise ratio must diverge in the dimension. Our proof shows that the mechanism for the success/failure of recovery is the strength of the “curvature” of the spike on the maximum entropy region of the initial data. To demonstrate this, we study the dynamics on a generalized family of high-dimensional landscapes with planted signals, containing the spiked tensor models as specific instances. We identify thresholds of signal-to-noise ratios above which order 1 time recovery succeeds; in the case of the spiked tensor model, these match the thresholds conjectured for algorithms such as approximate message passing. Below these thresholds, where the curvature of the signal on the maximal entropy region is weak, we show that recovery from certain natural initializations takes at least stretched exponential time. Our approach combines global regularity estimates for spin glasses with pointwise estimates to study the recovery problem by a perturbative approach.
Title: New Nearly-Optimal Coreset for Kernel Density Estimation
Abstract: Given a point set $P \subset \mathbb{R}^d$, kernel density estimation for Gaussian kernel is defined as $\overline{\mathcal{G}}P(x) = \frac{1}{\left|P\right|}\sum{p\inP}e^{-\left\lVert x-p \right\rVert^2}$ for any $x\in\mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimation of $P$ can be approximated by the kernel density estimation of $Q$. This subset $Q$ is called \emph{coreset}. The primary technique in this work is to construct $\pm 1$ coloring on the point set $P$ by the discrepancy theory and apply this coloring algorithm recursively. Our result leverages Banaszczyk’s Theorem. When $d>1$ is constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\sqrt{\log\log\frac{1}{\varepsilon}}\right)$as opposed to the best-known result of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. It is the first to give a breakthrough on the barrier of $\sqrt{\log}$ factor even when $d=2$.
Title: Factoring Polynomials given as Arithmetic Branching Programs (Joint work with Thomas Thierauf, to appear in CCC 20)
Abstract: A basic problem in computational algebra is polynomial factoring: Given a polynomial, compute all its irreducible factors. In a classic result, Kaltofen proved that if a polynomial P(x_1,…,x_n) of degree d is given as an arithmetic circuit of size s, then all its factors can be computed by arithmetic circuits of size poly(s,d). In other words, the algebraic complexity class VP is closed under factors. This result has applications in algebraic complexity, for example in derandomization of polynomial identity testing using explicit hard polynomials (lower bounds).
A natural direction to extend Kaltofen’s result is to prove analogous factor size bounds in more restricted / more general models than arithmetic circuits. Kaltofen’s proofs do not extend to the restricted models, as reusing previous computations is costly for these models. Recent works in this area focused on models like constant depth circuits, arithmetic formulas, arithmetic branching programs. In a beautiful work, Oliveira (CCC 2015) proved that factors of a low-depth circuit/formula can be computed by small low-depth circuits/formulas, if we assume that the individual degree of the given polynomial is bounded by constant. In a follow-up work, Dutta, Saxena and Sinhababu (DSS, STOC 18) showed poly (s^log d) factor size upper bound for arithmetic formulas and branching programs.
In this work, we show poly (s,d) factor size upper bound for the model of arithmetic branching programs (ABP). Whereas the works of Oliveira and DSS used different versions of Newton iteration, we use the classic technique of Hensel lifting. Although Newton iteration and Hensel lifting are related techniques, the latter gives better bound for ABPs. We also use that the determinant of a symbolic matrix can be computed by a small ABP. In this talk, we give a brief survey of the earlier works and present the main ideas of our proof.
Paper: https://image.informatik.htw-aalen.de/~thierauf/Papers/ABP-factors.pdf
Title: Testing Noisy Linear Equations for Sparsity
Abstract: Consider the following basic problem in sparse linear regression – an algorithm gets labeled samples of the form (x, w.x + \eps) where w is an unknown n-dimensional vector, x is drawn from a background distribution D and \eps is some independent noise. Given the promise that w is k-sparse, the breakthrough work of Candes, Rhomberg and Tao (2005) shows that w can be recovered with samples and time which scales as O(k log n). This should be contrasted with general linear regression where O(n) samples are information theoretically necessary.
In this talk, we look at this question from the vantage point of property testing and study the decision variant of the following question – namely, what is the complexity of deciding if the unknown vector w is k-sparse (or at least say 0.01 far from k-sparse in \ell_2 distance). We show that the decision version of the problem can be solved with samples which are independent of n as long as the background distribution D is i.i.d. and the components are not Gaussian. We further show that weakening any of the conditions in this result necessarily makes the complexity scale as log n (thus showing our results are tight).
Joint work with Xue Chen (Northwestern) and Rocco Servedio (Columbia).
Title: A Spectral Approach to Network Design
Abstract: In this talk, I will present a spectral approach to design approximation algorithms for network design problems. We observe that the underlying mathematical questions are the spectral rounding problems, which were studied in spectral sparsification and in discrepancy theory. We extend these results to incorporate additional non-negative linear constraints, and show that they can be used to significantly extend the scope of network design problems that can be solved. Our algorithm for spectral rounding is an iterative randomized rounding algorithm based on the regret minimization framework. In some settings, this provides an alternative spectral algorithm to achieve constant factor approximation for the classical survivable network design problem.
Joint work with Lap Chi Lau. Arxiv link of the manuscript: https://arxiv.org/abs/2003.07810
Title: Forecasting algorithms: a new approach to randomized lower bounds
Abstract: In this talk, I’ll present a new approach to randomized lower bounds, particularly in a setting in which we wish to give a fine-grained analysis of randomized algorithms that achieve small bias. The approach is as follows: instead of considering ordinary randomized algorithms which give an output in {0,1} and may err, we switch models to look at “forecasting” randomized algorithms which output a confidence in [0,1] for whether they think the answer is 1. When scored by a proper scoring rule, the performance of the best forecasting algorithm is closely related to the bias of the best (ordinary) randomized algorithm; however, the forecasting algorithms (when analyzed using the right scoring rule) have elegant mathematical properties that make them more amenable to analysis.
As an application, I’ll present a new minimax theorem for randomized algorithms, which can be viewed as a strengthening of Yao’s minimax theorem. Yao’s minimax theorem guarantees the existence of a hard distribution for a function f such that solving f against this distribution (to a desired error level) is as hard as solving f in the worst case (to that same error level). However, the hard distribution provided by Yao’s theorem depends on the chosen error level. Our minimax theorem removes this dependence,giving a distribution which certifies the hardness of f against all bias levels at once. In recent work, we used this minimax theorem to give a tight composition theorem for randomized query complexity.
Joint work with Eric Blais.
Title: Can algebraic circuit lower bounds have easy proofs?
Abstract: The field of Algebraic Circuit Complexity studies how succinctly one can represent explicit multivariate polynomials. Algebraic circuits (analogous to boolean circuits) are the most natural model for computing polynomials. A central question in this field istherefore to find explicit polynomials that require “large” algebraic circuits. The best known lower bounds for general circuits are due to Baur and Strassen (1983) and Smolenksy (1997), both of which give (tight) Omega(n log d) lower bounds for an explicitn-variate polynomial of degree d. Although there has been considerable progress in proving lower bounds for simpler models, there has not been much progress for more general computational classes. This lack of progress over the last few decades leads to thefollowing question. Are most of the current techniques incapable of proving a superpolynomial lower bound for general circuits?
Almost all the current techniques for showing a lower bound against a class of circuits C, can also be used to obtain what is called a “defining equation” for the class C that is also easy to compute. Thus, the above question is equivalent to asking whetherthe class of polynomials computable by polynomial-sized circuits has efficiently computable defining equations.
Based on the important work of Razborov and Rudich (1997) in the boolean world, Forbes, Shpilka, and Volk (2018), and Grochow, Kumar, Saks and Saraf (2017) introduced the framework of Algebraically Natural Proofs. They showed that under a certain derandomizationassumption, such defining equations do not exist; thereby hinting towards the existence of a barrier for proving lower bounds. We show that there \emph{are} efficiently computable defining equations for the class of polynomials with bounded coefficients thatare computable by polynomial-sized circuits. This provides evidence \emph{against} the existence of a barrier for proving lower bounds.
Joint work with Prerona Chatterjee (TIFR, Mumbai), Mrinal Kumar (IITB, Mumbai), C. Ramya (TIFR, Mumbai) and Ramprasad Saptharishi (TIFR, Mumbai).
Title: Robustness in unsupervised and supervised machine learning
Abstract: Recently, the need for robust machine learning algorithms has become apparent. Whether due to errors in data collection, model misspecification, or adversarial attacks, contaminated datasets arise in many areas. This is an issue, as existing methods appear to be quite brittle to small amounts of errors. Even more worryingly, these models are being deployed in many security-critical settings, such as self-driving cars, where reliability is an absolute must.
In this talk, I will describe a line of work in which we provide provable guarantees for robust machine learning in several fundamental settings. I’ll begin by discussing the problem of robust estimation of mean and covariance of a Gaussian distribution, and how to relax this to distributions with weaker assumptions on the moments. I will then describe how these methods can be used to “robustify” supervised learning algorithms by applying robust mean estimation algorithms to the gradients of the dataset. While theoretically sound, the algorithms are also realizable and efficient, and I will present experimental results on both synthetic and real-world data.
Based on joint works with Ilias Diakonikolas, Daniel M. Kane, Jerry Li, Ankur Moitra, Jacob Steinhardt, and Alistair Stewart.
Title: Local Graph Clustering
Abstract: Graphs, long popular in computer science and discrete mathematics, have received renewed interest because they provide a useful way to model many types of relational data. In biology, e.g., graphs are routinely used to generate hypotheses for experimental validation; in neuroscience, they are used to study the networks and circuits in the brain; and in social networks, they are used to find common behaviors of users. These modern graph applications require the analysis of large graphs, and this can be computationally expensive. Graph algorithms have been developed to identify and interpret small-scale local structure in large-scale data without the requirement to access all the data.
In this talk, we will discuss state-of-the-art local spectral-, flow-, and Lp-norm-based algorithms that are specialized in finding small-scale clusters in large graphs without accessing the whole graph. We will discuss worst- and average-case theoretical results of these algorithms and we will demonstrate their empirical performance.
Based on joint work with Shenghao Yang, Di Wang, Wooseok Ha, Julian Shun, Farbod Roosta Khorasani, Xiang Cheng.