To receive announcements by email, subscribe to the student mailing list (for students at the University of Waterloo) or the general mailing list. To schedule a talk for the seminar, contact the organizers directly. For more information about our events, please contact Anurag Naredla

Title: An optimal separation of randomized and quantum query complexity

Abstract: We prove that for every decision tree, the absolute values of the Fourier coefficients of given order $\ell\geq1$ sum to at most $c^{\ell}\sqrt{\binom{d}{\ell}(1+\log n)^{\ell-1}},$ where $n$ is the number of variables, $d$ is the tree depth, and $c>0$ is an absolute constant. This bound is essentially tight and settles a conjecture due to Tal (arxiv 2019; FOCS 2020). The bounds prior to our work degraded rapidly with $\ell,$ becoming trivial already at $\ell=\sqrt{d}.$ As an application, we obtain, for every integer $k\geq1,$ a partial Boolean function on $n$ bits that has bounded-error quantum query complexity at most $\lceil k/2 rceil$ and randomized query complexity $\tilde{\Omega}(n^{1-1/k}).$ This separation of bounded-error quantum versus randomized query complexity is best possible, by the results of Aaronson and Ambainis (STOC 2015). Prior to our work, the best known separation was polynomially weaker: $O(1)$ versus $\Omega(n^{2/3-\epsilon})$ for any $\epsilon>0$ (Tal, FOCS 2020). As another application, we obtain an essentially optimal separation of $O(\log n)$ versus $\Omega(n^{1-\epsilon})$ for bounded-error quantum versus randomized communication complexity, for any $\epsilon>0.$ The best previous separation was polynomially weaker: $O(\log n)$ versus $\Omega(n^{2/3-\epsilon})$ (implicit in Tal, FOCS 2020).

Title: Private Mean Estimation of Heavy-Tailed Distributions

Abstract: We give new upper and lower bounds on the minimax sample complexity of differentially private mean estimation of distributions with bounded k-th moments. Roughly speaking, in the univariate case, we show that n = \Theta(1/\alpha^2 + 1/\epsilon\alpha^{k/(k-1)}) samples are necessary and sufficient to estimate the mean to \alpha-accuracy under \epsilon-differential privacy, or any of its common relaxations. This result demonstrates a qualitatively different behavior compared to estimation absent privacy constraints, for which the sample complexity is identical for all k >= 2. We also give algorithms for the multivariate setting whose sample complexity is a factor of O(d) larger than the univariate case.

Title: Optimal Sub-Gaussian Mean Estimation in R

Abstract: We revisit and settle one of the most fundamental problems in statistics: given access to independent samples from a 1D random variable (with finite but unknown mean and variance), what is the best way to estimate the mean in the high probability regime,in terms of error convergence with respect to sample size? The conventional wisdom is to use the empirical mean as our estimate. However, it is known that the empirical mean can in fact have exponentially sub-optimal convergence for certain heavy-tailed distributions.On the other hand, the median-of-means estimator (invented and reinvented in various literature) does have sub-Gaussian convergence for all finite-variance distributions, albeit only in the big-O sense with a sub-optimal multiplicative constant. The natural remaining question then, is whether it is possible to bridge the gap, to have an estimator that has optimal sub-Gaussian concentration with an optimal constant, for all finite-variance distributions.

In this talk, we answer the question affirmatively by giving an estimator that converges with the optimal constant inside the big-O, up to a 1+o(1) multiplicative factor. Our estimator is furthermore computable in time linear in the sample size. The convergence analysis involves deriving tail bounds using linear and convex-concave programming dualities, which may be of independent interest.

This is joint work with Paul Valiant.

Title: Lazy Search Trees

Abstract: We introduce the lazy search tree data structure. The lazy search tree is a comparison-based data structure on the pointer machine that supports order-based operations such as rank, select, membership, predecessor, successor, minimum, and maximumwhile providing dynamic operations insert, delete, change-key, split, and merge. We analyze the performance of our data structure based on a partition of current elements into a set of gaps {Delta_i} based on rank. A query falls into a particular gap and splitsthe gap into two new gaps at a rank r associated with the query operation. If we define B = sum_i |Delta_i| log_2(n/|Delta_i|), our performance over a sequence of n insertions and q distinct queries is O(B + min(n log log n, n log q)). We show B is a lowerbound.

Effectively, we reduce the insertion time of binary search trees from Theta(log n) to O(min(log(n/|Delta_i|) + log log |Delta_i|, log q)), where Delta_i is the gap in which the inserted element falls. Over a sequence of n insertions and q queries, a time boundof O(n log q + q log n) holds; better bounds are possible when queries are non-uniformly distributed. As an extreme case of non-uniformity, if all queries are for the minimum element, the lazy search tree performs as a priority queue with O(log log n) timeinsert and decrease-key operations. The same data structure supports queries for any rank, interpolating between binary search trees and efficient priority queues.

Lazy search trees can be implemented to operate mostly on arrays, requiring only O(min(q, n)) pointers, suggesting smaller memory footprint, better constant factors, and better cache performance compared to many existing efficient priority queues or binarysearch trees. Via direct reduction, our data structure also supports the efficient access theorems of the splay tree, providing a powerful data structure for non-uniform element access, both when the number of accesses is small and large.

Joint work with Sebastian Wild. Accepted for publication in FOCS 2020; arXiv: https://arxiv.org/abs/2010.08840

Title: From high-dimensional random landscapes to statistical inference

Abstract: We study the algorithmic thresholds for principal component analysis of Gaussian k-tensors with a planted rank-one spike, via Langevin dynamics and gradient descent. In order to efficiently recover the spike from natural initializations, the signal-to-noise ratio must diverge in the dimension. Our proof shows that the mechanism for the success/failure of recovery is the strength of the “curvature” of the spike on the maximum entropy region of the initial data. To demonstrate this, we study the dynamics on a generalized family of high-dimensional landscapes with planted signals, containing the spiked tensor models as specific instances. We identify thresholds of signal-to-noise ratios above which order 1 time recovery succeeds; in the case of the spiked tensor model, these match the thresholds conjectured for algorithms such as approximate message passing. Below these thresholds, where the curvature of the signal on the maximal entropy region is weak, we show that recovery from certain natural initializations takes at least stretched exponential time. Our approach combines global regularity estimates for spin glasses with pointwise estimates to study the recovery problem by a perturbative approach.

Title: New Nearly-Optimal Coreset for Kernel Density Estimation

Abstract: Given a point set $P \subset \mathbb{R}^d$, kernel density estimation for Gaussian kernel is defined as $\overline{\mathcal{G}}*P(x) = \frac{1}{\left|P\right|}\sum*{p\inP}e^{-\left\lVert x-p \right\rVert^2}$ for any $x\in\mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimation of $P$ can be approximated by the kernel density estimation of $Q$. This subset $Q$ is called \emph{coreset}. The primary technique in this work is to construct $\pm 1$ coloring on the point set $P$ by the discrepancy theory and apply this coloring algorithm recursively. Our result leverages Banaszczyk’s Theorem. When $d>1$ is constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\sqrt{\log\log\frac{1}{\varepsilon}}\right)$as opposed to the best-known result of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. It is the first to give a breakthrough on the barrier of $\sqrt{\log}$ factor even when $d=2$.

Title: Factoring Polynomials given as Arithmetic Branching Programs (Joint work with Thomas Thierauf, to appear in CCC 20)

Abstract: A basic problem in computational algebra is polynomial factoring: Given a polynomial, compute all its irreducible factors. In a classic result, Kaltofen proved that if a polynomial P(x_1,…,x_n) of degree d is given as an arithmetic circuit of size s, then all its factors can be computed by arithmetic circuits of size poly(s,d). In other words, the algebraic complexity class VP is closed under factors. This result has applications in algebraic complexity, for example in derandomization of polynomial identity testing using explicit hard polynomials (lower bounds).

A natural direction to extend Kaltofen’s result is to prove analogous factor size bounds in more restricted / more general models than arithmetic circuits. Kaltofen’s proofs do not extend to the restricted models, as reusing previous computations is costly for these models. Recent works in this area focused on models like constant depth circuits, arithmetic formulas, arithmetic branching programs. In a beautiful work, Oliveira (CCC 2015) proved that factors of a low-depth circuit/formula can be computed by small low-depth circuits/formulas, if we assume that the individual degree of the given polynomial is bounded by constant. In a follow-up work, Dutta, Saxena and Sinhababu (DSS, STOC 18) showed poly (s^log d) factor size upper bound for arithmetic formulas and branching programs.

In this work, we show poly (s,d) factor size upper bound for the model of arithmetic branching programs (ABP). Whereas the works of Oliveira and DSS used different versions of Newton iteration, we use the classic technique of Hensel lifting. Although Newton iteration and Hensel lifting are related techniques, the latter gives better bound for ABPs. We also use that the determinant of a symbolic matrix can be computed by a small ABP. In this talk, we give a brief survey of the earlier works and present the main ideas of our proof.

Paper: https://image.informatik.htw-aalen.de/~thierauf/Papers/ABP-factors.pdf

Title: Testing Noisy Linear Equations for Sparsity

Abstract: Consider the following basic problem in sparse linear regression – an algorithm gets labeled samples of the form (x, w.x + \eps) where w is an unknown n-dimensional vector, x is drawn from a background distribution D and \eps is some independent noise. Given the promise that w is k-sparse, the breakthrough work of Candes, Rhomberg and Tao (2005) shows that w can be recovered with samples and time which scales as O(k log n). This should be contrasted with general linear regression where O(n) samples are information theoretically necessary.

In this talk, we look at this question from the vantage point of property testing and study the decision variant of the following question – namely, what is the complexity of deciding if the unknown vector w is k-sparse (or at least say 0.01 far from k-sparse in \ell_2 distance). We show that the decision version of the problem can be solved with samples which are independent of n as long as the background distribution D is i.i.d. and the components are not Gaussian. We further show that weakening any of the conditions in this result necessarily makes the complexity scale as log n (thus showing our results are tight).

Joint work with Xue Chen (Northwestern) and Rocco Servedio (Columbia).

Title: A Spectral Approach to Network Design

Abstract: In this talk, I will present a spectral approach to design approximation algorithms for network design problems. We observe that the underlying mathematical questions are the spectral rounding problems, which were studied in spectral sparsification and in discrepancy theory. We extend these results to incorporate additional non-negative linear constraints, and show that they can be used to significantly extend the scope of network design problems that can be solved. Our algorithm for spectral rounding is an iterative randomized rounding algorithm based on the regret minimization framework. In some settings, this provides an alternative spectral algorithm to achieve constant factor approximation for the classical survivable network design problem.

Joint work with Lap Chi Lau. Arxiv link of the manuscript: https://arxiv.org/abs/2003.07810

Title: Forecasting algorithms: a new approach to randomized lower bounds

Abstract: In this talk, I’ll present a new approach to randomized lower bounds, particularly in a setting in which we wish to give a fine-grained analysis of randomized algorithms that achieve small bias. The approach is as follows: instead of considering ordinary randomized algorithms which give an output in {0,1} and may err, we switch models to look at “forecasting” randomized algorithms which output a confidence in [0,1] for whether they think the answer is 1. When scored by a proper scoring rule, the performance of the best forecasting algorithm is closely related to the bias of the best (ordinary) randomized algorithm; however, the forecasting algorithms (when analyzed using the right scoring rule) have elegant mathematical properties that make them more amenable to analysis.

As an application, I’ll present a new minimax theorem for randomized algorithms, which can be viewed as a strengthening of Yao’s minimax theorem. Yao’s minimax theorem guarantees the existence of a hard distribution for a function f such that solving f against this distribution (to a desired error level) is as hard as solving f in the worst case (to that same error level). However, the hard distribution provided by Yao’s theorem depends on the chosen error level. Our minimax theorem removes this dependence,giving a distribution which certifies the hardness of f against all bias levels at once. In recent work, we used this minimax theorem to give a tight composition theorem for randomized query complexity.

Joint work with Eric Blais.

Title: Can algebraic circuit lower bounds have easy proofs?

Abstract: The field of Algebraic Circuit Complexity studies how succinctly one can represent explicit multivariate polynomials. Algebraic circuits (analogous to boolean circuits) are the most natural model for computing polynomials. A central question in this field istherefore to find explicit polynomials that require “large” algebraic circuits. The best known lower bounds for general circuits are due to Baur and Strassen (1983) and Smolenksy (1997), both of which give (tight) Omega(n log d) lower bounds for an explicitn-variate polynomial of degree d. Although there has been considerable progress in proving lower bounds for simpler models, there has not been much progress for more general computational classes. This lack of progress over the last few decades leads to thefollowing question. Are most of the current techniques incapable of proving a superpolynomial lower bound for general circuits?

Almost all the current techniques for showing a lower bound against a class of circuits C, can also be used to obtain what is called a “defining equation” for the class C that is also easy to compute. Thus, the above question is equivalent to asking whetherthe class of polynomials computable by polynomial-sized circuits has efficiently computable defining equations.

Based on the important work of Razborov and Rudich (1997) in the boolean world, Forbes, Shpilka, and Volk (2018), and Grochow, Kumar, Saks and Saraf (2017) introduced the framework of Algebraically Natural Proofs. They showed that under a certain derandomizationassumption, such defining equations do not exist; thereby hinting towards the existence of a barrier for proving lower bounds. We show that there \emph{are} efficiently computable defining equations for the class of polynomials with bounded coefficients thatare computable by polynomial-sized circuits. This provides evidence \emph{against} the existence of a barrier for proving lower bounds.

Joint work with Prerona Chatterjee (TIFR, Mumbai), Mrinal Kumar (IITB, Mumbai), C. Ramya (TIFR, Mumbai) and Ramprasad Saptharishi (TIFR, Mumbai).

Title: Robustness in unsupervised and supervised machine learning

Abstract: Recently, the need for robust machine learning algorithms has become apparent. Whether due to errors in data collection, model misspecification, or adversarial attacks, contaminated datasets arise in many areas. This is an issue, as existing methods appear to be quite brittle to small amounts of errors. Even more worryingly, these models are being deployed in many security-critical settings, such as self-driving cars, where reliability is an absolute must.

In this talk, I will describe a line of work in which we provide provable guarantees for robust machine learning in several fundamental settings. I’ll begin by discussing the problem of robust estimation of mean and covariance of a Gaussian distribution, and how to relax this to distributions with weaker assumptions on the moments. I will then describe how these methods can be used to “robustify” supervised learning algorithms by applying robust mean estimation algorithms to the gradients of the dataset. While theoretically sound, the algorithms are also realizable and efficient, and I will present experimental results on both synthetic and real-world data.

Based on joint works with Ilias Diakonikolas, Daniel M. Kane, Jerry Li, Ankur Moitra, Jacob Steinhardt, and Alistair Stewart.

Title: Local Graph Clustering

Abstract: Graphs, long popular in computer science and discrete mathematics, have received renewed interest because they provide a useful way to model many types of relational data. In biology, e.g., graphs are routinely used to generate hypotheses for experimental validation; in neuroscience, they are used to study the networks and circuits in the brain; and in social networks, they are used to find common behaviors of users. These modern graph applications require the analysis of large graphs, and this can be computationally expensive. Graph algorithms have been developed to identify and interpret small-scale local structure in large-scale data without the requirement to access all the data.

In this talk, we will discuss state-of-the-art local spectral-, flow-, and Lp-norm-based algorithms that are specialized in finding small-scale clusters in large graphs without accessing the whole graph. We will discuss worst- and average-case theoretical results of these algorithms and we will demonstrate their empirical performance.

Based on joint work with Shenghao Yang, Di Wang, Wooseok Ha, Julian Shun, Farbod Roosta Khorasani, Xiang Cheng.