Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle

Orthogonal rational functions (ORF) on the unit circle generalize orthogonal polynomials (poles at infinity) and Laurent polynomials (poles at zero and infinity). In this paper we investigate the properties of and the relation between these ORF when the poles are all outside or all inside the unit d...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Datum:2017
Hauptverfasser: Bultheel, A., Cruz-Barroso, R., Lasarow, A.
Format: Artikel
Sprache:English
Veröffentlicht: Інститут математики НАН України 2017
Schriftenreihe:Symmetry, Integrability and Geometry: Methods and Applications
Online Zugang:http://dspace.nbuv.gov.ua/handle/123456789/149239
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Назва журналу:Digital Library of Periodicals of National Academy of Sciences of Ukraine
Zitieren:Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle / A. Bultheel, R. Cruz-Barroso, A. Lasarow // Symmetry, Integrability and Geometry: Methods and Applications. — 2017. — Т. 13. — Бібліогр.: 47 назв. — англ.

Institution

Digital Library of Periodicals of National Academy of Sciences of Ukraine
id irk-123456789-149239
record_format dspace
spelling irk-123456789-1492392019-02-20T01:23:17Z Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle Bultheel, A. Cruz-Barroso, R. Lasarow, A. Orthogonal rational functions (ORF) on the unit circle generalize orthogonal polynomials (poles at infinity) and Laurent polynomials (poles at zero and infinity). In this paper we investigate the properties of and the relation between these ORF when the poles are all outside or all inside the unit disk, or when they can be anywhere in the extended complex plane outside the unit circle. Some properties of matrices that are the product of elementary unitary transformations will be proved and some connections with related algorithms for direct and inverse eigenvalue problems will be explained. 2017 Article Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle / A. Bultheel, R. Cruz-Barroso, A. Lasarow // Symmetry, Integrability and Geometry: Methods and Applications. — 2017. — Т. 13. — Бібліогр.: 47 назв. — англ. 1815-0659 2010 Mathematics Subject Classification: 30D15; 30E05; 42C05; 44A60 DOI:10.3842/SIGMA.2017.090 http://dspace.nbuv.gov.ua/handle/123456789/149239 en Symmetry, Integrability and Geometry: Methods and Applications Інститут математики НАН України
institution Digital Library of Periodicals of National Academy of Sciences of Ukraine
collection DSpace DC
language English
description Orthogonal rational functions (ORF) on the unit circle generalize orthogonal polynomials (poles at infinity) and Laurent polynomials (poles at zero and infinity). In this paper we investigate the properties of and the relation between these ORF when the poles are all outside or all inside the unit disk, or when they can be anywhere in the extended complex plane outside the unit circle. Some properties of matrices that are the product of elementary unitary transformations will be proved and some connections with related algorithms for direct and inverse eigenvalue problems will be explained.
format Article
author Bultheel, A.
Cruz-Barroso, R.
Lasarow, A.
spellingShingle Bultheel, A.
Cruz-Barroso, R.
Lasarow, A.
Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle
Symmetry, Integrability and Geometry: Methods and Applications
author_facet Bultheel, A.
Cruz-Barroso, R.
Lasarow, A.
author_sort Bultheel, A.
title Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle
title_short Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle
title_full Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle
title_fullStr Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle
title_full_unstemmed Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle
title_sort orthogonal rational functions on the unit circle with prescribed poles not on the unit circle
publisher Інститут математики НАН України
publishDate 2017
url http://dspace.nbuv.gov.ua/handle/123456789/149239
citation_txt Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle / A. Bultheel, R. Cruz-Barroso, A. Lasarow // Symmetry, Integrability and Geometry: Methods and Applications. — 2017. — Т. 13. — Бібліогр.: 47 назв. — англ.
series Symmetry, Integrability and Geometry: Methods and Applications
work_keys_str_mv AT bultheela orthogonalrationalfunctionsontheunitcirclewithprescribedpolesnotontheunitcircle
AT cruzbarrosor orthogonalrationalfunctionsontheunitcirclewithprescribedpolesnotontheunitcircle
AT lasarowa orthogonalrationalfunctionsontheunitcirclewithprescribedpolesnotontheunitcircle
first_indexed 2025-07-12T21:10:42Z
last_indexed 2025-07-12T21:10:42Z
_version_ 1837477030254346240
fulltext Symmetry, Integrability and Geometry: Methods and Applications SIGMA 13 (2017), 090, 49 pages Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle Adhemar BULTHEEL †, Ruyman CRUZ-BARROSO ‡ and Andreas LASAROW § † Department of Computer Science, KU Leuven, Belgium E-mail: adhemar.bultheel@cs.kuleuven.be URL: https://people.cs.kuleuven.be/~adhemar.bultheel/ ‡ Department of Mathematical Analysis, La Laguna University, Tenerife, Spain E-mail: rcruzb@ull.es § Fak. Informatik, Mathematik & Naturwissenschaften, HTWK Leipzig, Germany E-mail: lasarow@imn.htwk-leipzig.de Received August 01, 2017, in final form November 20, 2017; Published online December 03, 2017 https://doi.org/10.3842/SIGMA.2017.090 Abstract. Orthogonal rational functions (ORF) on the unit circle generalize orthogonal polynomials (poles at infinity) and Laurent polynomials (poles at zero and infinity). In this paper we investigate the properties of and the relation between these ORF when the poles are all outside or all inside the unit disk, or when they can be anywhere in the extended complex plane outside the unit circle. Some properties of matrices that are the product of elementary unitary transformations will be proved and some connections with related algorithms for direct and inverse eigenvalue problems will be explained. Key words: orthogonal rational functions; rational Szegő quadrature; spectral method; ra- tional Krylov method; AMPD matrix 2010 Mathematics Subject Classification: 30D15; 30E05; 42C05; 44A60 1 Introduction Orthogonal rational functions (ORF) on the unit circle are well known as generalizations of orthogonal polynomials on the unit circle (OPUC). The pole at infinity of the polynomials is replaced by poles “in the neighborhood” of infinity, i.e., poles outside the closed unit disk. The recurrence relations for the ORF generalize the Szegő recurrence relations for the polynomials. If µ is the orthogonality measure supported on the unit circle, and L2 µ the corresponding Hilbert space, then the shift operator Tµ : L2 µ → L2 µ : f(z) 7→ zf(z) restricted to the polynomials has a representation with respect to the orthogonal polynomials that is a Hessenberg matrix. However, if instead of a polynomial basis, one uses a basis of orthogonal Laurent polynomials by alternating between poles at infinity and poles at the origin, a full unitary representation of Tµ with respect to this basis is a five-diagonal CMV matrix [12]. The previous ideas have been generalized to the rational case by Velázquez in [47]. He showed that the representation of the shift operator with respect to the classical ORF is not a Hessenberg matrix but a matrix Möbius transform of a Hessenberg matrix. However, a full unitary representation can be obtained if the shift is represented with respect to a rational analog of the Laurent polynomials by alternating between a pole inside and a pole outside the unit disk. The resulting matrix is a matrix Möbius transform of a five-diagonal matrix. This paper is a contribution to the Special Issue on Orthogonal Polynomials, Special Functions and Applica- tions (OPSFA14). The full collection is available at https://www.emis.de/journals/SIGMA/OPSFA2017.html mailto:adhemar.bultheel@cs.kuleuven.be https ://people.cs.kuleuven.be/~adhemar.bultheel/ mailto:rcruzb@ull.es lasarow@imn.htwk-leipzig.de https://doi.org/10.3842/SIGMA.2017.090 https://www.emis.de/journals/SIGMA/OPSFA2017.html 2 A. Bultheel, R. Cruz-Barroso and A. 
Lasarow Orthogonal Laurent polynomials on the real line, a half-line, or an interval were introduced by Jones et al. [31, 32] in the context of moment problems, Padé approximation and quadrature and this was elaborated by many authors. González-Vera and his co-workers were in particu- lar involved in extending the theory where the poles zero and infinity alternate (the so-called balanced situation) to a more general case where in each step either infinity or zero can be chosen as a pole in any arbitrary order [8, 20]. They also identify the resulting orthogonal Laurent polynomials as shifted versions of the orthogonal polynomials. Hence the orthogonal Laurent polynomials satisfy the same recurrence as the classical orthogonal polynomials after an appropriate shifting and normalization is embedded. The corresponding case of orthogonal Laurent polynomials on the unit circle was introduced by Thron in [40] and has been studied more recently in for example [15, 18]. Papers traditionally deal with the balanced situation like in [18] but in [15] also an arbitrary ordering was considered. Only in [16] Cruz-Barroso and Delvaux investigated the structure of the matrix representation with respect to the basis of the resulting orthogonal Laurent polynomials on the circle. They called it a “snake-shaped” matrix which generalizes the five diagonal matrix. The purpose of this paper is to generalize these ideas valid for Laurent polynomials on the circle to the rational case. That is to choose the poles of the ORF in an arbitrary order either inside or outside the unit disk. We relate the resulting ORF with the ORF having all their poles outside or all their poles inside the disk, and study the corresponding recurrence relations. With respect to this new orthogonal rational basis, the shift operator will be represented by a matrix Möbius transformation of a snake-shaped matrix. In the papers by Lasarow and coworkers (e.g., [23, 24, 25, 34]) matrix versions of the ORF are considered. In these papers also an arbitrary choice of the poles is allowed but only with the restrictive condition that if α is used as a pole, then 1/α cannot be used anymore. This means that for example the “balanced situation” is excluded. One of the goals of this paper is to remove this restriction on the poles. In the context of quadrature formulas, an arbitrary sequence of poles not on the unit circle was also briefly discussed in [19]. The sequence of poles considered there need not be New- tonian, i.e., the poles for the ORF of degree n may depend on n. Since our approach will emphasize the role of the recurrence relation for the ORF, we do need a Newtonian sequence, although some of the results may be generalizable to the situation of a non-Newtonian sequence of poles. One of the applications of the theory of ORF is the construction of quadrature formulas on the unit circle that form rational generalizations of the Szegő quadrature. They are exact in spaces of rational functions having poles inside and outside the unit disk. The nodes of the quadrature formula are zeros of para-orthogonal rational functions (PORF) and the weights are all positive numbers. These nodes and weights can (like in Gaussian quadrature) be derived from the eigenvalue decomposition of a unitary truncation of the shift operator to a finite-dimensional subspace. One of the results of the paper is that there is no gain in considering an arbitrary sequence of poles inside and outside the unit disk unless in a balanced situation. 
When all the poles are chosen outside the closed unit disk or when some of them are reflected in the circle, the same quadrature formula will be obtained. The computational effort for the general case will not increase but neither can it reduce the cost. In network applications or differential equations one often has to work with functions of large sparse matrices. If A is a matrix and the matrix function f(A) allows the Cauchy representation f(A) = ∫ Γ f(z)(z − A)−1dµ(z), where Γ is a contour encircling all the eigenvalues of A then numerical quadrature is a possible technique to obtain an approximation for f(A). If for exam- ple Γ is the unit circle, then expressions like u∗f(A)u for some vector u can be approximated by quadrature formulas discussed in this paper which will be implemented disguised as Krylov subspace methods (see for example [27, 29, 33]). Orthogonal Rational Functions on the Unit Circle 3 The purpose of the paper though is not to discuss quadrature in particular. It is just an example application that does not require much extra introduction of new terminology and notation. The main purpose however is to give a general framework on which to build for the many applications of ORFs. Just like orthogonal polynomials are used in about every branch of mathematics, ORFs can be used with the extra freedom to exploit the location of the poles. For example, it can be shown that the ORFs can be used to solve multipoint moment problems as well as more general rational interpolation problems where locations of the poles inside and outside the circle are important for the engineering applications like system identification, model reduction, filtering, etc. When modelling the transfer function of a linear system, poles should be chosen inside as well as outside the disk to guarantee that the transient as well as the steady state of the system is well modelled. It would lead us too far to also include the interpolation properties of multipoint Padé approximation and the related applications in several branches of engineering. We only provide the basics in this paper so that it can be used in the context of more applied papers. The interpretation of the recursion for the ORFs as a factorization of a matrix into elementary unitary transformations illustrates that the spectrum of the resulting matrix is independent of the order in which the elementary factors are multiplied. As far as we know, this fact was previously unknown in the linear algebra community, unless in particular cases like unitary Hessenberg matrices. As an illustration, we develop some preliminary results in Section 11 in a linear algebra setting that is slightly more general than the ORF situation. In the last decades, many papers appeared on inverse eigenvalue problems for unitary Hes- senberg matrices and rational Krylov methods. Some examples are [4, 30, 35, 36, 37, 38, 44]. These use elementary operations that are very closely related to the recurrence that will be discussed in this paper. However, they are not the same and often miss the flexibility discussed here. We shall illustrate some of these connections with certain algorithms from the literature in Section 12. The outline of the paper is as follows. In Section 2 we introduce the main notations used in this paper. The linear spaces and the ORF bases are given in Section 3. 
Section 4 brings the Christoffel–Darboux relations and the reproducing kernels which form an essential element to obtain the recurrence relation given in Section 5 but also for the PORF in Section 6 to be used for quadrature formulas in Section 7. The alternative representation of the shift operator is given in Section 8 and its factorization in elementary 2×2 blocks in the subsequent Section 9. We end by drawing some conclusions about the spectrum of the shift operator and about the computation of rational Szegő quadrature formulas in Section 10. The ideas that we present in this paper, especially the factorization of unitary Hessenberg matrices in elementary unitary factors is also used in the linear algebra literature mostly in the finite-dimensional situation. These elementary factors and what can be said about the spectrum of their product is the subject of Section 11. These elementary unitary transformations are intensively used in numerical algorithms such as Arnoldi-based Krylov methods where they are known as core transformations. Several variants of these rational Krylov methods exist. The algorithms are quite similar yet different from our ORF recursion as we explain briefly in Section 12 illustrating why we believe the version presented in this paper has superior advantages. 2 Basic definitions and notation We use the following notation. C denotes the complex plane, Ĉ the extended complex plane (one point compactification), R the real line, R̂ the closure of R in Ĉ, T the unit circle, D the open unit disk, D̂ = D ∪ T, and E = Ĉ \ D̂. For any number z ∈ Ĉ we define z∗ = 1/z (and set 1/0 =∞, 1/∞ = 0) and for any complex function f , we define f∗(z) = f(z∗). 4 A. Bultheel, R. Cruz-Barroso and A. Lasarow To approximate an integral Iµ(f) = ∫ T f(z)dµ(z), where µ is a probability measure on T one may use Szegő quadrature formulas. The nodes of this quadrature can be computed by using the Szegő polynomials. Orthogonality in this paper will always be with respect to the inner product 〈f, g〉 = ∫ T f(z)g(z)dµ(z). The weights of the n-point quadrature are all positive, the nodes are on T and the formula is exact for all Laurent polynomials f ∈ span{zk : |k| ≤ n− 1}. This has been generalized to rational functions with a set of predefined poles. The corre- sponding quadrature formulas are then rational Szegő quadratures. This has been discussed in many papers and some of the earlier results were summarized in the book [9]. We briefly recall some of the results that are derived there. The idea is the following. Fix a sequence α = (αk)k∈N with α = {αk}k∈N ⊂ D, and consider the subspaces of rational functions defined by L0 = C, Ln = { pn(z) πn(z) : pn ∈ Pn, πn(z) = n∏ k=1 (1− αkz) } , n ≥ 1, where Pn is the set of polynomials of degree at most n. These rational functions have their poles among the points in α∗ = {αj∗ = 1/αj : αj ∈ α}. We denote the corresponding sequence as α∗ = (αj∗)j∈N. Let φn ∈ Ln \ Ln−1, and φn ⊥ Ln−1 be the nth orthogonal rational basis function (ORF) in a nested sequence. It is well known that these functions have all their zeros in D (see, e.g., [9, Corollary 3.1.4]). However, the quadrature formulas we have in mind should have their nodes on the circle T. Therefore, para-orthogonal rational functions (PORF) are introduced. They are defined by Qn(z, τ) = φn(z) + τφ∗n(z), τ ∈ T, where besides the ORF φn(z) = pn(z) πn(z) , also the “reciprocal” function φ∗n(z) = p∗n(z) πn(z) = znpn∗(z) πn(z) is introduced. 
These PORF have n simple zeros {ξnk}nk=1 ⊂ T (see, e.g., [9, Theorem 5.2.1]) so that they can be used as nodes for the quadrature formulas In(f) = n∑ k=1 wnkf(ξnk) and the weights are all positive, given by wnk = 1/ n−1∑ j=0 |φj(ξnj)|2 (see, e.g., [9, Theorem 5.4.2]). These quadrature formulas are exact for all functions of the form {f = g∗h : g, h ∈ Ln−1} = Ln−1L(n−1)∗ (see, e.g., [9, Theorem 5.3.4]). The purpose of this paper is to generalize the situation where the αj are all in D to the situation where they are anywhere in the extended complex plane outside T. This will require the introduction of some new notation. So consider a sequence α with α ⊂ D and its reflection in the circle β = (βj)j∈N where βj = 1/αj = αj∗ ∈ E. We now construct a new sequence γ = (γj)j∈N where each γj is either equal to αj or βj . Partition {1, 2, . . . , n} (n ∈ N̂ = N∪{∞}) into two disjoint index sets: the ones where γj = αj and the indices where γj = βj : an = {j : γj = αj ∈ D, 1 ≤ j ≤ n} and bn = {j : γj = βj ∈ E, 1 ≤ j ≤ n}, Orthogonal Rational Functions on the Unit Circle 5 and define αn = {αj : j ∈ an} and βn = {βj : j ∈ bn}. It will be useful to prepend the sequence α with an extra point α0 = 0. That means that β is preceded by β0 = 1/α0 =∞. For γ, the initial point can be γ0 = α0 = 0 or γ0 = β0 =∞. With each of the sequences α, β, and γ we can associate orthogonal rational functions. They will be closely related as we shall show. The ORF for the γ sequence can be derived from the ORF for the α sequence by multiplying with a Blaschke product just like the orthogonal Laurent polynomials are essentially shifted versions of the orthogonal polynomials (see, e.g., [15]). To define the denominators of our rational functions, we introduce the following elementary factors: $α j (z) = 1− αjz, $β j (z) = { 1− βjz, if βj 6=∞, −z, if βj =∞, $γ j (z) = { $α j (z), if γj = αn, $β j (z), if γj = βn. Note that if αj = 0 and hence βj =∞ then $α j (z) = 1 but $β j (z) = −z. To separate the α and the β-factors in a product, we also define $̇α j (z) = { $α j , if γj = αj , 1, if γj = βj , and $̇β j (z) = { $β j , if γj = βj , 1, if γj = αj . Because the sequence γ is our main focus, we simplify the notation by removing the superscript γ when not needed, e.g., $j = $γ j = $̇α j $̇ β j etc. We can now define for ν ∈ {α, β, γ} πνn(z) = n∏ j=1 $ν j (z) and the reduced products separating the α and the β-factors π̇αn(z) = n∏ j=1 $̇α j (z) = ∏ j∈an $j(z), π̇βn(z) = n∏ j=1 $̇β j (z) = ∏ j∈bn $j(z), so that πn(z) = n∏ j=1 $j(z) = π̇αn(z)π̇βn(z). We assume here and in the rest of the paper that products over j ∈ ∅ equal 1. The Blaschke factors are defined for ν ∈ {α, β, γ} as ζνj (z) = σνj z − νj 1− νjz , σνj = νj |νj | , if νj 6∈ {0,∞}, ζνj (z) = σνj z = z, σνj = 1, if νj = 0, ζνj (z) = σνj /z = 1/z, σνj = 1, if νj =∞. Thus σνj =  νj |νj | , for νj 6∈ {0,∞}, 1, for νj ∈ {0,∞}. 6 A. Bultheel, R. Cruz-Barroso and A. Lasarow Because σαn = σβn , we can remove the superscript and just write σn. If we also use the following notation which maps complex numbers onto T u(z) =  z |z| ∈ T, z ∈ C \ {0}, 1, z ∈ {0,∞}, then σj = u(αj) = u(βj) = u(γj). Set ($ν j )∗(z) = $ν∗ j (z) = z$ν j∗(z) (e.g., (1 − αjz)∗ = z − αj if ν = α), then ζνj = σj $ν∗j $νj . Later we shall also use πν∗n to mean n∏ j=1 $ν∗ j . Note that ζαj = ζβj∗ = 1/ζβj . Moreover if αj = 0 and hence βj =∞, then $α∗ j (z) = z and $β∗ j (z) = −1. Next define the finite Blaschke products for ν ∈ {α, β} Bν 0 = 1, and Bν n(z) = n∏ j=1 ζνj (z), n = 1, 2, . . . . 
It is important to note that here ν 6= γ. For the definition of Bγ n = Bn see below. Like we have split up the denominators πn = π̇αn π̇ β n in the α-factors and the β-factors, we also define for n ≥ 1 ζ̇αj = { ζαj , if γj = αj , 1, if γj = βj , ζ̇βj = { ζβj , if γj = βj , 1, if γj = αj , and Ḃα n (z) = n∏ j=1 ζ̇αj (z) = ∏ j∈an ζj(z), and Ḃβ n(z) = n∏ j=1 ζ̇βj (z) = ∏ j∈bn ζj(z), so that we can define the finite Blaschke products for the γ sequence: Bn(z) = { Ḃα n (z), if γn = αn, Ḃβ n(z), if γn = βn. Note that the reflection property of the factors also holds for the products: Bα n = (Bβ n)∗ = 1/Bβ n , Bn∗ = 1/Bn, and (Ḃα n Ḃ β n)∗ = 1/(Ḃβ nḂα n ). However, Ḃα n = ∏ j∈an ζαj = ∏ j∈an ζβj∗ = ∏ j∈an 1/ζβj 6= ∏ j∈bn 1/ζβj = 1/Ḃβ n . 3 Linear spaces and ORF bases We can now introduce our spaces of rational functions for n ≥ 0: Lνn = span{Bν 0 , B ν 1 , . . . , B ν n}, ν ∈ {α, β, γ}, and L̇νn = span{Ḃν 0 , Ḃ ν 1 , . . . , Ḃ ν n}, ν ∈ {α, β}. The dimension of Lνn is n+ 1 for ν ∈ {α, β, γ}, but note that the dimension of L̇νn for ν ∈ {α, β} can be less than n+1. Indeed some of the Ḃν j may be repeated so that for example the dimension of L̇αn is only |an|+ 1 with |an| the cardinality of an and similarly for ν = β. Hence for ν = γ: Ln = span{B0, . . . , Bn} = span { Ḃ0, Ḃ α 1 , . . . , Ḃ α n , Ḃ β 1 , . . . , Ḃ β n } = L̇αn + L̇βn = L̇αnL̇βn. Orthogonal Rational Functions on the Unit Circle 7 Because for n ≥ 1 Ḃα n = ∏ j∈an ζαj = ∏ j∈an 1 ζβj and Ḃβ n = ∏ j∈bn ζβj = ∏ j∈bn 1 ζαj , it should be clear that Bα k = Ḃα k /Ḃ β k and Bβ k = Ḃβ k /Ḃ α k , hence that Lαn = span { Ḃ0, Ḃα 1 Ḃβ 1 , . . . , Ḃα n Ḃβ n } and Lβn = span { Ḃ0, Ḃβ 1 Ḃα 1 , . . . , Ḃβ n Ḃα n } . Occasionally we shall also need the notation ς̇αn = ∏ j∈an σj ∈ T, ς̇βn = ∏ j∈bn σj ∈ T, and ςn = n∏ j=1 σj ∈ T. Lemma 3.1. If f ∈ Ln then f/Ḃβ n ∈ Lαn and f/Ḃα n ∈ L β n. In other words Ln = Ḃβ nLαn = Ḃα nL β n. This is true for all n ≥ 0 if we set Ḃα 0 = Ḃβ 0 = 1. Proof. This is trivial for n = 0 since then Ln = C. If f ∈ Ln, and n ≥ 1 then it is of the form f(z) = pn(z) πn(z) = pn(z) π̇αn(z)π̇βn(z) , pn ∈ Pn. Therefore f(z) Ḃβ n(z) = ς̇ β n pn(z)π̇βn(z) π̇αn(z)π̇βn(z)π̇β∗n (z) = ς̇ β n pn(z) π̇αn(z)π̇β∗n (z) . Recall that $β∗ j = −1 and σj = 1 if βj =∞ (and hence αj = 0), we can leave these factors out and we shall write · ∏ for the product instead of ∏ , the dot meaning that we leave out all the factors for which αj = 1/βj = 0. ς̇ β n π̇β∗n (z) = · ∏ j∈bn βj |βj |(z − βj) = · ∏ j∈bn |αj | αj(z − 1/αj) = · ∏ j∈bn −|αj | 1− αjz , and thus f(z) Ḃβ n(z) = cn pn(z) n∏ j=1 (1− αjz) ∈ Lαn, cn = · ∏ j∈bn (−|αj |) 6= 0. The second part is similar. � Lemma 3.2. With our previous definitions we have for n ≥ 1 Ḃβ nLαn−1 = span { Bα k Ḃ β n = Ḃα k Ḃβ k Ḃβ n : k = 0, . . . , n− 1 } = ζ̇βn span{B0, B1, . . . , Bn−1} = ζ̇βnLn−1, and similarly Ḃα nL β n−1 = span { Bβ k Ḃ α n = Ḃβ k Ḃα k Ḃα n : k = 0, . . . , n− 1 } = ζ̇αn span{B0, B1, . . . , Bn−1} = ζ̇αnLn−1. 8 A. Bultheel, R. Cruz-Barroso and A. Lasarow Proof. By our previous lemma Ḃβ nLαn−1 = ζ̇βn Ḃ β n−1Lαn−1 = ζ̇βnLn−1. The second relation is proved in a similar way. � To introduce the sequences of orthogonal rational functions (ORF) for the different sequen- ces ν, ν ∈ {α, β, γ} recall the inner product that we can write with our ( )∗-notation as 〈f, g〉 =∫ T f∗(z)g(z)dµ(z) where µ is assumed to be a probability measure positive a.e. on T. 
Then the orthogonal rational functions (ORF) with respect to the sequence ν with ν ∈ {α, β, γ} are defined by φνn ∈ Lνn \ Lνn−1 with φνn ⊥ Lνn−1 for n ≥ 1 and we choose φν0 = 1. Lemma 3.3. The function φαnḂ β n belongs to Ln and it is orthogonal to the n-dimensional sub- space ζ̇βnLn−1 for all n ≥ 1. Similarly, the function φβnḂα n belongs to Ln and it is orthogonal to the n-dimensional subspace ζ̇αnLn−1, n ≥ 1. Proof. First note that φαnḂ β n ∈ Ln by Lemma 3.1. By definition φαn ⊥ Lαn−1. Thus by Lemma 3.2 and because 〈f, g〉 = 〈 Ḃν nf, Ḃ ν ng 〉 , Ḃβ nφ α n ⊥ Ḃβ nLαn−1 = ζ̇βnLn−1. The second claim follows by symmetry. � Note that ζ̇βnLn−1 = Ln−1 if γn = αn. Thus, up to normalization, φαnḂ β n is the same as φn and similarly, if γn = βn then φn and φβnḂα n are the same up to normalization. Lemma 3.4. For n ≥ 1 the function Ḃα n (φαn)∗ belongs to Ln and it is orthogonal to ζ̇αnLn−1. Similarly, for n ≥ 1 the function Ḃβ n(φβn)∗ belongs to Ln and it is orthogonal to ζ̇βnLn−1. Proof. Since φαnḂ β n ⊥ ζ̇βnLn−1, (φαnḂ β n)∗ ⊥ ζ̇βn∗L(n−1)∗, and thus by Lemma 3.2 and because Ḃα n−1Ḃ β n−1L(n−1)∗ = Ḃα n−1Ḃ β n−1 P(n−1)∗ π̇α(n−1)∗π̇ β (n−1)∗ = Pn−1 π̇αn−1π̇ β n−1 = Ln−1 it follows that Ḃα nφ α n∗ = Ḃα n Ḃ β n(φαnḂ β n)∗ ⊥ ζ̇αn Ḃα n−1Ḃ β n−1L(n−1)∗ = ζ̇αnLn−1. The other claim follows by symmetry. � We now define the reciprocal ORFs by (recall f∗(z) = f(1/z)) (φνn)∗ = Bν n(φνn)∗, ν ∈ {α, β}. For the ORF in Ln however we set φ∗n = Ḃα n Ḃ β n(φn)∗. Note that by definition Bn is either Ḃα n or Ḃβ n depending on γn being αn or βn, while in the previous definition we do not multiply with Bn but with the product Ḃα n Ḃ β n . The reason is that we want the operation ( )∗ to be a map from Lνn to Lνn for all ν ∈ {α, β, γ}. Orthogonal Rational Functions on the Unit Circle 9 Remark 3.5. As the operation ( )∗ is a map from Lνn to Lνn, it depends on n and on ν. So to make the notation unambiguous we should in fact use something like f [ν,n] if f ∈ Lνn. However, in order not to overload our notation, we shall stick to the notation f∗ since it should always be clear from the context what the space is to which f will belong. Note that we also used the same notation to transform polynomials. This is just a special case of the general definition. Indeed, a polynomial of degree n belongs to Lαn for a sequence α where all αj = 0, j = 0, 1, 2, . . . and for this sequence Bα n (z) = zn. Note that for a constant a ∈ L0 = Ĉ we have a∗ = a. Although ( )∗ is mostly used for scalar expressions, we shall occasionally use A∗ where A is a matrix whose elements are all in Ln. Then the meaning is that we take the ( )∗ conjugate of each element in its transpose. Thus if A is a constant matrix, then A∗ has the usual meaning of the adjoint or complex conjugate transpose of the matrix. We shall need this in Section 8. Remark 3.6. It might also be helpful for the further computations to note the following. If pn is a polynomial of degree n with a zero at ξ, then p∗n will have a zero at ξ∗ = 1/ξ. Hence, if ν ∈ {α, β, γ} and φνn = pνn πνn , then φνn∗ = pνn∗ πνn∗ = pν∗n πν∗n . We know by [9, Corollary 3.1.4] that φαn has all its zeros in D, hence pαn does not vanish in E and pα∗n does not vanish in D. By symmetry φβn has all its zeros in E and pβ∗n does not vanish in E. For the general φn, it depends on γn being αn or βn. 
However from the relations between φn and φαn or φβn that will be derived below, we will be able to guarantee that at least for z = νn we have φν∗n (νn) 6= 0 and pν∗n (νn) 6= 0 for all ν ∈ {α, β, γ} (see Corollary 3.11 below). The orthogonality conditions define φn and φ∗n uniquely up to normalization. So let us now make the ORFs unique by imposing an appropriate normalization. First assume that from now on the φνn refer to orthonormal functions in the sense that ‖φνn‖ = 1. This makes them unique up to a unimodular constant. Defining this constant is what we shall do now. Suppose γn = αn, then φn and φαnḂ β n are both in Ln and orthogonal to Ln−1 (Lemma 3.3). If we assume ‖φn‖ = 1 and ‖φαn‖ = 1, hence ‖φαnḂ β n‖ = ‖φαn‖ = 1, it follows that there must be some unimodular constant sαn ∈ T such that φn = sαnφ α nḂ β n . Of course, we have by symmetry that for γn = βn, there is some sβn ∈ T such that φn = sβnφ β nḂα n . To define the unimodular factors sαn and sβn, we first fix φαn and φβn uniquely as follows. We know that φαn has all its zeros in D and hence φα∗n has all its zeros in E so that φα∗n (αn) 6= 0. Thus we can take φα∗n (αn) > 0 as a normalization for φαn. Similarly for φβn we can normalize by φβ∗n (βn) > 0. In both cases, we have made the leading coefficient with respect to the basis {Bν j }nj=0 positive since φαn(z) = φα∗n (αn)Bα n (z) + ψαn−1(z) with ψαn−1 ∈ Lαn−1 and φβn(z) = φβ∗n (βn)Bβ n(z) + ψβn−1(z) with ψβn−1 ∈ L β n−1. Before we define the normalization for the γ sequence, we prove the following lemma which is a consequence of the normalization of the φαn and the φβn. Lemma 3.7. For the orthonormal ORFs, it holds that φαn = φβn∗ and (φαn)∗Ḃβ n = φβnḂα n and hence also (φβn)∗Ḃα n = φαnḂ β n for all n ≥ 0. Proof. For n = 0, this is trivial since φ0, φ α 0 , φ β 0 , Ḃ α 0 and Ḃβ 0 are all equal to 1. We give the proof for n ≥ 1 and γn = αn (for γn = βn, the proof is similar). Since by previous lemmas Ḃβ n(φβn)∗ and φαnḂ β n are both in Ln and orthogonal to Ln−1, and since ‖Ḃβ n(φβn)∗‖ = ‖φβn‖ = 1 and ‖φαnḂ β n‖ = ‖φαn‖ = 1, there must be some sn ∈ T such that snφ α nḂ β n = φβn∗Ḃ β n or snφ α n = φβn∗. 10 A. Bultheel, R. Cruz-Barroso and A. Lasarow Multiply with Bβ n = Bα n∗ and evaluate at βn to get snφ α n(βn)Bα n∗(βn) = φβ∗n (βn) > 0. Thus sn should arrange for 0 < snφ α n(1/αn)Bα n∗(1/αn) = snφαn∗(αn)Bα n (αn) = snφα∗n (αn), and since φα∗n (αn) > 0, it follows that sn = 1. Because (φαn)∗ = Bα nφ α n∗ = Bα nφ β n and Bα n = Ḃα n/Ḃ β n , also the other claims follow. � For the normalization of the φn, we can do two things: either we make the normalization of φn simple and choose for example φ∗n(γn) > 0, similar to what we did for φαn and φβn (but this is somewhat problematic as we shall see below), or we can insist on keeping the relation with φαn and φβn simple as in the previous lemma and arrange that sαn = sβn = 1. We choose for the second option. Let us assume that γn = αn. Denote φn(z) = pn(z) π̇αn(z)π̇βn(z) and φαn(z) = pαn(z) παn(z) , with pn and pαn both polynomials in Pn. Then φ∗n(z) = ςn p ∗ n(z) π̇αn(z)π̇βn(z) and φα∗n (z) = ςn p α∗ n (z) παn(z) , ςn = n∏ j=1 σj . We already know that there is some sαn ∈ T such that φn = sαnḂ β nφαn. Take the ( )∗ conjugate and multiply with Ḃα n Ḃ β n to get φ∗n = sαnḂ β nφα∗n . It now takes some simple algebra to reformulate φ∗n = sαnḂ β nφα∗n as φ∗n(z) = ςn p ∗ n(z) π̇αn(z)π̇βn(z) = sαn ςn p α∗ n (z) π̇αn(z)π̇βn(z) · ∏ j∈bn (−|βj |). 
This implies that p∗n(z) = sαnp α∗ n (z) · ∏ j∈bn(−|βj |) and thus that p∗n(z) has the same zeros as pα∗n (z), none of which is in D. Thus the numerator of φ∗n will not vanish at αn ∈ D but one of the factors (1 − βjαn) from π̇βn(αn) could be zero. Thus a normalization φ∗n(αn) > 0 is not an option in general. We could however make sαn = 1 when we choose p∗n(αn)/pα∗n (αn) > 0 or, since φα∗n (αn) > 0, this is equivalent with ςn p ∗ n(αn)/παn(αn) > 0. Yet another way to put this is requiring that φ∗n(z)/Ḃβ n(z) is positive at z = αn. This does not give a problem with 0 or ∞ since Ḃα n (z)φn∗(z) = φ∗n(z) Ḃβ n(z) = ς̇αn p ∗ n(z) π̇αn(z) · ∏ j∈bn(z − βj) , ς̇αn = ∏ j∈an σj . (3.1) It is clear that neither the numerator nor the denominator will vanish for z = αn. Of course a similar argument can be given if γn = βn. Then we choose φ∗n(z)/Ḃα n (z) to be positive at z = βn or equivalently ςn p ∗ n(βn)/πβn(βn) · ∏ j∈an(−|αj |) > 0. Let us formulate the result about the numerators as a lemma for further reference. Lemma 3.8. With the normalization that we just imposed the numerators pνn of φνn = pνn/π ν n, ν ∈ {α, β, γ} and n ≥ 1 are related by pn(z) = pαn(z) · ∏ j∈bn (−|βj |) = pβ∗n (z)ςn · ∏ j∈an (−|αj |), if γn = αn Orthogonal Rational Functions on the Unit Circle 11 and pn(z) = pβn(z) · ∏ j∈an (−|αj |) = pα∗n (z)ςn · ∏ j∈bn (−|βj |), if γn = βn, where as before ςn = ∏n j=1 σj. Proof. The first expression for γn = αn has been proved above. The second one follows in a similar way from the relation φn(z) = φβ∗n (z)Ḃα n (z). Indeed pn(z) πn(z) = ςnp β∗ n (z)∏ j∈an $β j (z) ∏ j∈bn $β j (z) ∏ j∈an σj z − αj 1− αjz = ςnp β∗ n (z)∏ j∈an $α j (z) ∏ j∈bn $β j (z) ∏ j∈an σj z − αj 1− βjz = ςnp β∗ n (z) πn(z) · ∏ j∈an σj(−αj) z − αj z − αj . With −σjαj = −|αj | the result follows. The case γn = βn is similar. � Note that this normalization again means that we take the leading coefficient of φn to be pos- itive in the following sense. If γn = αn then φn(z) = (Ḃα nφn∗)(αn)Ḃα n (z) + ψn−1(z) with ψn−1 ∈ Ln−1, while Ḃα nφn∗ = φα∗n and φα∗n (αn) > 0. If γn = βn then φn(z) = (Ḃβ nφn∗)(βn)Ḃβ n(z) + ψn−1(z) with ψn−1 ∈ Ln−1 and the conclusion follows similarly. Whenever we use the term orthonormal, we assume this normalization and {φn : n = 0, 1, 2, . . .} will denote this orthonormal system. Thus we have proved the following Theorem. It says that if γn = αn, then φn is a ‘shifted’ version of φαn where ‘shifted’ means multiplied by Ḃβ n : Ḃβ n(z)φn(z) = Ḃβ n(z)[a0B α 0 + · · ·+ anB α n (z)] = a0Ḃ β n(z) + · · ·+ anḂ α n (z), and a similar interpretation if γn = βn. We summarize this in the following theorem. Theorem 3.9. Assume all ORFs φνn, ν ∈ {α, β, γ} are orthonormal with positive leading coef- ficient, i.e., φα∗n (αn) > 0 and φβ∗n (βn) > 0 and { (φ∗n/Ḃ β n)(αn) > 0 if γn = αn, (φ∗n/Ḃ α n )(βn) > 0 if γn = βn. Then for all n ≥ 0 φn = (φαn)Ḃβ n = (φβn)∗Ḃα n and φ∗n = (φαn)∗Ḃβ n = (φβn)Ḃα n if γn = αn, while φn = (φβn)Ḃα n = (φαn)∗Ḃβ n and φ∗n = (φβn)∗Ḃα n = (φαn)Ḃβ n if γn = βn. Corollary 3.10. We have for all n ≥ 1 that (φνn)∗ ⊥ ζνnLνn−1, ν ∈ {α, β, γ}. Corollary 3.11. The rational functions φαn and φα∗n are in Lαn and hence have all their poles in {βj : j = 1, . . . , n} ⊂ E while the zeros of φαn are all in D and the zeros of φα∗n are all in E. The rational functions φβn and φβ∗n are in Lβn and hence have all their poles in {αj : j = 1, . . . , n} ⊂ D while the zeros of φβn are all in E and the zeros of φβ∗n are all in D. 
The rational functions φn and φ∗n are in Ln and hence have all their poles in {βj : j ∈ an} ∪ {αj : j ∈ bn}. The zeros of φn are the same as the zeros of φαn and thus are all in D if γn = αn and they are the same as the zeros of φβn and thus they are all in E if γn = βn. 12 A. Bultheel, R. Cruz-Barroso and A. Lasarow Proof. It is well known that the zeros of φαn are all in D [9, Corollary 3.1.4], and because φβn = φαn∗, this means that the zeros of φβn are all in E. Because φn = (φαn)Ḃβ n = (φαn)/ ∏ j∈bn ζ α j if γn = αn, i.e., n ∈ an, and the product with Ḃβ n will only exchange the poles 1/αj = βj , j ∈ bn in φn for poles αj = 1/βj , the zeros of φαn are left unaltered. The proof for n ∈ bn is similar. � One may summarize that for f ∈ Lνn the f∗ transform reflects both zeros and poles in T since z 7→ z∗ = 1/z, while the transform f → f∗ as it is defined in the spaces Lνn, ν ∈ {α, β, γ}, keeps the poles but reflects the zeros since the multiplication with the respective factors Bα n , Bβ n and Ḃα n Ḃ β n will only undo the reflection of the poles that resulted from the f∗ operation. 4 Christoffel–Darboux relations and reproducing kernels For ν ∈ {α, β, γ}, one may define the reproducing kernels for the space Lνn. Given the ORF φνk, the kernels are defined by kνn(z, w) = n∑ k=0 φνk(z)φνk(w). They reproduce f ∈ Lνn by 〈kνn(·, z), f〉 = f(z). The proof of the Christoffel–Darboux relations goes exactly like in the classical case and we shall not repeat it here (see, e.g., [9, Theorem 3.1.3]). Theorem 4.1. The Christoffel–Darboux relations kνn(z, w) = φν∗n (z)φν∗n (w)− ζνn(z)ζνn(w)φνn(z)φνn(w) 1− ζνn(z)ζνn(w) = φν∗n+1(z)φν∗n+1(w)− φνn+1(z)φνn+1(w) 1− ζνn+1(z)ζνn+1(w) hold for n ≥ 0, ν ∈ {α, β, γ} and z, w not among the poles of φνn and not on T. As an immediate consequence we have: Theorem 4.2. The following relations hold true: kαn(z, w)Ḃβ n(z)Ḃβ n(w) = kn(z, w) = kβn(z, w)Ḃα n (z)Ḃα n (w) for n ≥ 0 and z, w 6∈ (T ∪ {βj : j ∈ an} ∪ {αj : j ∈ bn}). Proof. The first relation was directly shown above for the case γn = αn. It also follows in the case γn+1 = αn+1 and using in the second CD-relation the first expressions from Theorem 3.9 for φn+1 and φ∗n+1. The relation is thus valid independent of γn = αn or γn = βn. Similarly the second expression was derived before in the case γn = βn, but again, it also fol- lows from the second CD-relation and the first expressions from Theorem 3.9 for φn+1 and φ∗n+1 in the case γn+1 = βn+1. Again the relation holds independently of γn = αn or γn = βn. Alternatively, the second relation can also be derived from the second CD-relation in the case γn+1 = αn+1 but using the second expressions from Theorem 3.9 for φn+1 and φ∗n+1. � Evaluation of the CD-relation in νn for ν ∈ {α, β} results in another useful corollary. Orthogonal Rational Functions on the Unit Circle 13 Corollary 4.3. For ν ∈ {α, β} we have for n ≥ 0 kνn(z, νn) = φν∗n (z)φν∗n (νn) and kνn(νn, νn) = |φν∗n (νn)|2. The latter corollary cannot immediately be used when ν = γ because γn could be equal to some pole of φn if it equals some 1/γj for j < n. In that case we can remove the denominators in the CD relation and only keep the numerators. Hence setting kn(z, w) = Cn(z, w) πn(z)πn(w) , φn(z) = pn(z) πn(z) , φ∗n(z) = ςnp ∗ n(z) πn(z) , ςn ∈ T, the CD relation becomes Cn(z, w) = p∗n(z)p∗n(w)− ζn(z)ζn(w)pn(z)pn(w) 1− ζn(z)ζn(w) = p∗n+1(z)p∗n+1(w)− pn+1(z)pn+1(w) (1− ζn+1(z)ζn+1(w))$n+1(z)$n+1(w) . (4.1) Thus, the first form gives Cn(z, γn) = p∗n(z)p∗n(γn) and Cn(γn, γn) = |p∗n(γn)|2. 
Evaluating a polynomial at infinity means taking its highest degree coefficient, i.e., if qn(z) is a polynomial of degree n, then qn(∞) stands for its coefficient of zn. The second form of (4.1) gives for γn+1 6=∞ and γn 6=∞ Cn(z, γn) = p∗n+1(z)p∗n+1(γn)− pn+1(z)pn+1(γn) (1− γnz)(1− |γn+1|2) and Cn(γn, γn) = |p∗n+1(γn)|2 − |pn+1(γn)|2 (1− |γn|2)(1− |γn+1|2) . Coupling the first and the second form in (4.1) gives |p∗n+1(γn)|2 − |pn+1(γn)|2 (1− |γn|2)(1− |γn+1|2) = |p∗n(γn)|2. For γn+1 =∞ and γn 6=∞ we get Cn(z, γn) = p∗n+1(z)p∗n+1(γn)− pn+1(z)pn+1(γn) −(1− γnz) = p∗n(z)p∗n(γn) and Cn(γn, γn) = |p∗n+1(γn)|2 − |pn+1(γn)|2 −(1− |γn|2) = |p∗n(γn)|2. If γn+1 =∞ and γn =∞, the denominators in (4.1) have to be replaced by 1, which gives Cn(z, γn) = p∗n+1(z)p∗n+1(γn)− pn+1(z)pn+1(γn) = p∗n(z)p∗n(γn) and Cn(γn, γn) = |p∗n+1(γn)|2 − |pn+1(γn)|2 = |p∗n(γn)|2. 14 A. Bultheel, R. Cruz-Barroso and A. Lasarow For γn =∞ and γn+1 6=∞ we obtain in a similar way Cn(z, γn) = p∗n+1(z)p∗n+1(γn)− pn+1(z)pn+1(γn) z(1− |γn+1|2) = p∗n(z)p∗n(γn) and Cn(γn, γn) = |p∗n+1(γn)|2 − |pn+1(γn)|2 (1− |γn+1|2) = |p∗n(γn)|2. To summarize, the relations of Corollary 4.3 may not hold for the ORF if ν = γ, but similar relations do hold for the numerators as stated in the next corollary. Corollary 4.4. If Cn(z, w) is the numerator in the CD relation and pn(z) is the numerator of the ORF φn for the sequence γ then we have for n ≥ 0 Cn(z, γn) = p∗n(z)p∗n(γn) and Cn(γn, γn) = |p∗n(γn)|2. 5 Recurrence relation The recurrence for the φαn is well known. For a proof see, e.g., [9, Theorem 4.1.1]. For φβn the proof can be copied by symmetry. However, also for ν = γ the same recurrence and its proof can be copied, with the exception that the derivation fails when p∗n(γn−1) = 0 where pn = φnπn. This can (only) happen if (1− |γn|)(1− |γn−1|) < 0 (i.e., one of these γ’s is in D and the other is in E). We shall say that φn is regular if p∗n(γn−1) 6= 0. If ν = α or ν = β then the whole sequence (φνn)n≥0 will be automatically regular. Thus we have the following theorem: Theorem 5.1. Let ν ∈ {α, β, γ} and if ν = γ assume moreover that φνn is regular, then the following recursion holds with initial condition φν0 = φν∗0 = 1[ φνn(z) φν∗n (z) ] = Hν n $ν n−1(z) $ν n(z) [ 1 λνn λ ν n 1 ] [ ζνn−1(z) 0 0 1 ] [ φνn−1(z) φν∗n−1(z) ] , where Hν n is a nonzero constant times a unitary matrix: Hν n = eνn [ ηνn1 0 0 ηνn2 ] , eνn ∈ C \ {0}, ηνn1, η ν n2 ∈ T. The constant ηνn1 is chosen such that the normalization condition for the ORFs is maintained. The other constant ηνn2 is then automatically related to ηνn1 by ηνn2 = ηνn1σn−1σn. The Szegő parameter λνn is given by λνn = ηνn pνn(νn−1) pν∗n (νn−1) with ηνn = ςn−2 pν∗n−1(νn−1) pν∗n−1(νn−1) ∈ T, where φνn(z) = pνn(z)/πνn(z). Proof. We immediately concentrate on the general situation ν = γ. Of course ν = α and ν = β will drop out as special cases. For simplicity assume that γn and γn−1 are not 0 and not ∞. The technicalities when this is not true are left as an exercise. It is easy because formally it follows the steps of the proof below but one has to replace a linear factor involving infinity by the coefficient of infinity (like 1 − ∞z = −z and z − ∞ = −1) and evaluating a polynomial at ∞ means taking its leading coefficient. Orthogonal Rational Functions on the Unit Circle 15 First we show that there are some numbers cn and dn such that φ(z) := 1− γnz z − γn−1 φn(z)− dnφn−1(z)− cn 1− γn−1z z − γn−1 φ∗n−1 ∈ Ln−2. This can be written as N(z)/[(z − γn−1)πn−1(z)]. 
Thus the cn and dn are defined by the conditions N(γn−1) = N(1/γn−1) = 0. If we denote φk = pk/πk and thus φ∗k = p∗kςk/πk, it is clear that N(z) = pn(z)− dn(z − γn−1)pn−1(z)− cn(1− γn−1z)p ∗ n−1(z)ςn−1. Thus the first condition gives cn = ςn−1pn(γn−1) (1− |γn−1|2)p∗n−1(γn−1) and the second one dn = pn(1/γn−1) (1/γn−1 − γn−1)pn−1(1/γn−1) = p∗n(γn−1) (1− |γn−1|2)p∗n−1(γn−1) . Note that p∗n−1(γn−1) cannot be zero by Corollary 3.11, and that also p∗n(γn−1) does not vanish by our assumption of regularity. Furthermore, by using the orthognality of φn and φn−1 and Corollary 3.10, it is not difficult to show that φ ⊥ Ln−2 so that it must be identically zero. Thus φn(z) = dnσn−1 1− γn−1z 1− γnz [ζn−1(z)φn−1(z) + λnφ ∗ n−1(z)], with λn = ηn pn(γn−1) p∗n(γn−1) , ηn = ςn−1σn−1 p∗n−1(γn−1) p∗n−1(γn−1) . By taking the ( )∗ transform (in Ln) we obtain φ∗n(z) = dnσn 1− γn−1z 1− γnz [λnζn−1(z)φn−1(z) + φ∗n−1(z)]. This proves the recurrence by taking en = |dn| and ηn1 = σn−1u(dn). It remains to show that the initial step for n = 1 is true. Since φ0 = φ∗0 = 1, then in case γ0 = α0 = 0, hence ζ0 = z, we have φ1(z) = e1η11 z + λ1 1− γ1z and φ∗1(z) = e1η12 λ1z + 1 1− γ1z . Thus p1(z) = e1η11(z + λ1) and p∗1(z) = e1η11(λ1z + 1). This implies that λ1 is indeed given by the general formula because λ1 = η1 p1(γ0) p∗1(γ0) = p1(0) p∗1(0) = e1η11λ1 e1η11 . 16 A. Bultheel, R. Cruz-Barroso and A. Lasarow In case γ0 = β0 =∞, then ζ0 = 1/z, so that φ1(z) = e1η11 −1− λ1z 1− γ1z and φ∗1(z) = e1η12 −λ1 − z 1− γ1z , and thus p1(z) = −e1η11(1 + λ1z) and p∗1(z) = −e1η11σ1(λ1 + z), and again λ1 is given by the general formula λ1 = η1 p1(γ0) p∗1(γ0) = 1 p1(∞) p∗1(∞) = e1η11λ1 e1η11 . This proves the theorem. � Remark 5.2. If ν ∈ {α, β} we could rewrite λνn in terms of φνn because by dividing and multi- plying with the appropriate denominators πνn one gets λνn = ηνn φνn(νn−1) φν∗n (νn−1) , ηνn = σn−1σn (1− νnνn−1) (1− νnνn−1) φν∗n−1(νn−1) φν∗n−1(νn−1) , n ≥ 1. Note that also this ηνn ∈ T, but it differs from the ηνn in the previous theorem. However if ν = γ, then this expression has the disadvantage that γn−1 could be equal to 1/γn or it could be equal to a pole of φn in which case it would not make sense to evaluate these expressions in γn−1. The latter expressions only make sense if we interpret them as limiting values φνn(νn−1) φν∗n (νn−1) = lim z→νn−1 φνn(z) φν∗n (z) and (1− νnνn−1) (1− νnνn−1) φν∗n−1(νn−1) φν∗n−1(νn−1) = lim z→νn−1 (1− νnz) (1− νnz) φν∗n−1(z) φν∗n−1(z) , where one has to assume that lim ξ→0 [ξ/ξ] = 1. We shall from now on occasionally use these expressions with this interpretation, but the expressions for λνn from Theorem 5.1 using the numerators are more direct since they immediately give the limiting values. Note that λαn is the value of a Blaschke product with all its zeros in D evaluated at αn−1 ∈ D and therefore λαn ∈ D. Similarly, λβn is the value of a Blaschke product with all its zeros in E, evaluated at βn−1 ∈ E so that λβn ∈ D. Since the zeros of φn are the zeros of φαn if n ∈ an and they are the zeros of φβn if n ∈ bn, it follows that if n and n − 1 are both in an or both in bn, then λn ∈ D but if n ∈ an and n− 1 ∈ bn or vice versa, then λn ∈ E. Therefore (eνn)2 = 1− |νn|2 1− |νn−1|2 1 1− |λνn|2 > 0 (5.1) and we can choose en as the positive square root of this expression. The above expression is derived in [9, Theorem 4.1.2] for the case ν = α by using the CD relations. By symmetry, this also holds for ν = β. 
For ν = γ, the same relation can be obtained by stripping the denominators as we explained after the proof of the CD-relation in Section 4. What goes wrong with the recurrence relation when φn is not regular? From the proof of Theorem 5.1, it follows that then dn = 0. We still have the relation φn(z) = σn−1 (1− |γn−1|2)p∗n−1(γn−1) [ p∗n(γn−1)ζn−1(z)φn−1(z) + sn−1pn(γn−1)φ∗n−1(z) ] Orthogonal Rational Functions on the Unit Circle 17 with sn−1 = ςn−1p∗n−1(γn−1) σn−1p∗n−1(γn−1) ∈ T and p∗n(γn−1) = 0. Thus, there is some positive constant en and some ηn1 ∈ T such that φn(z) = enηn1 $n−1(z) $n(z) [ 0 ζn−1(z)φn−1(z) + φ∗n−1(z) ] , i.e., the first term in the sum between square brackets vanishes. Applying Theorem 5.1 in this case would give λn = ∞, and the previous relations show that we only have to replace in Theorem 5.1 the matrix[ 1 λνn λνn 1 ] by [ 0 1 1 0 ] . This is in line with how we have dealt with ∞ so far where the rule of thumb was to set a−νb = −b if ν =∞. So let us therefore also use Theorem 5.1 with this interpretation when φn is not regular and thus λn = ∞. With the expressions at the end of Section 4, it can also be shown that in this case e2 n = − 1− |γn|2 1− |γn−1|2 > 0. Note that this corresponds to replacing 1 − |λn|2 when λn = ∞ by −1. Since this non-regular situation can only occur when (1−|γn|)(1−|γn−1|) < 0, this expression for e2 n is indeed positive. A similar rule can be applied if γn or γn−1 is infinite, just replace in this or previous expression 1− |∞|2 by −1. The positivity of the expressions for e2 n also follows from the following result. Theorem 5.3. The Szegő parameters satisfy for all n ≥ 1: If γn = αn and γn−1 = αn−1 then λn = λαn = λβn ∈ D. If γn = βn and γn−1 = βn−1 then λn = λβn = λαn ∈ D. If γn = αn and γn−1 = βn−1 then λn = 1/λβn = 1/λαn ∈ E. If γn = βn and γn−1 = αn−1 then λn = 1/λαn = 1/λβn ∈ E. Proof. Suppose γn = αn and γn−1 = αn−1, then by Theorems 5.1 and 3.9, or better still by Lemma 3.8, λn = ( ςn−2 p∗n−1(αn−1) p∗n−1(αn−1) ) pn(αn−1) p∗n(αn−1) = ( ςn−2 pα∗n−1(αn−1) pα∗n−1(αn−1) ) pαn(αn−1) pα∗n (αn−1) = λαn. When using pαn(z) = ςn p β∗ n (z) · ∏n j=1(−|αj |) and αj = 1/βj , the previous relation becomes λn = ςn−2 ςn−1p β n−1(1/βn−1) ςn−1p β n−1(1/βn−1)  pβ∗n (1/βn−1) pβn(1/βn−1) = ( σ2 n−1ςn−2 pβ∗n (βn−1) pβ∗n (βn−1) β n−1 n−1 βn−1 n−1 ) pβn(βn−1) pβ∗n (βn−1) βnn−1 β n n−1 = σ2 n−1 βn−1 βn−1 λβn = λβn. The proof for γn = βn and γn−1 = βn−1, n ≥ 1 is similar. Next consider γn = αn and γn−1 = βn−1, then λn = ςn−2 pβ∗n−1(βn−1) pβ∗n−1(βn−1)  pαn(βn−1) pα∗n (βn−1) = ηβn pβ∗n (βn−1) pβn(βn−1) = ηβnη β n λβn = 1 λβn = 1 λαn . The remaining case γn = βn and γn−1 = αn−1, is again similar. � 18 A. Bultheel, R. Cruz-Barroso and A. Lasarow Remark 5.4. It should also be clear that the expression for the parameters λνn of Theorem 5.1 are for theoretical use only. They are expressed in terms of pνn, which is the numerator of φνn, the object that should be the result of the computation, and hence unknown at the moment of computing λνn. For practical use, these parameters λνn should be obtained in a different way. In inverse eigenvalue problems the recursion is used backwards, i.e., for decreasing degrees of the ORFs and then these expressions can be used of course. Even if we know the λνn, then in Theorem 5.1, we still need the normalizing factor ηνn1 ∈ T which is characterized by “ηνn1 is chosen such that the normalization condition or the ORFs is maintained”. We could compute φνn with ηνn1 = 1, check the value of φν∗n (γn−1), and derive ηνn1 from it. 
What this means is shown in the next lemma. Lemma 5.5. For ν ∈ {α, β}, the phase θνn of the unitary factor ηνn1 = eiθ ν n is given by θνn = arg ( σn−1σn$ν n(νn−1)φν∗n (νn−1) ) or equivalently ηνn1 = σn−1σnu ( $ν n(νn−1)φν∗n (νn−1) ) . (Recall u(z) = z/|z|.) Proof. Take the first relation of Theorem 5.1 and evaluate for z = νn−1, then because $ν∗ n−1(νn−1) = 0 we get φνn(νn−1)$ν n(νn−1) = eνnη ν n1$ ν n−1(νn−1)λνnφ ν∗ n−1(νn−1) or ηνn1 = $ν n(νn−1)φνn(νn−1) eνn$ ν n−1(νn−1)λνnφ ν∗ n−1(νn−1) . Use the definition of λνn = ηνn φνn(νn−1) φν∗n (νn−1) with ηνn = σn−1σn $νn(νn−1) $νn−1(νn) and knowing that φν∗n−1(νn−1) > 0 we obtain after simplification and leaving out all the factors with phase zero θνn = arg ( σn−1σn$ν n(νn−1)φν∗n (νn−1) ) as claimed. � Note that this expression for ηνn1 is well defined because φν∗n (νn−1) 6= 0 if ν ∈ {α, β}. For ν = γ, the expression is a bit more involved but it can be obtained in a similar way from the normalization conditions given in Theorem 3.9. We skip the details. Another solution of the recurrence relation is formed by the functions of the second kind. Like in the classical case (i.e., for ν = α) we can introduce them for ν ∈ {α, β, γ} by (see [9, p. 83]) ψνn(z) = ∫ T [E(t, z)φνn(t)−D(t, z)φνn(z)]dµ(t). where D(t, z) = t+z t−z and E(t, z) = D(t, z) + 1 = 2t t−z . This results in ψν0 = 1 and ψνn(z) = ∫ T D(t, z)[φνn(t)− φνn(z)]dµ(t), n ≥ 1, which may be generalized to (see [9, Lemma 4.2.2]) ψνn(z)f(z) = ∫ T D(t, z)[φνn(t)f(t)− φνn(z)f(z)]dµ(t), n ≥ 1 Orthogonal Rational Functions on the Unit Circle 19 with f arbitrary in Lν(n−1)∗. It also holds that (see [9, Lemma 4.2.3]) ψν∗n (z)g(z) = ∫ T D(t, z)[φν∗n (t)g(t)− φν∗n (z)g(z)]dµ(t), n ≥ 1 with g arbitrary in Lνn∗(νn∗). Recall that Lνn∗(νn∗) is the space of all functions in Lνn∗ that vanish for z = νn∗ = 1/νn. This space is spanned by {Bν k/B ν n : k = 0, . . . , n − 1} if ν ∈ {α, β}. For ν = γ, the space can be characterized as (see Lemma 3.1) Ln∗(γn∗) = span { Bk Ḃα n Ḃ β n }n−1 k=0 = span { Bα k ζnḂα n−1 }n−1 k=0 = span { Bβ k ζnḂ β n−1 }n−1 k=0 . Theorem 5.6. The following relations for the functions of the second kind hold for n ≥ 0: ψn = (ψαn)Ḃβ n = (ψβn)∗Ḃα n and ψ∗n = (ψαn)∗Ḃβ n = (ψβn)Ḃα n if γn = αn, while ψn = (ψβn)Ḃα n = (ψαn)∗Ḃβ n and ψ∗n = (ψβn)∗Ḃα n = (ψαn)Ḃβ n if γn = βn. We assume the normalization of Theorem 3.9. Proof. This is trivial for n = 0, hence suppose n ≥ 1 and γn = αn then ψn(z) = ∫ T D(t, z)[φn(t)− φn(z)]dµ(t) = ∫ T D(t, z)[φαn(t)Ḃβ n(t)− φαn(z)Ḃβ n(z)]dµ(t) = ψαn(z)Ḃβ n(z), because Ḃβ n(z) = Ḃβ n−1(z) = ∏ j∈bn−1 ζβj (z) = ∏ j∈bn−1 ζαj∗(z) ∈ Lα(n−1)∗. Moreover, using φαn∗ = φβn we also have ψn(z) = ∫ T D(t, z)[φn(t)− φn(z)]dµ(t) = ∫ T D(t, z)[φαn(t)Ḃβ n(t)− φαn(z)Ḃβ n(z)]dµ(t) = ∫ T D(t, z)[φβn∗(t)Ḃ β n(t)− φβn∗(z)Ḃβ n(z)]dµ(t) = ∫ T D(t, z)[φβ∗n (t)Ḃα n (t)− φβ∗n (z)Ḃα n (z)]dµ(t), and since Ḃα n ∈ L β n∗(βn∗), we also get the second part: ψn = (ψβn)∗Ḃα n . Moreover ψ∗n = [ψαnḂ β n ]∗Ḃ α n Ḃ β n = ψαn∗Ḃ α n Ḃ β nḂ β n∗ = [ψαn∗Ḃ α n/Ḃ β n ]Ḃβ n = ψα∗n Ḃβ n . It follows in a similar way that ψ∗n = ψβnḂα n . The case γn = βn is proved similarly. � With these relations, it is not difficult to mimic the arguments of [9, Theorem 4.2.4] and obtain the following. Theorem 5.7. These functions satisfy the recurrence relation[ ψνn(z) −ψν∗n (z) ] = Hν n $ν n−1(z) $ν n(z) [ 1 λνn λ ν n 1 ] [ ζνn−1(z) 0 0 1 ] [ ψνn−1(z) −ψν∗n−1(z) ] , n ≥ 1 with ψν0 = ψν∗0 = 1 and all other quantities as in Theorem 5.1. 20 A. Bultheel, R. Cruz-Barroso and A. 
Lasarow 6 Para-orthogonal rational functions Before we move on to quadrature formulas, we define para-orthogonal functions (PORF) by Qνn(z, τνn ) = φνn(z) + τνnφ ν∗ n (z), τνn ∈ T, ν ∈ {α, β, γ}. (6.1) The PORF Qνn(z, τνn ) is in Lνn and it is called para-orthogonal because it is not orthogonal to Lνn−1 but it is orthogonal to a particular subspace of Lνn−1 of dimension n− 1. Theorem 6.1. The para-orthogonal rational function Qn(z) = Qn(z, τn), τn ∈ T with n ≥ 2 is orthogonal to ζ̇αnLn−1 ∩ ζ̇βnLn−1 = ζnLn−1 ∩ Ln−1 = Ln−1(γn) with Ln−1(γn) = {f ∈ Ln−1 : f(γn) = 0} = { $∗n(z)pn−2(z) πn−1(z) : pn−2 ∈ Pn−2 } . Recall that $∗n(z) = z − γn if γn 6=∞, and $∗n(z) = −1 if γn =∞. Proof. Suppose γn = αn then ζ̇αn = ζαn and ζ̇βn = 1. Hence φn ⊥ Ln−1 and φ∗n ⊥ ζαnLn−1 and therefore Qn ⊥ Ln−1 ∩ ζαnLn−1 = Ln−1(αn). The proof for γn = βn is similar. � One may also define associated functions of the second kind as P νn (z, τn) = ψνn(z)− τνnψν∗n (z), ν ∈ {α, β, γ}. From Theorems 3.9 and 5.6 we can easily obtain the following corollary. Corollary 6.2. With the notation Qn and Pn for the PORF and the associated functions just introduced, we have for n ≥ 1 Qn(z, τn) = Ḃβ n(z)Qαn(z, τn) = τnḂ α n (z)Qβn(z, τn) if γn = αn, while Qn(z, τn) = Ḃα n (z)Qβn(z, τn) = τnḂ β n(z)Qαn(z, τn) if γn = βn. Similarly Pn(z, τn) = Ḃβ n(z)Pαn (z, τn) = −τnḂα n (z)P βn (z, τn) if γn = αn, while Pn(z, τn) = Ḃα n (z)P βn (z, τn) = −τnḂβ n(z)Pαn (z, τn) if γn = βn. Proof. Assume that γn = αn, then φn = φαnḂ β n and φ∗n = (φαn)∗Ḃβ n . Thus Qn(·, τn) = Ḃβ n [φαn + τn(φαn)∗] = Ḃβ nQ α n(·, τn). In a similar way one has Qn(·, τn) = φβ∗n Ḃ α n + τnφ β nḂ α n = τnḂ α n [ φβn + τnφ β∗ n ] = τnḂ α nQ β n(·, τn). The proofs for Pn and for γn = βn are similar. � We are now ready to state that the zeros of the para-orthogonal rational functions Qνn(z, τνn ) will be simple and on T no matter whether ν = α, β or γ. Orthogonal Rational Functions on the Unit Circle 21 Corollary 6.3. Take n ≥ 1, τn ∈ T and ν ∈ {α, β, γ} then the zeros of Qνn(z, τn) are all simple and on T. Proof. The case ν = α was proved in [9, Theorem 5.2.1] and by symmetry, this also holds for ν = β. For ν = γ, assume that γn = αn, then by the previous corollary Qn(z, τn) = Ḃβ n(z)Qαn(z, τn) so that the zeros of Qn(z, τn) are the zeros of Qαn(z, τn) (and hence will not cancel against any of the poles). The proof for γn = βn is similar. � These PORF properties are important because the zeros will deliver the nodes for the rational Szegő quadrature as we will explain in the next section. For further reference we give the following property. Proposition 6.4. For n ≥ 1 and ν ∈ {α, β, γ}, the PORF satisfy, using the notation of Theorem 5.1 Qνn(z, τn) = cνn $ν n−1(z) $ν n(z) [ ζνn−1(z)φνn−1(z) + τ̃νnφ ν∗ n−1(z) ] with cνn = eνn ( ηn1 + ηn2τnλ ν n ) , τ̃νn = τ̂n + λνn 1 + τ̂nλ ν n ∈ T, τ̂n = ηn1ηn2τn. Proof. Just take the recurrence relation of Theorem 5.1 and premultiply it with [1 τn]. After re-arrangement of the terms, the expression above will result. � The importance of this property is that, up to a constant nonzero factor cνn, we can com- pute Qνn(z, τn) by exactly the same recurrence that gives φνk in terms of φνk−1 and φν∗k−1 for k = 1, 2, . . . , n, except that we have to replace the trailing parameter λνn by a unimodular con- stant τ̃νn to get Qνn(z, τn). When we are interested in the zeros of Qνn(z, τn), then this normalizing factor cνn does not matter since it is nonzero. 
7 Quadrature We start by proving that the subspace of rational functions in which the quadrature formulas will be exact only depends on the points {αk : k = 0, . . . , n − 1} no matter whether the points αk are introduced as a pole αk or αk∗ = 1/αk in the sequence γ n = (γk) n k=0. Lemma 7.1. Rn ..= Ln · Ln∗ = Rαn ..= Lαn · Lαn∗ = Rβn ..= Lβn · Lβn∗. Proof. The space Ln contains rational functions of degree at most n whose denominators have zeros in {βj : j ∈ an} ∪ {αj : j ∈ bn}. The space Ln∗ contains rational functions of degree at most n whose denominators have zeros in {βj : j ∈ bn} ∪ {αj : j ∈ an}. Thus the space Rn contains rational functions of degree at most 2n whose denominators have zeros in {βj : j = 1, . . . , n} ∪ {αj : j = 1, . . . , n}. Since the denominators of functions in Lαn have zeros in {βj : j = 1, . . . , n} and functions in Lαn∗ have zeros in {αj : j = 1, . . . , n}, the denominator of functions in Rαn have zeros in {βj : j = 1, . . . , n} ∪ {αj : j = 1, . . . , n} so that Rn = Rαn. Of course a similar argument shows that also Rn = Rβn. � As is well known from the classical case associated with the sequence α discussed for example in [9, Chapter 5], the rational Szegő quadrature formulas are of the form Iαn (f) = n−1∑ j=0 wαnjf(ξαnj), n ≥ 1, 22 A. Bultheel, R. Cruz-Barroso and A. Lasarow where ξαnj = ξαnj(τ α n ), j = 0, . . . , n − 1 are the zeros of the para-orthogonal rational function Qαn(z, ταn ) with ταn ∈ T and the weights are given by wαnj = wαnj(τn) = 1 n−1∑ k=0 |φαk (ξαnj(τ α n ))|2 > 0. These formulas have maximal degree of exactness, meaning that∫ T f(t)dµ(t) = Iαn (f), ∀ f ∈ Rαn−1 and that no n-point interpolatory quadrature formula with nodes on T and positive weights can be exact in a larger (2n− 1)-dimensional space of the form Lαn · Lα(n−1)∗ or Lαn∗ · Lαn−1. They are however exact in a subspace of dimension 2n − 1 of a different form as shown in [39] and [17] for Laurent polynomials and in [11] for the rational case. Our previous results show that taking an arbitrary order of selecting the γ does not add new quadrature formulas. Indeed, if γn = αn then the zeros of Qn(z, τn), Qαn(z, τn) and Qβn(z, τn) are all the same, i.e., ξnj(τn) = ξαnj(τn) = ξβnj(τn), j = 1, . . . , n. Dropping the dependency on τ from the notation, it is directly seen that n−1∑ k=0 |φk(ξnj)|2 = n−1∑ k=0 |φαk (ξnj)Ḃ β k (ξnj)|2 = n−1∑ k=0 |φαk (ξnj)|2 = n−1∑ k=0 |φαk (ξαnj)|2 and thus we also have wnj = wαnj and similarly wnj = wβnj . Therefore In = Iαn = Iβn for appropriate choices of the defining parameters, i.e., ταn = τn and τβn = τn if γn = αn, τβn = τn and ταn = τn if γn = βn. An alternative expression for the weights is also known ([9, Theorem 5.4.1]) and it will obviously also coincide for ν ∈ {α, β, γ}: wνnj = 1 2ξνnj P νn (z) d dzQ ν n(z) ∣∣∣∣∣ z=ξνnj , where Qνn is the PORF with zeros {ξνnj} n−1 j=0 and P νn the associated functions of the second kind as in Corollary 6.2. We have dropped again the obvious dependence on the parameter τνn from the notation. A conclusion to be drawn from this section is that whether we choose the sequence γ, α or β, the resulting quadrature formula we obtain is an n-point formula that is exact in the space Rn−1 = Ln−1 ·L(n−1)∗. Such a quadrature formula is called an n-point rational Szegő quadrature and these are the only positive interpolating quadrature formulas exact in this space. They are unique up to the choice of the parameter τn ∈ T. 
Thus, to obtain the nodes and weights for a general sequence γ and a choice for τn ∈ T, we can as well compute them for the sequence α and an appropriate ταn ∈ T or for the sequence that alternates between one α and one β. The resulting quadrature will be the same. In practice nodes and weights of a quadrature formula are computed by solving an eigenvalue problem. It will turn out that also in these computations, nothing is gained from adopting a general order for the γ sequence but that a sequence alternating between α and β (the balanced situation) is the most economical one. We shall come back to this in Section 10 after we provide Orthogonal Rational Functions on the Unit Circle 23 in the next sections the structure of the matrices whose eigenvalues will give the nodes of the quadrature formula. Rational interpolatory and Szegő quadrature formulas with arbitrary order of the poles in- cluding error estimates and numerical experiments were considered in [19]. 8 Block diagonal with Hessenberg blocks The orthogonal polynomials are a special case of ORF obtained by choosing a sequence γ = α that contains only zeros. Another special case is given by the orthogonal Laurent polynomi- als (OLP) obtained by choosing an alternating sequence γ = (0,∞, 0,∞, 0,∞, . . .). In [47], Velázquez described spectral methods for ORF on the unit circle that were based on these two special choices. The result was that the shift operator Tµ : L2 µ → L2 µ : f(z) 7→ zf(z) has a matrix representation that is a matrix Möbius transform of a structured matrix. The structure is a Hessenberg structure in the case of γ = α and it is a five-diagonal matrix (a so-called CMV matrix) for the sequence γ = (0, β1, α2, β3, α4, . . .). In the case of orthogonal (Laurent) polyno- mials the Möbius transform turns out to be just the identity and we get the plain Hessenberg matrix for the polynomials and the plain CMV matrix for the Laurent polynomials. In [16], the OLP case was discussed using an alternative linear algebra approach when the 0,∞ choice did not alternate nicely, but the order in which they were added was arbitrary. Then the structured matrix generalized to a so-called snake-shape matrix referring to the zig-zag shape of the graphical representation that fixed the order in which the elementary factors of the matrix had to be multiplied. This structure is a generalization of both the Hessenberg and the CMV structures mentioned above. It is a block diagonal where the blocks alternate between upper and lower Hessenberg structure. To illustrate this for our ORF, we start in this section by using the approach of Velázquez in [47], skipping all the details just to obtain the block structure of the matrix that will represent the shift operator. In the next sections we shall use the linear algebra approach of [16] to analyze the computational aspects of obtaining the quadrature formulas. We start from the recurrence for the φαk and transform it into a recurrence relation for the φk, which will eventually result in a representation of the shift operator. To avoid a tedious book-keeping of normalizing constants, we will just exploit the fact that there are some constants such that certain dependencies hold. 
For example the recurrence φαn(z) = enηn1 $α n−1(z) $α n(z) [ ζαn−1(z)φαn−1(z) + λαnφ α∗ n−1(z) ] or equivalently φαn(z) = enηn1 [ σn−1 $α∗ n−1(z) $α n(z) φαn−1(z) + λαn $α n−1(z) $α n(z) φα∗n−1(z) ] or $α∗ n−1(z)φαn−1(z) = e−1 n ηn1σn−1$ α n(z)φαn(z)− λαnσn−1$ α n−1(z)φα∗n−1(z) (8.1) will be expressed as $α∗ n−1φ α n−1 ∈ span { $α nφ α n, $ α n−1φ α∗ n−1 } . (8.2) Similarly, by combining the recurrence for φαn and φα∗n from Theorem 5.1 and eliminating φαn−1, we have $α nφ α∗ n ∈ span { $α nφ α n, $ α n−1φ α∗ n−1 } . (8.3) Note that this relation always holds whether or not λαn 6= 0. 24 A. Bultheel, R. Cruz-Barroso and A. Lasarow Now suppose that our sequence γ has the following configuration γ = (. . . , βk−1, αk, . . . , αn, . . . , αm−1, βm, . . . , βl−1, αl, . . . ) with 2 ≤ k ≤ n < m < l where k, m, l are fixed and n varying from k to m− 1. Using (8.2) and then repeatedly (8.3) we get $α∗ n φαn ∈ span { $α n+1φ α n+1, . . . , $ α kφ α k , $ α k−1φ α∗ k−1 } . Multiply this with Ḃβ n = Ḃβ n−1 = · · · = Ḃβ k−1 $α∗ n Ḃβ nφ α n ∈ span { $α n+1Ḃ β nφ α n+1, . . . , $ α k Ḃ β kφ α k , $ α k−1Ḃ β k−1φ α∗ k−1 } , and since by Theorem 3.9, φp = Ḃβ pφαp for p = n, n−1, . . . , k and φk−1 = Ḃβ k−1φ α∗ k−1, this becomes $α∗ n φn ∈ span { $α n+1Ḃ β nφ α n+1, $ α nφn, . . . , $ α kφk, $ α k−1φk−1 } . (8.4) Thus if n+ 1 < m, then γn+1 = αn+1, hence Ḃβ n+1 = Ḃβ n and φn+1 = Ḃβ n+1φ α n+1, so that $α∗ n φn ∈ span{$α p φp}n+1 p=k−1, k ≤ n < m− 1. (8.5) If n+ 1 = m, then γn+1 = βm and we start dealing with the subsequence of β’s in γ = (. . . , αm−1, βm, . . . , βl−1, αl, . . .). Therefore we note that $α m(z) = (z − αm) 1− αmz z − αm = (z − αm) z − βm 1− βmz σ2 m = σm$ α∗ m (z)ζβm(z). (8.6) Thus $α mḂ β m−1φ α m = σm$ α∗ m Ḃβ mφ α m. Using (8.2) again repeatedly we then see that $α mḂ β m−1φ α m ∈ Ḃβ m span { $α mφ α∗ m , $ α m+1φ α m+1 } = Ḃβ m span { $α mφ α∗ m , $ α∗ m+1ζ β m+1φ α m+1 } = span { $α mḂ β mφ α∗ m , $ α∗ m+1Ḃ β m+1φ α m+1 } = · · · = span { $α mḂ β mφ α∗ m , . . . , $ α l−1Ḃ β l−1φ α∗ l−1, $ α l Ḃ β l φ α l } = span { $α mφm, . . . , $ α l−1φl−1, $ α l φl } , so that by plugging this into (8.4) with n+ 1 = m gives $α∗ m−1φm−1 ∈ span { $α p φp }l p=k−1 . (8.7) The two relations (8.4) and (8.7) tell us how we should express $α∗ n φn in terms of the {$α kφk : k ≤ n+ 1} in the case that γn = αn. The next step is to express $α∗ n φn in terms of {$α p φp}, p ≥ n− 1 when γn = βn. This goes along the same lines. Let us consider the mirror situation γ = (. . . , αm−1, βm, . . . , βn, . . . , βl−1, αl, . . . , αr−1, βr, . . .) Orthogonal Rational Functions on the Unit Circle 25 with 2 ≤ m ≤ n < l < r where m, l, r are again fixed and n is varying from m to l − 1. We first use (8.6) for m = n to write $α∗ n φn = $α∗ n Ḃβ nφ α∗ n = σn$ α nḂ β n−1φ α∗ n , which implies that we shall need expressions for $α nφ α∗ n . We use repeatedly (8.3) in combination with the previous relation to get $α∗ n φn ∈ Ḃβ n−1 span { $α n−1φ α∗ n−1, $ α nφ α n } = span { Ḃβ n−1$ α n−1φ α∗ n−1, Ḃ β n$ α∗ n φαn } = span { Ḃβ n−1$ α n−1φ α∗ n−1, Ḃ β n$ α nφ α∗ n , Ḃ β n$ α n+1φ α n+1 } = span { Ḃβ n−1$ α n−1φ α∗ n−1, Ḃ β n$ α nφ α∗ n , Ḃ β n+1$ α∗ n+1φ α n+1 } = span { Ḃβ n−1$ α n−1φ α∗ n−1, Ḃ β n$ α nφ α∗ n , Ḃ β n+1$ α n+1φ α∗ n+1, Ḃ β n+1$ α n+2φ α n+2 } = · · · = span { Ḃβ n−1$ α n−1φ α∗ n−1, . . . , Ḃ β m−1$ α l−1φ α∗ l−1, Ḃ β m−1$ α l φ α l } = span { Ḃβ n−1$ α n−1φ α∗ n−1, $ α nφn, . . . , $ α l−1φl−1, $ α l φl } , where in the last step we used γl = αl and hence Ḃβ l−1 = Ḃβ l and Ḃβ l−1φ α l = φl. 
Thus if n > m, also γn−1 = βn−1 so that also the first term is $α n−1φn−1 and we have found that $α∗ n φn ∈ span { $α p φ α p }l p=n−1 , m < n ≤ l − 1. (8.8) Now we are left with the remaining case n = m, i.e., γn−1 = αn−1 while γn = βn. This is the missing link that should connect a β block to the previous α block at the boundary . . . , αm−1, βm, . . .. Using (8.3) repeatedly we get $α m−1φ α∗ m−1 ∈ span { $α m−1φ α m−1, $ α m−2φ α∗ m−2 } = span { $α m−1φ α m−1, $ α m−2φ α m−2, $ α m−2φ α∗ m−2 } = · · · = span { $α m−1φ α m−1, $ α m−2φ α m−2, . . . , $ α kφ α k , $ α k−1φ α∗ k−1 } . After multiplying with Ḃβ k = Ḃβ k+1 = · · · = Ḃβ n−1 we get $α m−1φ α∗ m−1 ∈ span { $α p φp }n−1 p=k−1 . We plug this in our previous expression for $α∗ n φn and we arrive at $α∗ m φm ∈ span { $α p φp }l p=k−1 . (8.9) To summarize, (8.4) and (8.7) in the case γn = αn and (8.8) and (8.9) in the case γn = βn show short recurrences for the φp that fully rely on the recursion for the α-related quanti- ties: φαn and φα∗n . They use only factors $α p and $α∗ p and in the recurrence relations only the rational Szegő parameters λαp for the α sequence are used. The longest recursions are needed to switch from a α to a β block, which is for n ∈ {m− 1,m} as given in (8.7) and (8.9) since these need all elements form both blocks. This analysis should illustrate that as long as we are in a succession of α’s we are building up an upper Hessenberg block. At the instant the α sequence switches to a β sequence one starts building a lower Hessenberg block, which switches back to upper Hessenberg when again α’s 26 A. Bultheel, R. Cruz-Barroso and A. Lasarow ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ���������� ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� αβαβα α β β α α = αβαβα α β β α α Figure 1. Structure of the matrix Ĝ which is the same as the structure of the matrix GA. enter the γ sequence etc. See Fig. 1. Of course if there are only α’s in the sequence, we end up with just an upper Hessenberg as in the classical case. If we alternate between one α and one β’s we get the so called CMV type matrix which is a five-diagonal matrix with a characteristic block structure a given in Fig. 2. Suppose we start with a set of α’s, and set A = diag(α0, α1, . . .) and let us recycle our earlier notations $n(z) = 1 − αnz and $∗n(z) = z − αn to now act on an arbitrary operator T by doing the transform for the whole sequence α simultaneously as follows: $A(T ) = I − T A∗ and $∗A(T ) = T − A where I is the identity operator. Then (at least formally) we have [φ0, φ1, . . .]$ ∗ A(zI) = [φ0, φ1, . . .]$A(zI)Ĝ (8.10) with Ĝ having the structure shown in Fig. 1. 
In the special case that the α’s and β’s alternate as in γ = (α1, β2, α3, β4, α5, . . .), we get the five-diagonal matrix as in [12] for polynomials and for ORF in [47]. Since for k ≥ 1 each α2k−1-column is the last in an α block and each β2k-column is the first in a β block, we get a succession of 4× 2 blocks that shift down by two rows as illustrated by Fig. 2. The particular role of the last column in an α block and the first column in a β block is due to the fact that we have chosen to derive these relations for the φk from the recurrence for the φαk leading to factors $A and $∗A and the use of parameters λαk . A symmetric derivation could have been given by starting from the φβk recursion, in which case the β blocks will correspond to upper Hessenberg blocks and the α blocks to lower Hessenberg blocks. Then the longer recurrence would occur in the last column of the β block and the first column of the α block. The shape of the matrix would be the transposed shape of the figures shown here. However while it is rather simple to deal with αj = 0 in the current derivation, it requires dealing with more technical issues to write down the formulas with the corresponding βj equal to ∞. Therefore we stick to the present form. Orthogonal Rational Functions on the Unit Circle 27 ������������������������������������������������������������������������������������������ ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � β α α β α β α β α β α α β α β α β α β α β ������������������������������������������������������������������������������������������ ������������������������������������������������������������������������������������������ ������������������������������������������������������������������������������������������ ������������������������������������������������������������������������������������������ 
������������������������������������������������������������������������������������������ ������������������������������������������������������������������������������������������ ������������������������������������������������������������������������������������������ ���������������������������������������������� ������������������������������������������������������������������������������������������ ������������������������������������������������������������������������������������������ ������������������������������������������������������������������������������������������ ������������������������������������������������������������������������������������������ Figure 2. Structure of the matrix Ĝm when α’s and β’s alternate, which is the five-diagonal CMV matrix given in [12, 47]. We can define as in [47] an operator Möbius transform of an operator T by ζA(T ) = ηA$A(T )−1$∗A(T )η−1 A∗ , $A(T ) = I − T A∗, $∗A(T ) = T − A, and its inverse ζ̃A(T ) = η−1 A $̃∗A(T )$̃A(T )−1ηA∗ , $̃A(T ) = I +A∗T , $̃∗A(T ) = T +A, where ηA := $A(A)1/2 = √ I −AA∗ = √ I −A∗A = ηA∗ = $A∗(A∗)1/2 = diag ( 1, √ 1− |α1|2, √ 1− |α2|2, . . . ) . Recall the meaning of the ( )∗ notation for a (constant) matrix from Remark 3.5. Note that $̃A is like $A, except that the minus sign is changed into a plus sign and T A∗ is changed into A∗T . It should also be clear that the alternating case (i.e., the case of the CMV representation) gives the smallest possible bandwidth, as was also reported in [12]. Now it is not difficult to see that (8.10) is equivalent to [φ0(z), φ1(z), φ2(z), . . .] ( zI − ζ̃A(G) ) = 0, Ĝ = η−1 A G ηA∗ . To see this, exactly the same argument as used in [47] to derive relation (19) of that paper can be applied here. Instead we give a direct derivation. The relation (8.10) can be written as 0 = Φ [ $∗A(zI)−$A(zI)Ĝ ] = Φ [ (zI − A)− (I − zA∗)Ĝ ] = Φ [ z ( I +A∗Ĝ ) − ( A+ Ĝ )] . Since $̃A(Ĝ) = (I +A∗Ĝ) is invertible, this is equivalent with 0 = Φ [ zI − $̃∗A ( Ĝ ) $̃A ( Ĝ )−1] = Φ [ zI − η−1 A $̃∗A(G)$̃A(G)−1ηA∗ ] = Φ [ zI − ζ̃A(G) ] , where we used the fact that the matrix ηA = ηA∗ and its inverse commute with any diagonal matrix. The matrices G and Ĝ have the same shape, but G is rescaled to make it isometric. So we see that on L = span{φ0, φ1, . . .} the operator zI has a matrix representation ζ̃A(G) with respect to the φ-basis. We shall have a more careful discussion of this fact in the next section. 28 A. Bultheel, R. Cruz-Barroso and A. Lasarow 9 Factorization of a general CMV matrix It is well known (see [6, 47] in the rational case and [16] for the polynomial case) that the Hessenberg matrix G = H (obtained when γ = α) can be written as an infinite product of elementary matrices, i.e., G = H = G1G2G3G4 · · · , where Gk := Ik−1 0 0 0 G̃k 0 0 0 I∞  , G̃k := [ −δk ηk ηk δk ] , ∀ k ≥ 1, (9.1) where Ik−1 and I∞ are the identity matrices of sizes (k− 1) and ∞ respectively, I0 is the empty matrix, δk = λαk is the k-th Szegő parameter and ηk := √ 1− |δk|2. Also the CMV matrix Cε associated with the alternating sequence γ = ε where γ2k−1 = α2k−1 and γ2k = β2k, for all k ≥ 1 uses the same elementary transforms, but they are multiplied in a different order: G = CoCe = (· · ·G9G7G5G3G1) · (G2G4G6G8G10 · · · ) = (G1G3G5G7G9 · · · ) · (G2G4G6G8G10 · · · ) . To find explicit expressions for these elementary factors in our case, taking into account the proper normalization, requires a more detailed analysis. We start with the following Theorem 9.1. 
In the case γ = α, the recurrence relation of Theorem 5.1 can be rewritten in this form[ $α∗ n−1φ α n−1√ 1− |αn−1|2 $α nφ α∗ n√ 1− |αn|2 ] = [ $α n−1φ α∗ n−1√ 1− |αn−1|2 $α nφ α n√ 1− |αn|2 ] G̃αn (9.2) with G̃αn = σn−1η α n1 [ −λαnηαn1 √ 1− |λαn|2√ 1− |λαn|2 λαnη α n1 ] [ 1 0 0 σn ] or [ $α∗ n−1φ α n−1 $α nφ α∗ n ] = [ $α n−1φ α∗ n−1 $α nφ α n ] Ĝαn (9.3) with Ĝαn = [( 1− |αn−1|2 )−1/2 0 0 ( 1− |αn|2 )−1/2 ] G̃αn [( 1− |αn−1|2 )1/2 0 0 ( 1− |αn|2 )1/2 ] . Proof. This is in fact an explicit form of what in the previous section was expressed as (8.2) and (8.3). Taking the first line of the recurrence relation from Theorem 5.1 and solving for $α∗ n−1φ α n−1 we get the explicit from of (8.1), i.e., $α∗ n−1φ α n−1 = (eαn)−1ηαn1σn−1$ α nφ α n − λαnσn−1$ α n−1φ α∗ n−1. Orthogonal Rational Functions on the Unit Circle 29 Using eαn = √ 1−|αn|2√ 1−|αn−1|2 1√ 1−|λαn |2 and ηαn2 = ηαn1σn−1σn this becomes $α∗ n−1φ α n−1√ 1− |αn−1|2 = ηαn1σn−1 [ −λαnηαn1 $α n−1φ α∗ n−1√ 1− |αn−1|2 + √ 1− |λαn|2 $α nφ α n√ 1− |αn|2 ] . (9.4) The second line of the recurrence relation in Theorem 5.1 is $α nφ α∗ n = eαnη α n2σn−1λαn$ α∗ n−1φ α n−1 + eαnη α n2$ α n−1φ α∗ n−1. Eliminate the term $α∗ n−1φ α n−1 with the first line of the recurrence and after inserting the value of eαn from (5.1) and rearranging, we get $α nφ α n√ 1− |αn|2 = ηαn1σn−1 [ $α n−1φ α n−1√ 1− |αn−1|2 √ 1− |λαn|2σn + $α nφ α n√ 1− |αn|2 λαnη α n1σn ] . This is equivalent with the formula (9.2). Derivation of the other form (9.3) is obvious. � Note that G̃αn is unitary but Ĝαn is not unless |λαn| = 1 or αn = αn−1. Now suppose that n ∈ an, i.e., γn = αn then Ḃβ n = Ḃβ n−1 and hence[ $α∗ n−1φ α n−1Ḃ β n−1 $α nφ α∗ n Ḃ β n ] = [ $α n−1φ α∗ n−1Ḃ β n−1 $α nφ α nḂ β n ] Ĝαn. (9.5) If n ∈ bn, i.e., γn = βn, then Ḃβ n = Ḃβ n−1ζ β n = σnḂ β n−1 $αn $α∗n . Define then G̃βn = [ σn 0 0 1 ] G̃αn [ σn 0 0 1 ] = S̃∗nG̃ α nS̃n, S̃n = diag(σn, 1), (9.6) and as for Ĝαn Ĝβn = [( 1− |αn−1|2 )−1/2 0 0 ( 1− |αn|2 )−1/2 ] G̃βn [( 1− |αn−1|2 )1/2 0 0 ( 1− |αn|2 )1/2 ] = [ σn 0 0 1 ] Ĝαn [ σn 0 0 1 ] , so that[ $α n−1φ α∗ n−1Ḃ β n−1 $α∗ n φαnḂ β n ] Ĝβn = [ $α n−1φ α∗ n−1Ḃ β n−1 $α nφ α nḂ β n−1 ] σnĜ α n [ σn 0 0 1 ] = [ $α∗ n−1φ α n−1Ḃ β n−1 $α nφ α∗ n Ḃ β n−1 ] [1 0 0 σn ] = [ $α∗ n−1φ α n−1Ḃ β n−1 $α∗ n φα∗n Ḃ β n ] . Thus [ $α∗ n−1φ α n−1Ḃ β n−1 $α∗ n φα∗n Ḃ β n ] = [ $α n−1φ α∗ n−1Ḃ β n−1 $α∗ n φαnḂ β n ] Ĝβn. (9.7) We build the general CMV matrix as a product of G-factors. Set α0 = 0 and suppose we consider for 1 ≤ n < k < m < l < · · · ≤ ∞ α0, α1, . . . , αn−1|βn, . . . , βk−1|αk, . . . , αm−1|βm, . . . , β`−1|α`, . . . . 30 A. Bultheel, R. Cruz-Barroso and A. Lasarow We use vertical bars to separate the different blocks. These separations are of course clear from the notation α and β, but since that will not so obvious when we denote the corresponding functions in the vectors below, we introduced them here and keep them at the corresponding positions in the vectors that we are about to introduce now. That should improve readability of the formulas. We consider the vector Φ = [φ0, φ1, φ2, . . .] so that Φ(I − A∗z) = [ $α 0 φ0, . . . , $ α n−1φn−1 ∣∣$α nφn, . . . , $ α k−1φk−1 ∣∣$α kφk, . . . . . . , $α m−1φm−1 ∣∣$α mφm, . . . , $ α `−1φ`−1 ∣∣$α ` φ`, . . . ] = [ $α 0 φ α 0 , . . . , $ α n−1φ α n−1Ḃ β n−1 ∣∣$α nφ α∗ n Ḃ β n , . . . , $ α k−1φ α∗ k−1Ḃ β k−1 ∣∣$α kφ α k Ḃ β k , . . . . . . , $α m−1φ α m−1Ḃ β m−1 ∣∣$α mφ α∗ m Ḃ β m, . . . , $ α `−1φ α∗ `−1Ḃ β `−1 ∣∣$α ` φ α ` Ḃ β ` , . . . ] . 
Define for ν ∈ {α, β} (recall ηA = √ I −A∗A) Gνn = In−1 0 0 0 G̃νn 0 0 0 I∞  = diag(In−1, G̃ ν n, I∞), and Ĝνn = η−1 A GνnηA. (9.8) Apply successively the Gνn to the right on the Φ vector. Keep an increasing order for the factors within an α block and a decreasing order for the factors covering a β block. Thus for the example given above the order will be γ = (α0, α1, . . . , αn−1‖βn, . . . , βk−1|αk‖αk+1, . . . , αm−1‖βm, . . . , β`−1|α`‖α`+1, . . .), Ĝ = (Ĝα1 Ĝ α 2 · · · Ĝαn−1)︸ ︷︷ ︸ Ĝ1 α (Ĝαk Ĝ β k−1 · · · Ĝ β n)︸ ︷︷ ︸ Ĝ1 β (Ĝαk+1 · · · Ĝαm−1)︸ ︷︷ ︸ Ĝ2 α (Ĝα` Ĝ β `−1 · · · Ĝ β m)︸ ︷︷ ︸ Ĝ2 β (Ĝα`+1 · · · ) (9.9) = η−1 A (Gα1G α 2 · · ·Gαn−1)︸ ︷︷ ︸ G1 α (GαkG β k−1 · · ·G β n)︸ ︷︷ ︸ G1 β (Gαk+1 · · ·Gαm−1)︸ ︷︷ ︸ G2 α (Gα`G β `−1 · · ·G β m)︸ ︷︷ ︸ G2 β (Gα`+1 · · · ) ︸ ︷︷ ︸ ηA = η−1 A G ηA with Ĝνn and Gνn, ν ∈ {α, β} as in (9.8) and similarly Ĝnν = η−1 A GnνηA, for n = 1, 2, . . .; ν ∈ {α, β} and Ĝ = η−1 A GηA. (9.10) Note that we have aligned these formulas properly so that a shift in the blocks from the first to the second line in (9.9) is made clear. The rationale is that in a Gnβ group the indices are decreasing from left to right and in a Gnα group the indices increase from left to right. This is what defines the shape of the result. We used the vertical bars to separate α and β blocks in the sequence γ as before. Now we use double vertical bars to indicate what corresponds to the Gnα and Gnβ blocks on the third line. On the second line in (9.9) you may note that the first α in an α block of the γ sequence corresponds to a Ĝα factor that migrates to the front of the product group of previous β block. This is due to the fact that we analysed the γ sequence using the parameters λαk of the α recurrence (recall the definitions of the Ĝαk and Ĝβk from Theorem 9.1 and equation (9.6)). To see the effect of applying all these elementary operations, we start with the first block G1 α. Using (9.5) when multiplying Φ(I − A∗z) with Ĝ1 α gives (note φ0 = φ∗0)[ $α∗ 0 φ0, . . . , $ α∗ n−2φn−2, $ α n−1φ α∗ n−1Ḃ β n−1 ∣∣$α nφn, . . . ] . Orthogonal Rational Functions on the Unit Circle 31 While multiplying this result with Ĝαk Ĝ β k−1 · · · Ĝ β n+1, we make use of (9.5) and (9.7) to obtain[ $α∗ 0 φ0, . . . , $ α∗ n−2φn−2, $ α n−1φ α∗ n−1Ḃ β n−1 ∣∣ $α∗ n φαnḂn, $ α∗ n+1φn+1, . . . , $ α∗ k−1φk−1 ∣∣$α kφ ∗ k, $ α k+1φk+1, . . . ] and the remaining multiplication Ĝβn of Ĝ1 β links the current β block to the previous α block, resulting in[ $α∗ 0 φ0, . . . , $ α∗ n−1φn−1 ∣∣$α∗ n φn, . . . , $ α∗ k−1φk−1 ∣∣$α kφ ∗ k, $ α k+1φk+1, . . . ] . The next block is again an α block treated by the product Ĝ2 α, and one may continue like this to finally get Φ(I − A∗z)Ĝ = Φ(zI − A) or Φ(I − A∗z)G = Φ(zI − A), Ĝ = η−1 A ĜηA, (9.11) since ηA is invertible. This Ĝ can be factored as the product of elementary matrices hav- ing only one non-trivial 2 × 2 diagonal block: Ĝ = Ĝ1Ĝ2 · · · . By setting Ĝ = η−1 A GηA = (η−1 A G1ηA)(η−1 A G2ηA) · · · = G1G2 · · · , we made G as well as these elementary factors Gk uni- tary. These elementary factors are grouped together in blocks as a consequence of the α and β blocks in the γ sequence. For example G2 β = Gα`G β `−1 · · ·G β m corresponds to a β block. Because ` > m there must be at least two factors in such a β block: the first one is Gαl and the last one is a Gβm. However for an α block we have the product G2 α = Gαk+1 · · ·Gαm−1 thus if k = m − 1 then the initial index k+ 1 is larger than the end index m− 1. 
Thus if an α block has only one element then there are no factors in this product, which means that this Gα is just the identity. In the case of γ = α (no β’s), then there is of course only one infinite block so n = ∞ and there is no k,m, `, . . . and G = Gα1G α 2G α 3 · · · =: H is the familiar upper Hessenberg matrix, and in the case of alternating α-β sequence, we have in the case α1, β2, α3, β4, . . . G = Gα1 ( Gα3G β 2 ) I∞ ( Gα5G β 4 ) I∞ ( Gα7G β 6 ) · · · = ( Gα1G α 3G α 5 · · · )( Gβ2G β 4G β 6 · · · ) = C = Cαo Cβe , where we grouped the product of the Gαk factors as Cαo and of the Gβk factors as Cβe . This rearrangement of the factors is possible because the blocks commute when their subscript differs by at least two. If we define Se = diag ( 1, S̃2, S̃4, S̃6, . . . ) , S̃2k = diag(σ2k, 1), then of course C = Cαo S∗eCαe Se. In the case of an alternating sequence of the form β1, α2, β3, α4, . . ., then G = ( Gα2G β 1 ) I∞ ( Gα4G β 3 ) I∞ ( Gα6G β 5 ) · · · = ( Gα2G α 4G α 6 · · · )( Gβ1G β 3G β 5 · · · ) = C = Cαe Cβo . As in the previous case this is C = Cαe S∗oCαo So with So = diag ( S̃1, S̃3, S̃5, . . . ) , S̃2k−1 = diag(σ2k−1, 1). These are the classical CMV matrices as also given in [47], except for the S factors, which were not in [47]. That is because we have used a particular normalization for the φn-basis. A slightly different normalization of the φn-basis will remove the S factors. Indeed, replacing all φn by ϕαn = ςβnφn, where ςβn = Ḃβ n |Ḃβ n | = ∏ j∈bn σj , 32 A. Bultheel, R. Cruz-Barroso and A. Lasarow will do. To see this, note that the relation (9.6) between G̃βn and G̃αn can also be written as (multiply with σnσn = 1) G̃βn = [ 1 0 0 σn ] G̃αn [ 1 0 0 σn ] . (9.12) The first and the last of these factors can then be moved to the ϕαn so that the β-relation (9.7) now becomes[ $α n−1ς̇ β n−1φ α n−1Ḃ β n−1 $α∗ n ς̇βnφα∗n Ḃ β n ] = [ $α n−1ς̇ β n−1φ α∗ n−1Ḃ β n−1 $α∗ n ς̇βnφαnḂ β n ] Ĝαn. (9.13) Note that the multiplication on the right is now with Ĝαn and not with Ĝβn anymore. For the α-case, i.e., γn = αn nothing essentially changes since then ς̇βn = ς̇βn−1. It should be clear that following the same arguments used above, the relation (9.11) then becomes Φα(I − A∗z)Ĝα = Φα(zI − A) with Φα = [ϕα0 , ϕ α 1 , ϕ α 2 , . . .] and Ĝα is exactly like Ĝ in (9.10), except that all Gβj should be replaced by a Gαj , that is, replace all the Gβk factors in G by the corresponding Gαk by removing the Sk factors in (9.6) and G becomes Gα. From (9.11) we derive that zI = ( Ĝ +A )( I +A∗Ĝ )−1 = η−1 A (G +A) ( I +A∗G )−1 ηA = ζ̃A(G), which is the matrix representation of the shift operator in L with respect to the basis (φ0, φ1, φ2, . . .). With respect to the basis Φα, the expression is the same except that G should be replaced by Gα. We summarize our previous results. Theorem 9.2. In the general case of a sequence γ, then multiplying the vector of basis functions Φ(z) = [φ0, φ1(z), φ2(z), . . .] by z corresponds to multiplying it from the right with the infinite matrix( Ĝ +A )( I +A∗Ĝ )−1 = η−1 A (G +A)(I +A∗G)−1ηA, where A = diag(α0, α1, . . .), ηA = √ I −A∗A and G is a product of unitary factors Gk defined in (9.8) where the order of multiplication is from left to right in a block of successive α-values and from right to left in a block of successive β-values as explained in detail above. For the sequence α we get the classical Hessenberg matrix G = Gα1G α 2G α 3 · · · = H. The classical CMV matrices are obtained for α1, β2, α3, β4, . . . 
as G = Gα1 ( Gα3G β 2 )( Gα5G β 4 )( Gα7G β 6 ) · · · = ( Gα1G α 3G α 5 · · · )( Gβ2G β 4G β 6 · · · ) = C = Cαo Cβe and in the case β1, α2, β3, α4, . . ., then G = ( Gα2G β 1 )( Gα4G β 3 )( Gα6G β 5 ) · · · = ( Gα2G α 4G α 6 · · · )( Gβ1G β 3G β 5 · · · ) = C = Cαe Cβo . It we use the slightly different orthonormalized basis (ϕαn)n∈N, then the previous relations still hold true, except that all Gβj can be replaced by Gαj . Orthogonal Rational Functions on the Unit Circle 33 Note that λαn = 0 does not give any problem for these formulas of the G-factors. If γ = α then all φαn are regular, but for a general sequence γ, it is possible that φn is not regular. Recall that then λn =∞ but by Theorem 5.3 this means that λαn = 0 and thus there is no problem in using the G-matrices introduced above, even if the ORF sequence is not regular. Another thing to note here is that, as we remarked before, the factors Gνk are unitary, although the factors Ĝνk = η−1 A GνkηA are in general not unitary. Moreover ζ̃A(G) = η−1 A (G +A)(I +A∗G)−1ηA is unitary when G is unitary as can be directly verified in the finite-dimensional case. For example [ζ̃A(G)][ζ̃A(G)]∗ = I if and only if (I +AG∗)η−2 A (I + GA) = (G∗ +A∗)η−2 A (G +A). Note that ηA is invertible but A is not (since for example α0 = 0) and that A and ηA commute. To see that the relation holds, we multiply out and bring everything to one side, which gives G∗ ( η−2 A A−Aη −2 A ) + ( A∗η−1 A − η −1 A A ∗)+ G∗η−1 A G − η −2 A − G ∗Aη−2 A A ∗G +A∗η−2 A A. So we have to prove that this is zero. Because η−2 A A = Aη−2 A and η−2 A A∗ = A∗η−2 A , the first two terms vanish. The four remaining terms can be multiplied from the left and the right with ηA = η∗A to give ηAG∗η−2 A GηA − I − ηAG ∗Aη−2 A A ∗GηA + ηAA∗η−2 A AηA. Because ηAA∗ = A∗ηA and AηA = AηA, the last term equals A∗A and combined with the second term −I it equals −η2 A. Multiply everything again with η−1 A from the left and from the right gives G∗η−2 A G − G ∗Aη−2 A A ∗G − η−1 A η2 Aη −1 A = G∗η−2 A G − G ∗Aη−2 A A ∗G − I = (G∗η−1 A )(I − ηAAη−2 A A ∗ηA)(η−1 A G)− I = (G∗η−1 A )(I − AηAη−2 A ηAA∗)(η−1 A G)− I = (G∗η−1 A )(I − AA∗)(η−1 A G)− I = G∗η−1 A η2 Aη −1 A G − I = G∗G − I, which is zero because G is unitary. A similar calculation verifies that [ζ̃A(G)]∗[ζ̃A(G)] = I. The same holds for the infinite case, i.e., in the whole of L2 µ if L is dense (see also [47]). When using the recursion for the φn with respect to a general sequence γ, Theorem 5.1 completed by Lemma 5.5 says what the Szegő parameters λn are. Note however that for the basis Φ corresponding to a general sequence γ, the matrix G had factors Gαk and Gβk that were all defined in terms of λαk and not λk (see Theorem 9.1 and equation (9.6)). Theorem 5.3 explains that, depending on γn and γn−1 an λn is either equal to λαn or λαn or the inverses of these. Thus it should be no problem to obtain all the λαn from the λn or vice versa. To illustrate how to use the quantities for a general γ sequence in generating the Gαn matrix we consider the following example. If n − 1 ∈ bn and n ∈ an, then γn = αn, γn−1 = βn−1, λαn = 1/λn, φαn = φn/Ḃ β n , and φα∗n = φ∗n/Ḃ β n . This allows us to express ηαn1 and all the other elements of Gn in terms of the γ-related elements. If we assume that φn is regular, then G̃αn = σn−1η α n1 [ −ηαn1/λn √ 1− 1/|λn|2√ 1− 1/|λn|2 ηαn1/λn ] [ 1 0 0 σn ] 34 A. Bultheel, R. Cruz-Barroso and A. Lasarow = σn−1η α n1 |λn| [ un 0 0 1 ] [ −1 √ |λn|2 − 1√ |λn|2 − 1 1 ] [ 1 0 0 unσn ] , where un = (λn/|λn|)ηαn1 ∈ T. 
It still requires ηαn1 in terms of γ-elements. This is a bit messy but still possible using φα∗n = φ∗n/Ḃ β n and Lemma 5.5. The factor 1/Ḃβ n has an essential role here because we have to evaluate φα∗n (z) for z = αn−1, and if 1/αn−1 = βn−1 ∈ {γk : k ∈ bn}, which means that αn−1 has been introduced as a pole in a previous step, then this pole in φ∗n will cancel against the same pole in Ḃβ n . Thus we should use a limiting process or expand φα∗n = φ∗n/Ḃ β n or make the cancellation explicit as we did for example in equation (3.1). We skip further details. If n is not a regular index, then λn =∞ and in that case the matrix G̃αn has the form G̃αn = σn−1η α n1 [ 0 1 1 0 ] [ 1 0 0 σn ] = σn−1η α n1 [ 0 σn 1 0 ] . 10 Spectral analysis The full spectral analysis of the matrices ζ̃A(G) and how this relates to the shift operator in L2 µ is given in [47] for the case γ = α (then G = H is a Hessenberg matrix) and in the case of the alternating γ = ε where either ε2k = α2k and ε2k+1 = β2k+1 or ε2k = β2k and ε2k+1 = α2k+1 (then G = C is a CMV matrix). Velázquez shows for the cases ν ∈ {α, ε} that if Tµ is the shift operator on L2 µ, (i.e., multipli- cation with z in L2 µ) and if L is the closure of L∞ in L2 µ, then the matrix representation of Tµ � L with respect to the basis Φα = [ϕα0 , ϕ α 1 , ϕ α 2 , . . .] is given by ζ̃A(Gα) (see [47, Theorems 4.1, 4.2, 5.1, 5.4]). We mention here the basis (ϕαk )k∈N because in [47] only one type of Gαk matrices is used. As we have explained, this is related to the precise normalization used for the basis functions, but as long as the basis is orthogonal and has norm 1, a further normalization with a unimodular constant will not influence the spectral properties of the matrix representations so that similar results hold for the basis Φ and the matrix ζ̃A(G). If moreover L is dense in L2 µ and hence L = L2 µ, then ζ̃A(G) is not only isometric but it is a representation of the full matrix. For α compactly included in D this happens for γ = α when logµ′ 6∈ L1 and in the case γ = ε when ∞∑ k=1 (1 − |α2k|) = ∞ = ∞∑ k=1 (1 − |α2k+1|) (see [47, Theorems 4.1, Proposition 5.3]). In fact, the proof of [47, Proposition 5.3] needs only minor adaptations to be applicable to the present situation. So we can state without further details: Theorem 10.1. If the sequence γ is such that ∑ k∈a∞ (1 − |αk|) = ∞ and ∑ k∈b∞ (1 − |αk|) = ∞, then both (φk)k∈a∞ and (φk)k∈b∞ are bases of L2 µ. Now also [47, Theorem 5.4] can be directly translated to our situation: Theorem 10.2. For the general sequence γ as in the previous theorem, U = ζ̃A(G) is the matrix representation of the shift operator Tµ of L2 µ and σ(U) = suppµ. Moreover, if Φ = [φ0, φ1, . . .] then ξ is a mass point of µ if and only if ‖Φ(ξ)‖`2 <∞, µ({ξ}) = 1/‖Φ(ξ)‖2`2 and Φ(ξ) is a (left) eigenvector for the eigenvalue ξ of U . In [47] it is also explained how the spectrum of ζ̃A(G) = η−1 A $̃∗A(G)$̃A(G)−1ηA relates to the spectrum of the pencil ($∗A(G), $A(G)). In the case of an alternating sequence ε, the matrix G was a CMV matrix which we denoted as C. It was then possible to regroup the even Orthogonal Rational Functions on the Unit Circle 35 and odd elementary factors so that C = Cαo C β e or C = Cαe C β o with Cνo = Gν1G ν 3G ν 5 · · · and Ce = Gν2G ν 4G ν 6 · · · for ν ∈ {α, β}. As a consequence, because all factors are unitary, the previous result could be formulated in terms of the spectral properties for the pencils (ACβ∗e + Cαo , C β∗ e +A∗Cαo ) or (ACβ∗o + Cαe , Cα∗o + A∗Cαe ). 
In the general case, such a regrouping is possible, but rather complicated because it will depend on the sizes of the α and β blocks in the γ sequence, i.e., on the number of elementary G-factors that appears in the product in increasing or decreasing order (see (9.9)). There are also spectral interpretations for the eigenvalues of the finitely truncated sections of the matrices. Again, a careful discussion is given in [47] for this. If for an operator T we denote its truncation to Ln as Tn = PnT � Ln with Pn the orthogonal projection operator on Ln, then Velázquez shows that the zeros of the φn+1 are the eigenvalues of the finite matrices Vn = ζ̃An(Gn) with Gn the leading principle submatrix of size (n + 1) of the matrix G and An = diag(α0, . . . , αn). As illustrated in [47, equation (19)], Gn is the product of unitary matrices Gk of the form (9.1) for k = 1, . . . , n + 1 except that Gn+1 is ‘truncated’ and is of the form diag(In ⊕ −δn+1) where δn+1 (up to a unimodular constant) is λn+1 and thus this truncated factor Gn+1 is not unitary because λn+1 6∈ T in general. For γ = α this is proved in [47, Theorem 4.5] and for γ = ε in [47, Theorem 5.8]. The same arguments can be used for a general sequence γ. Indeed (9.5) and (9.7) show that in the truncated version we shall end up with [φ0, . . . , φn]($∗An −$AnGn) = [0, . . . , 0, χn+1], where χn+1 is of the form cn+1φ α n+1Ḃ β n+1 or cn+1φ α∗ n+1Ḃ β n+1 depending on whether the truncation falls inside and α block or a β block. In both cases, this is a constant multiple of φn+1 (see Theorem 3.9). Like in [47] this can be transformed into [φ0(z), . . . , φn(z)] ( z − ζ̃An(Gn) ) = [0, . . . , 0, cn+1φn+1(z)], which shows that a zero of φn+1 corresponds to an eigenvalue of Vn = ζ̃An(Gn). Thus the following theorem is still valid for a general γ. Theorem 10.3. For the general sequence γ let Gn be the truncation of size n+1 of the matrix G, and define Vn = ζ̃An(Gn), then the eigenvalues of Vn will be the zeros of φn+1. The (left) eigenvector corresponding to an eigenvalue ξ is given by Φn(ξ) = [φ0(ξ), . . . , φn(ξ)]. For the unitary truncation, we first form a unitary truncation Gun of Gn by keeping all Gk with k < n + 1 and replacing Gn+1 by Gun+1 = diag(1, . . . , 1, τ) for some τ ∈ T. We shall then get (recall the definition of the PORF Qn+1(z, τ) in (6.1)) [φ0, . . . , φn] ( $∗An −$AnG u n ) = [0, . . . , 0, Xn+1], with Xn+1 of the form cn+1Q α n+1(z, τ)Ḃβ n+1(z) or cn+1Q α n+1(z, τ)Ḃβ n+1(z) depending on whether the truncation falls inside and α block or a β block. By Corollary 6.2 this is proportional to Qn+1(z, τ ′) in both cases for some appropriate choice of τ ′. Therefore eigenvalues of Un = ζ̃An(Gun) will give the nodes of the (n + 1)-point rational Szegő quadrature formula for some particular choice of the τ -parameter. Again this is worked out in detail in [47] for the Hessenberg and CMV matrices. As in the previous theorem, also for the unitary truncations we can use similar arguments for a general sequence and we therefore have also Theorem 10.4. For the general sequence γ let Gun be the unitary truncation of size n + 1 of the matrix G, and define Un = ζ̃An(Gun), then the eigenvalues of Un will be the zeros of the PORF Qn+1 for some value of the τ -parameter. The (left) eigenvector corresponding to an eigenvalue ξ is given by Φn(ξ) = [φ0(ξ), . . . , φn(ξ)]. 36 A. Bultheel, R. Cruz-Barroso and A. Lasarow Remark 10.5. Since Un is unitary, its right eigenvectors will be the conjugate transpose of the left eigenvectors. 
Thus if F is the matrix whose entry at position (i, j) is φj(ξi), and Λ = diag(ξ0, . . . , ξn) contains the eigenvalues of Un, then FUn = ΛF . A row of F is [φ0(ξi), . . . , φn(ξi)] which is not normalized in `2 sense, but we know that φ0 = 1. In linear algebra it is more usual to write the eigenvalue decomposition of a unitary (or normal) matrix Un as U∗Un = ΛU∗ with U unitary, thus with orthonormal rows and columns. Thus if D is the diagonal matrix whose diagonal entries are the elements on the first row of U (which in our case are all nozero) then F = D−1U∗. So FF ∗ = D−1U∗UD−∗ = D−1D−∗ is diagonal with entries n∑ k=0 |φk(ξi)|2. In other words, the weights wi of the quadrature formula are given by the modulus square of the elements in the first row of U . Note that the full matrices never need to be constructed explicitly. There exist efficient algorithms to perform all the required operations when the matrix is stored as a product of elementary 2×2 factors (for example [3] or [16]). In fact these techniques will be able to handle an arbitrary order in the product with the same complexity (in terms of floating point operations) as for example the Hessenberg or CMV ordering. It will however reduce the programming complexity slightly requiring less bookkeeping, if we use for example a CMV structure. In the case of the CMV matrix, we do not even need a product at all to explicitly write down the matrices involved. Indeed, since the odd and even factors Cuon and Cuen of the unitarily truncated CMV matrix Cun are just block diagonals with all blocks of size at most 2 × 2 with exception of Gn+1 which is diagonal. Hence a pencil like (AnCu∗en + Cuon, Cu∗en + A∗nCuen) that will allow to compute the quadrature is extremely simple for numerical computations [7, Theorem 7.3]. We mentioned already in Section 7 that an arbitrary sequence γ did not give rise to new quadrature formulas than already given by the sequence α or the alternating sequence ε. We now find that there is no computational advantage either it is not very rewarding in a practical computation of rational Szegő quadrature to work out all the details in the general case. The CMV matrix obviously has the smallest possible bandwidth; it is conceptually simpler and somewhat easier to implement. There is however a remarkable linear algebra side effect that became clear from the previous analysis. As far as we know, this was previously unknown in the linear algebra community and we will discussed some preliminary results in the next section. 11 AMPD matrices In the previous section, it was recalled from [47] that under the appropriate conditions, the spectrum of the unitary shift Tµ can be recovered from the matrix representation ζ̃A(Gα) with respect to the basis [ϕα0 , ϕ α 1 , . . .], whatever the sequence γ is. Now the matrix Gα is a product of elementary matrices that differ from the identity only by one 2 × 2 block of the general form (9.1). The only difference between the Hessenberg and the CMV form is the order in which these factors are multiplied. Something of that form was known for a finite unitary Hessenberg matrix. It can be decom- posed as a product of unitary matrices of the type (9.1) and multiplying these factors in any order will not change the spectrum of the matrix (see, e.g., [3]). 
Since the zeros of the ORFs are given as the eigenvalues of the matrix ζ̃An(Gn) where Gn is either a finite Hessenberg or a CMV matrix and hence they differ only by the order of the elementary factors, we might expect that if Gn is any matrix written as the product of n elementary factors of the form (9.1), then the spectrum of ζ̃An(Gn) will not change if we change the order of the elementary factors in the product. This is a consequence of the fact that the spectrum of Gn will not depend on the order of its elementary factors. Orthogonal Rational Functions on the Unit Circle 37 Since this section is partly detached from the ORF theory, the notation used is also local and should not be confused with the same notation used in other sections, unless stated explicitly. We define an AMPD matrix as a matrix of the form AM +D where A and D are diagonal matrices and M is the product of G-matrices that are matrices that differ from the identity by one 2× 2 diagonal block like (I0 is the empty matrix) Gk :=  Ik−1 0 0 0 0 αk βk 0 0 γk δk 0 0 0 0 In−k  , 1 ≤ k ≤ n, (11.1) which is more general than the matrices defined in (9.1). The purpose is first to show that the spectrum of an AMPD matrix does not depend on the order in which the G-matrices are multiplied. The next proof was provided by Marc Van Barel in private discussion with the first author. Lemma 11.1. Suppose M ′ ∈ Cn×n is an arbitrary matrix with n ≥ 2 and set M = [ M ′ 0 0 1 ] and G = In−1 0 0 0 α β 0 γ δ  and let A,D ∈ C(n+1)×(n+1) be two arbitrary diagonal matrices. Then the determinant and the eigenvalues of AGM +D and AMG+D are the same. Proof. Let us write M ′ =  M ′′ c r m  and A = diag(A′′, a′, a), D = diag(D′′, d′, d). Then AGM +D =  A′′M ′′ +D′′ A′′c 0 a′αr a′αm+ d′ a′β aγr aγm aδ + d  and AMG+D =  A′′M ′′ +D′′ αA′′c βA′′c a′r a′αm+ d′ a′βm 0 aγ aδ + d  . Taking the determinant of AGM +D gives (expand along the last column) det(AGM +D) = (aδ + d)a′α det M̃ + (aδ + d)d′ det(A′′M ′′ +D′′)− a′βaγ det M̃ = aa′(αδ − βγ) det M̃ + a′dα det M̃ + (aδ + d)d′ det(A′′M ′′ +D′′) = [aa′ det(G) + a′dα] det M̃ + (aδ + d)d′ det(A′′M ′′ +D′′) 38 A. Bultheel, R. Cruz-Barroso and A. Lasarow with M̃ a matrix independent of G, given by M̃ = A′′M ′′ +D′′ A′′c r m  = diag(A′′, 1)M ′ + diag(D′′, 0). Evaluating the determinant of AMG+D in a similar way, expanding along the last row, gives exactly the same result. Hence the determinants are the same: det(AGM+D) = det(AMG+D). Note that D − λI is still a diagonal if D is, so that if we replace D by D − λI in the de- terminants, we obtain the characteristic polynomials of the matrices AGM +D and AMG+D respectively. Equality of these polynomials implies the equality of their roots, i.e., of the eigen- values. � In fact the result is more general since we could replace D as well as A by any diagonal matrices that are functions of λ and the determinants would still be the same. It is interesting to note that A′′M ′′ +D′′ is the same as the matrix M̃ in which the last row and the last column are deleted. This will be used in Theorem 11.3 below. Example 11.2. If in the previous lemma we set M = G1 = α1 β1 0 γ1 δ1 0 0 0 1  and G = G2 = 1 0 0 0 α2 β2 0 γ2 δ2  , then for arbitrary diagonal matrices A = diag(a1, a2, a3) and D = diag(d1, d2, d3) a direct evaluation of the matrices AG1G2 and AG2G1 gives AG1G2 = a1α1 a1β1α2 a1β1β2 a2γ1 a2δ1α2 a2δ1β2 0 a3γ2 a3δ2  and AG2G1 =  a1α1 a1β1 0 a2α2γ1 a2α2δ1 a2β2 a3γ2γ1 a3γ2δ1 a3δ2  . 
Note that the diagonal elements are the same. If we add D to this diagonal, then with the Laplace rule it is immediately seen that det(AG1G2 +D) = det(AG2G1 +D). If π = (π1, π2, . . . , πk) is a permutation of the numbers (1, 2, . . . , k), then with a notation like∏ i∈π Gi where Gi ∈ C(n+1)×(n+1) are square matrices, we mean the product from left to right in the order given by π. Thus∏ i∈π Gi = Gπ1Gπ2 · · ·Gπk . (11.2) Recall that Gi and Gj of the form (11.1) commute whenever |i − j| > 1. This means that of the k! possible permutations of the numbers (1, 2, . . . , k), only 2k−1 different matrices will result when we consider all possible permutations in the product ∏ i∈π Gi. Indeed, once the order is fixed in which the first k − 1 matrices {Gi : i = 1, 2, . . . , k − 1} are multiplied, Gk will have to multiply this product either from the right or from the left depending on whether it is to the left or to the right of Gk−1 in the product (11.2). This is because Gk commutes with all previous Gi, with i < k − 1. Let PNn = [In 0] ∈ Cn×N be the matrix projecting onto Cn. We define for any matrix M ∈ CN×N its truncation to Cn×n as PNn MPN∗n . The following main theorem states that the determinant and hence the eigenvalues of the truncation of an AMPD matrix does not depend on the order in which its Gi factors are multiplied. Orthogonal Rational Functions on the Unit Circle 39 Theorem 11.3. Let π = (π1, π2, . . . , πn) be an arbitrary permutation of (1, 2, . . . , n) and define Mπ = ∏ i∈π Gi with Gi of the form (11.1), let A and D both be arbitrary diagonal matrices of size n+ 1 and finally set P = Pn+1 n . Then det(P (AMπ +D)P ∗) will not depend on π. Proof. The proof is by induction. For n = 2 we need to consider P (AG1G2 + D)P ∗ and P (AG2G1 + D)P ∗. It can be checked with the expressions of AG1G2 and AG2G1 from Exam- ple 11.2 that det(P (AG1G2 +D)P ∗) = det [ a1α1 + d1 a1α2β1 a2γ1 a2α2δ1 + d2 ] and det(P (AG2G1 +D)P ∗) = det [ a1α1 + d1 a1β1 a2α2γ1 a2α2δ1 + d2 ] . Both expressions give the same determinant. For the induction step we recall that Gn commutes with all Gk with k < n − 1 thus ∏ i∈π Gi equals either Gn ( ∏ i∈π′ Gi ) or ( ∏ i∈π′ Gi ) Gn where π′ is the same as π after removing the value n. In the first case with Gn at the beginning (see the proof of Lemma 11.1) P (Mπ)P ∗ = P  M ′′ cn−1 0 αnrn−1 αnmn−1 βn γnrn−1 γnmn−1 δn P ∗ = P  M ′′ cn−1 0 αnrn−1 αnmn−1 0 0 0 1 P ∗. Therefore det[P (AMπ +D)P ∗] = det(ÂαnM̂π′ + D̂) where we introduced for any matrix F the hat-notation to mean F̂ = PFP ∗. Note that ÂαnM̂π′ + D̂ is an AMPD matrix of size n which has only n − 1 factors (αn can be assimilated in Â). By induction it is independent of the permutation π′. The second case, when Gn is at the end of the product, is completely similar. The αn in front of rn−1 will move to cn−1 and we find the same expression. This implies that the determinant is not only independent of π′ but that we can also insert n anywhere in π′ and still get the same result, which proves the induction step. � We have as an immediate consequence that the truncation is not essential. Corollary 11.4. The determinant, and hence also the eigenvalues of an AMPD matrix AMπ+D do not depend on the permutation π. Proof. To prove this for an AMPD matrix of size n with n − 1 factors, consider the AMPD matrix of the previous theorem and assume that Gn is just the identity matrix. 
It is then clear that the truncated AMPD matrix ÂM̂π + D̂ is equal to an AMPD matrix of size n with M̂π = PMπ′P ∗ with π′ equal to π in which n is removed. By the previous theorem, its determinant is independent of the permutation π′. � As we mentioned earlier, we could have replaced A and D by any two diagonal matrices that are functions of λ and the determinants would still be independent of the permutation π. RAMPD matrices are rational AMPD matrices. They are the product of an AMPD matrix and an inverse of an AMPD matrix. For example R = (AM + C)(BM +D)−1 40 A. Bultheel, R. Cruz-Barroso and A. Lasarow with A, B, C, D all diagonal and M a product of G-factors. Again it can be shown that the spectrum of R is independent of the product order in the matrix M . Theorem 11.5. Let R = (AMπ +C)(BMπ +D)−1 be a RAMPD matrix as defined above, with Mπ = ∏ k∈π Gk, then the eigenvalues of R do not depend on the permutation π. If BMπ + D is not invertible, then a slightly more general formulation is that the pencil (AMπ +C,BMπ +D) has eigenvalues that do not depend on π. Proof. This problem can be conveniently solved by reducing it to the AMPD case. Indeed, the eigenvalues of the pencil can be found by finding the solutions of the characteristic polynomial which is the determinant of the matrix (AMπ + C)− (BMπ +D)λ. It is obvious that this can be rewritten as (A−Bλ)Mπ + (C −Dλ) = A′Mπ +D′ in which A′ = A−Bλ and D′ = C−Dλ are diagonal matrices. By what has been proved above, the determinant of the AMPD matrix A′Mπ +D′ does not depend on the permutation π. � In the context of ORF, the Gk matrices were unitary, thus of the general form (9.1). Note that then Mπ = ∏ k∈π Gk and its unitary truncation will be unitary. Conversely any unitary matrix can be written as a unitary truncation of such a product because by a sequence of Givens transformations it can be brought into a unitary Hessenberg form, which can be written as a product of such factors. All these elementary transforms can be merged into a unitary truncation of a product of n+ 1 factors Gk if the unitary matrix has size n+ 1. The results about the zeros of the para-orthogonal functions suggested that the eigenvalues of the RAMPD matrix Rπ = η−1 A (A+Mπ)(A∗Mπ + I)−1ηA, ηA = (I −A∗A)1/2 (11.3) are independent of the permutation π, and moreover, if U is the unitary matrix of eigenvectors, then the squared absolute values of the entries on the first row of U are also independent of π. This is because these values give the weights of the quadrature whose nodes are the corresponding eigenvalues and since these quadratures are uniquely defined, the weights computed must be the same. With the ORF interpretation of the previous section we can be more precise and prove that the following theorem holds. Theorem 11.6. Let π be a permutation of (1, . . . , n) and Mπ = ∏ k∈π Gk be the product of unitary factors Gk of the form (9.1) with all δk ∈ D. Let A be a diagonal matrix of size n + 1 with all entries in D. Let Rπ be the unitary matrix defined by (11.3) with eigenvalue decomposition Rπ = VπΛπV ∗ π with Vπ the unitary matrix whose columns are the eigenvectors. Then Λπ and the absolute values of all the entries in Vπ are independent of π. Proof. First note that it is sufficient to prove this when Mπ is a unitary truncation of a product of n+ 1 unitary factors. As in Corollary 11.4 it will then also hold for an untruncated product of n unitary factors by choosing factor Gn+1 to be the identity. 
We shall work with only one type of Gk factors (those of type Gαk ). Using the results of the previous section we can indeed replace all the Gβk by Gαk if we use the basis {ϕαk = Orthogonal Rational Functions on the Unit Circle 41 ς̇βk φk : k = 0, 1, 2, . . .} where ς̇βk = ∏ j∈bk σj : k = 0, 1, 2, . . .. Furthermore we know that row i of Vπ is proportional to [ϕα0 (ξi), . . . , ϕ α n(ξi)] where Λπ = diag(ξ0, . . . , ξn) are the eigenvalues of Rπ. We also know that reordering the entries in π will not change the eigenvalues, but it does change the eigenvectors. Assume for example that π = (1, 2, . . . , n), then all ϕαk = ς̇βk φ α k = φαk , k = 1, . . . , n. If we change π, then some of the αk ∈ an will be replaced by βk = 1/αk. For the k that remain in an, we have ϕαk = ς̇βk φn = ς̇βk φ α k Ḃ β k . The entry ϕαk (ξi) = ς̇βnφαk (ξi) will thus change into ϕαk (ξi)Ḃ β k (ξi), which means a multiplication with a unimodular constant. For other indices k that migrate to bn, the ϕαk becomes ϕαk = ς̇βk φn = ς̇βk φ α∗ k Ḃ β k (see Theorem 3.9). But when evaluated in ξi ∈ T, clearly Ḃβ k (ξi) ∈ T, while also |φαk (ξi)| = |φα∗k (ξi)| because φα∗k (ξi) = Ḃα k (ξi)Ḃ β k (ξi)φ α k∗(ξi) = Ḃα k (ξi)Ḃ β k (ξi)φαk (ξi). This proves the theorem. � Remark 11.7. The condition |δk| < 1 in the previous theorem is important because δk ∈ T can result in a matrix Mπ that is the direct sum of diagonal blocks. Suppose as an example π = (1, 2, . . . , n) so that Mπ is Hessenberg, but if δn ∈ T, then the last column of Mπ will be [0, 0, . . . , 0, δk] ∗ and this will give an eigenvector [0, . . . , 0, 1]∗ and the first entry is obviously zero, so that the link with ϕα0 = 1 cannot be made. In general, diagonal Gk factors result in a reducible Mπ, which means that it is a direct sum of submatrices. This corresponds to a breakdown of the recurrence relation of the ORFs. Even in the case of orthogonal Laurent polynomials, that is when the diagonal matrix A = 0, the refinement of Theorem 11.6 was not observed in [16]. Without our ORF interpretation, a proof purely on linear algebra arguments seems to be difficult. 12 Relation with direct and inverse eigenvalue problems We are convinced that the generalization of the ORFs that was developed in the previous sections can be efficiently used to analyse algorithms that are developed and used to solve direct and inverse eigenvalue problems. A direct problem is: given a matrix, find its eigenvalues and the eigenvectors, which allows for example, as we have explained to compute the quadrature formula, while in an inverse problem, we are given the eigenvalues and the weights of the quadrature and we have to find the matrix. Of course the latter makes only sense if some structure is imposed on the matrix. The problem related to what has been discussed in this paper is the inverse problem for unitary Hessenberg matrices (see [14, Section 8] or much earlier papers like [1, 2, 28]). The methods described there are restricted to orthogonal polynomials. The more general recursion for orthogonal Laurent polynomials of [16] can be used to construct more general snake shaped matrices, including Hessenberg and CMV matrices. The algorithms to solve these problems apply the recurrence relation for the orthogonal (Laurent) polynomials in reverse order, to descend from degree n to degree 0. These inverse problems relate to discrete least squares approximation. 
12 Relation with direct and inverse eigenvalue problems

We are convinced that the generalization of the ORFs developed in the previous sections can be used efficiently to analyse algorithms for direct and inverse eigenvalue problems. A direct problem is: given a matrix, find its eigenvalues and eigenvectors, which, as we have explained, allows one for example to compute the quadrature formula. In an inverse problem, we are given the eigenvalues and the weights of the quadrature and we have to find the matrix. Of course the latter only makes sense if some structure is imposed on the matrix. The problem related to what has been discussed in this paper is the inverse problem for unitary Hessenberg matrices (see [14, Section 8] or much earlier papers like [1, 2, 28]). The methods described there are restricted to orthogonal polynomials. The more general recursion for orthogonal Laurent polynomials of [16] can be used to construct more general snake-shaped matrices, including Hessenberg and CMV matrices. The algorithms to solve these problems apply the recurrence relation for the orthogonal (Laurent) polynomials in reverse order, to descend from degree n to degree 0. These inverse problems relate to discrete least squares approximation. This is described in a linearized form in [41], but proper rational approximation with preassigned poles could be applied when working not with vector polynomials but with ORFs on the unit circle. A proposal is made in [42, 43], but the ORFs there are different from ours. We explain the difference in Section 12.1 below.

The direct methods apply the recursion in the forward direction. The idea is simple: the power method for a matrix A computes the vectors v_k = A^k v for some vector v = v_0 and k = 0, 1, 2, . . . , q. With proper normalization, this converges to an eigenvector corresponding to the dominant eigenvalue, while the inverse power method computes v_{−k} = A^{−k}v for k = 0, 1, 2, . . . , p, which approximates the dominant eigenvalue of A^{−1}, corresponding to the smallest eigenvalue of A. For the spaces spanned by {v_k}_{k=−p}^{q} one wants to find an orthogonal basis, in order to project the problem onto a space that is usually much smaller than the size of the matrix A. In polynomial terms, this means finding a set of orthogonal (Laurent) polynomials spanning the space generated by {z^k}_{k=−p}^{q}. In each step of the algorithm either p or q is increased by 1. We are thus in the case considered in [16], i.e., the sequence γ has only entries 0 and ∞. For a unitary matrix, the eigenvalues are all on the unit circle, and the poles 0 and ∞ of the Laurent polynomials are then the points inside and outside the circle that are as far away as possible from the boundary. So one may not expect the best possible approximation of the spectrum when considering a unitary truncation of the Hessenberg or general snake-shaped matrix. Therefore it is essential to select poles at strategic places much closer to the circle, to put more emphasis on the regions where the eigenvalues are more important. The effect of the location of the poles can be observed for example in the numerical experiments reported in [6, 7, 10].
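The projection idea can be mimicked in a few lines. The sketch below is a toy extended Krylov projection under our own simplifying assumptions, not the algorithm of [16]: it orthogonalizes A^{−p}v, . . . , A^{q}v for a random unitary A and computes the Ritz values of the compressed matrix.

import numpy as np

rng = np.random.default_rng(2)
N, p, q = 60, 4, 4
A, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))  # unitary test matrix
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# columns A^{-p} v, ..., v, ..., A^{q} v spanning the extended Krylov space
K = np.column_stack([np.linalg.matrix_power(A, k) @ v for k in range(-p, q + 1)])
V, _ = np.linalg.qr(K)                        # orthonormal basis, p + q + 1 columns
ritz = np.linalg.eigvals(V.conj().T @ A @ V)  # Ritz values of the compressed matrix
print(np.sort(np.abs(ritz)))                  # all <= 1; the compression of a unitary A is not unitary

Since the compression of a unitary matrix is in general not unitary, the Ritz values land inside the closed unit disk; a unitary truncation keeps them on T, which is where the discussion of pole placement above enters.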
When the matrix is not unitary, the underlying theory is for polynomials and ORFs that are orthogonal with respect to a measure supported on the real line or a real interval, and convergence has been studied based on the asymptotic behaviour of the ORFs and the approximation error (see for example [5, 13]). For the unitary case such a theory is much less developed. One of the reasons may be that the link with the ORFs as explained in this paper is not clear. Part of the reason is that the starting point in the literature is slightly different, since the Hessenberg matrix is replaced by its inverse and an extra diagonal is introduced. Since a unitary Hessenberg matrix can be written as the product of the elementary G_k factors, its inverse is also the product of (inverse) G_k factors and is therefore computationally as simple as a Hessenberg matrix. The study of the inverse of a Hessenberg matrix has a long history; see the many references in [45, 46]. Such an inverse is known to be related to semi-separable matrices. These are matrices in which every rectangular block that can be taken from the lower (or upper) triangular part has low rank. Several more detailed variants of the definition exist in the literature, but a simple variant goes as follows. A matrix is lower (upper) semi-separable of semi-separability rank r if all submatrices that can be chosen out of the lower (upper) triangular part of the matrix have rank at most r, and rank r is reached for at least one of them. Such matrices can be characterized by few (say O(n)) parameters.

For a simple example, take two vectors v, w ∈ C^n and form the matrix A whose upper triangular part (including the diagonal) is equal to the upper triangular part of vw^∗ and whose strictly lower triangular part is equal to the strictly lower triangular part of wv^∗. Then A is a semi-separable matrix of rank 1 (Hermitian, except for the diagonal, which need not be real). The vectors v and w are called its generators. The generators and the ranks for the upper and lower triangular parts may differ in general, and several other more general definitions exist.
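This construction is easily verified numerically. In the sketch below we take real generators, in which case A is the classical one-pair matrix whose inverse is tridiagonal; this is a special case of the Hessenberg statements in the next paragraph.

import numpy as np

rng = np.random.default_rng(3)
n = 6
v, w = rng.standard_normal(n), rng.standard_normal(n)

# upper triangle (with diagonal) from v w^T, strictly lower triangle from w v^T
A = np.triu(np.outer(v, w)) + np.tril(np.outer(w, v), -1)
Ainv = np.linalg.inv(A)

off = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) > 1  # positions outside the tridiagonal band
print(np.max(np.abs(Ainv[off])))   # at roundoff level: the inverse is tridiagonal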
The inverse of an invertible lower semi-separable matrix of rank 1 is an upper Hessenberg matrix. Any Hermitian matrix is unitarily similar to a tridiagonal matrix, and the inverse of a tridiagonal matrix is again semi-separable. That explains why the literature on unitary and Hermitian eigenvalue problems and inverse eigenvalue problems involves semi-separable (plus-diagonal) matrices. The extra diagonal appears because these papers do not rely on the Möbius transform ζ̃(G) that we used above, but use a less symmetric framework. On the other hand, when the symmetry with respect to the unit circle is lost, the poles can be anywhere in the complex plane except at infinity, which means that the important case of (Laurent) polynomials is excluded.

What we have developed in this paper is however flexible enough to give up the symmetry with respect to the unit circle too, but without excluding the point at infinity. This is illustrated in the next section. In the subsequent one we shall explain why in the literature the semi-separable-plus-diagonal algorithms are used.

12.1 Using the general recursion while dealing with infinity

Suppose γ = α; then our derivation given in Sections 8 and 9 shows that

Φ(I − A^∗z)Ĥ^α = Φ(zI − A)

with Ĥ^α an upper Hessenberg matrix. It is irreducible because the subdiagonal elements are proportional to √(1 − |λ_k|²). If we go through the derivation, the same arguments used for the sequence α also apply when we perform exactly the same steps using the recursion of Theorem 5.1 for the general sequence γ, where we assume for the moment that γ_i = ∞ does not appear. Then we arrive again at the above relation, except that all α's are replaced by γ's. Thus (assuming for simplicity that the whole sequence is regular, and recalling that |γ_k| ≠ 1 for all k)

Φ(I − Γ^∗z)Ĥ^γ = Φ(zI − Γ) and zI = (Ĥ^γ + Γ)(I + Γ^∗Ĥ^γ)^{−1}, (12.1)

where Γ = diag(γ_0, γ_1, . . .) and η_γ = (I − Γ^∗Γ)^{1/2} in Ĥ^γ = η_γ^{−1}H^γη_γ, which is again upper Hessenberg.

An important note is in order here. The previous notation is purely formal. The expressions involve quantities 1 − |λ_k|² and 1 − |γ_k|² that can be negative, so that their square roots, for example in the definition of η_γ, are problematic. However, remember relation (5.1), which was a consequence of the fact that 1 − |λ_n|² can only be negative if (1 − |γ_n|²)(1 − |γ_{n−1}|²) < 0. Therefore bringing the factors η_γ and η_γ^{−1} inside the G-factors will give square roots of positive elements. Just note that

diag((1 − |γ_{n−1}|²)^{−1/2}, (1 − |γ_n|²)^{−1/2}) [−λ_n η_{n1}, √(1 − |λ_n|²); √(1 − |λ_n|²), λ_n η_{n1}] diag((1 − |γ_{n−1}|²)^{1/2}, (1 − |γ_n|²)^{1/2})
= [−λ_n η_{n1}, √((1 − |λ_n|²)(1 − |γ_n|²)/(1 − |γ_{n−1}|²)); √((1 − |λ_n|²)(1 − |γ_{n−1}|²)/(1 − |γ_n|²)), λ_n η_{n1}]

(we write [a, b; c, d] for the 2 × 2 matrix with rows (a, b) and (c, d)), and all square roots are taken of positive numbers. To keep the link with what was done for the α sequence, we used the previous notation, and we shall continue to do so in the sequel.

In any case the representation Ĥ^γ in this setting is a Hessenberg matrix that can be computed in a proper way, and we do not need a more general snake-shaped matrix. By doing this we lose the possibility to include γ_k = ∞. In our previous setting this was not a problem, because we reduced everything to the α parameters, so that γ_k = ∞ was easily dealt with by mapping it to α_k = 0, and this fitted seamlessly in the factors I − A^∗z and zI − A. In the current general setting, however, a choice γ_k = ∞ requires the diagonal element ϖ_k(z) = 1 − γ̄_k z to be replaced by −z, and similarly ϖ^∗_k(z) = z − γ_k will become −1, which means that some of the entries in the matrix I of (12.1) have to be replaced by zero, and the second relation does not hold anymore.

To avoid this, we can use the strategy that was used before when switching between the formalism for the α and for the β sequence, except that we now use this strategy to switch between a finite and an infinite entry in the γ sequence. Without going through all the details, the duality between α and β can be transferred to a duality between γ = (γ_1, γ_2, . . .) ⊂ Ĉ \ T and the reciprocal sequence γ̌ = (γ̌_1, γ̌_2, . . .) with γ̌_k = 1/γ̄_k, k = 1, 2, . . .. The ORF for the sequence γ we denote as φ_n and the ORF for the sequence γ̌ as φ̌_n. If ν = (ν_1, ν_2, . . .) is a sequence that picks ν_k = γ_k if γ_k is finite and ν_k = γ̌_k = 0 if γ_k = ∞, then, in analogy with Theorem 3.9, with proper normalization, the ORFs for this ν sequence, denoted φ^ν_k, are given by φ^ν_n = φ_n B̌_n if ν_n = γ_n and φ^ν_n = (φ_n)^∗ B̌_n if ν_n = γ̌_n, where B̌_n = ∏_{k≤n, ν_k=γ̌_k} ζ_{ν_k}.

First note that as long as we are dealing with finite γ_k's, we can use the recurrence relation and write the analog of the relation given in Theorem 9.1, just replacing the α's by γ's (which by our convention means removing the superscripts):

[ϖ^∗_{n−1}φ_{n−1}, ϖ_n φ^∗_n] = [ϖ_{n−1}φ^∗_{n−1}, ϖ_n φ_n] Ĝ^γ_n (12.2)

with

Ĝ^γ_n = (N^γ_n)^{−1} G̃^γ_n N^γ_n, N^γ_n = diag((1 − |γ_{n−1}|²)^{−1/2}, (1 − |γ_n|²)^{−1/2}),
G̃^γ_n = σ_{n−1}η_{n1} [−λ_n η_{n1}, √(1 − |λ_n|²); √(1 − |λ_n|²), λ_n η_{n1}] diag(1, σ_n).

Now, if some γ_k = ∞ is involved, we switch to the γ̌ sequence, which means that we avoid γ_k = ∞ and replace it by γ̌_k = 0. The factor B̌_n will thus only pick up a factor ζ_{ν_k} = ζ_0 each time γ_k = ∞, hence when ν_k = 0:

B̌_n(z) = ∏_{j∈i_n} z = z^{|i_n|}, where i_n = {j : γ_j = ∞, 1 ≤ j ≤ n}.

Since, in the previous relations, B̌_{n−1} = B̌_n as long as γ_n ≠ ∞, it is no problem to write

[ϖ^∗_{n−1}B̌_{n−1}φ_{n−1}, ϖ_n B̌_n φ^∗_n] = [ϖ_{n−1}B̌_{n−1}φ^∗_{n−1}, ϖ_n B̌_n φ_n] Ĝ^ν_n with Ĝ^ν_n = Ĝ^γ_n.

If however γ_n = ∞, then ϖ_n = −z, and this z-factor we absorb in B̌_n. That is to say, ϖ_n B̌_{n−1} = ϖ^∗_n B̌_n. Hence

B̌_{n−1}[ϖ_{n−1}φ^∗_{n−1}, ϖ_n φ_n] Ĝ^ν_n = B̌_{n−1}[ϖ^∗_{n−1}φ_{n−1}, ϖ_n φ^∗_n] = [ϖ^∗_{n−1}B̌_{n−1}φ_{n−1}, ϖ^∗_n B̌_n φ^∗_n]. (12.3)

Now Ĝ^ν_n is like Ĝ^γ_n but with γ_n = ∞ replaced by 0. We can proceed exactly as in the case of the α-β duality, but now apply it to the finite-infinite duality. Thus, using ν_k = γ_k if γ_k is finite and ν_k = 0 when γ_k = ∞, and setting

Ĝ = η_N^{−1} G η_N, η_N = (I − N^∗N)^{1/2}, N = diag(ν_0, ν_1, ν_2, . . .), (12.4)

the analog of (9.9) becomes (all the γ_i's denoted explicitly as such are finite values and denoted as ∞ when not finite)

γ = (γ_0, γ_1, . . . , γ_{n−1} ‖ ∞, . . . , ∞ | γ_k ‖ γ_{k+1}, . . . , γ_{m−1} ‖ ∞, . . . , ∞ | γ_ℓ ‖ γ_{ℓ+1}, . . .),
ν = (γ_0, γ_1, . . . , γ_{n−1} ‖ 0, . . . , 0 | γ_k ‖ γ_{k+1}, . . . , γ_{m−1} ‖ 0, . . . , 0 | γ_ℓ ‖ γ_{ℓ+1}, . . .),
Ĝ = (Ĝ^γ_1 Ĝ^γ_2 ⋯ Ĝ^γ_{n−1}) (Ĝ^γ_k Ĝ^ν_{k−1} ⋯ Ĝ^ν_n) (Ĝ^γ_{k+1} ⋯ Ĝ^γ_{m−1}) (Ĝ^γ_ℓ Ĝ^ν_{ℓ−1} ⋯ Ĝ^ν_m) (Ĝ^γ_{ℓ+1} ⋯)
= η_N^{−1} (G^γ_1 G^γ_2 ⋯ G^γ_{n−1}) (G^γ_k G^ν_{k−1} ⋯ G^ν_n) (G^γ_{k+1} ⋯ G^γ_{m−1}) (G^γ_ℓ G^ν_{ℓ−1} ⋯ G^ν_m) (G^γ_{ℓ+1} ⋯) η_N,

where the successive parenthesized groups are denoted Ĝ^1_γ, Ĝ^1_∞, Ĝ^2_γ, Ĝ^2_∞, . . . in the first line and G^1_γ, G^1_∞, G^2_γ, G^2_∞, . . . in the second. Thus in the γ blocks we multiply the successive factors G^γ_i to the right, while in an ∞ block we multiply in reversed order, like we did in the α-β case.

Use (12.2) and multiply Φ^ν(I − N^∗z) from the right with Ĝ^1_γ to get (note that φ^ν_0 = φ^{ν∗}_0 = 1 and B̌_{n−1} = 1)

[ϖ^∗_0 φ_0, . . . , ϖ^∗_{n−2} φ_{n−2}, ϖ_{n−1} φ^∗_{n−1} B̌_{n−1} | ϖ_n φ^∗_n B̌_n, . . .].

While multiplying this result with Ĝ^γ_k Ĝ^ν_{k−1} ⋯ Ĝ^ν_{n+1}, we make use of (12.2) and (12.3) to obtain

[ϖ^∗_0 φ_0, . . . , ϖ^∗_{n−2} φ_{n−2}, ϖ_{n−1} φ^∗_{n−1} B̌_{n−1} | ϖ^∗_n φ_n B̌_n, ϖ^∗_{n+1} φ_{n+1}, . . . , ϖ^∗_{k−1} φ_{k−1} | ϖ_k φ^∗_k, ϖ_{k+1} φ_{k+1}, . . .],

and the remaining multiplication by Ĝ^ν_n links the γ block to the ∞ block, so that after reintroducing the φ^ν_i notation we get

[ϖ^∗_0 φ^ν_0, . . . , ϖ^∗_{n−1} φ^ν_{n−1} | ϖ^∗_n φ^ν_n, . . . , ϖ^∗_{k−1} φ^ν_{k−1} | ϖ_k φ^{ν∗}_k, ϖ_{k+1} φ^ν_{k+1}, . . .].

The next block is again a γ block treated by the product Ĝ^2_γ, and one may continue like this to finally get

Φ^ν(I − N^∗z) Ĝ = Φ^ν(zI − N). (12.5)

Note that to include γ_k = ∞ in the sequence, we had to give up the Hessenberg structure of H^γ in (12.1), because the G_k factors corresponding to infinite γ's are multiplied in reverse order. Thus we again end up with a snake-shaped matrix G. It is like in Fig. 1, with upper Hessenberg blocks for the blocks with finite γ's and lower Hessenberg blocks for the blocks of infinite γ's. This might seem a relatively complicated way of dealing with a pole at the origin, which was included in a more elegant way in the original setting, but this is how it is dealt with in papers such as [35] and others. The matrix obtained there is an upper Hessenberg matrix with blocks that bulge below the subdiagonal at those places where in our structure of Fig. 1 we also have a lower Hessenberg block.

Note that, whatever the shape of the matrix, in the rational case one should analyse the spectrum of its Möbius transform as in (12.1), or analyse it as a pencil. This can be avoided at the expense of a somewhat more complicated matrix problem in the form of a semi-separable-plus-diagonal matrix.
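The dependence of the shape on the multiplication order is easy to visualize. In the sketch below, the 2 × 2 blocks are generic unitary matrices standing in for the Ĝ^γ_k and Ĝ^ν_k: multiplying in increasing order of the index gives an upper Hessenberg matrix (a finite-γ block), decreasing order gives a lower Hessenberg matrix (an ∞ block), and a mixed order gives a snake shape as in Fig. 1.

import numpy as np

rng = np.random.default_rng(4)
n = 7
size = n + 1

def g_factor(k):
    q, _ = np.linalg.qr(rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)))
    G = np.eye(size, dtype=complex)
    G[k-1:k+1, k-1:k+1] = q     # unitary rotation acting on coordinates k-1, k
    return G

def shape(order):
    M = np.eye(size, dtype=complex)
    for k in order:
        M = M @ g_factor(k)
    return (np.abs(M) > 1e-12).astype(int)   # sparsity pattern of the product

print(shape(range(1, n + 1)))          # upper Hessenberg
print(shape(range(n, 0, -1)))          # lower Hessenberg
print(shape([1, 2, 3, 7, 6, 5, 4]))    # snake: an upper block followed by a lower block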
12.2 Alternative: semi-separable-plus-diagonal matrices

If we start from the first relation in (12.1), in which Ĥ^γ is upper Hessenberg, we see that the nth element on the right, i.e., ϖ^∗_n(z)φ_n(z), is written as a linear combination of the elements

[φ_0, ϖ_1φ_1, . . . , ϖ_{n+1}φ_{n+1}] = [1, p_1, p_2/π_1, p_3/π_2, . . . , p_{n+1}/π_n], φ_k = p_k/π_k, p_k ∈ P_k.

These elements generate all elements of the form q_{n+1}/π_n with q_{n+1} a polynomial of degree at most n + 1. It is clear that φ_n also belongs to that space. Hence there must exist an upper Hessenberg matrix H such that Φ(I − Γ^∗z)H = Φ. A finite truncation then is

Φ_n(I_n − Γ^∗_n z)H_n = Φ_n + c_{n+1}[0, . . . , 0, ϖ_{n+1}φ_{n+1}] (12.6)

with c_{n+1} a constant, I_n the identity matrix of size n + 1 and Γ_n = diag(γ_0, . . . , γ_n). If z is a zero of φ_{n+1}, then Φ_n(I_n − Γ^∗_n z)H_n = Φ_n, and this can be rearranged as

zΦ_n = Φ_n(I_n − H^{−1}_n)Γ̌_n with Γ̌_n = (Γ^∗_n)^{−1} = diag(γ̌_0, . . . , γ̌_n), γ̌_k = 1/γ̄_k.

This clearly requires the restrictive condition that H_n and Γ_n are invertible; thus γ_k = 0 is excluded. It shows that z is an eigenvalue of the matrix (I_n − H^{−1}_n)Γ̌_n and that the left eigenvector is [φ_0(z), . . . , φ_n(z)]. By a similar argument, if H_n is replaced by H^u_n, a unitary truncation of H, then the eigenvalue z will be on T.

This is the kind of argument used in, for example, [36, 42, 43] and others. They work with a slightly simplified formula because their ORF have denominators π^∗_n instead of our π_n. If poles at 0 and ∞ are excluded, this is simple to obtain. Suppose φ̌_n = p_n/π^∗_n, n = 0, 1, . . .; then (12.6) becomes

Φ̌_n(zI_n − Γ̌_n)Ȟ_n = Φ̌_n + č_{n+1}[0, . . . , 0, ϖ_{n+1}φ̌_{n+1}].

Thus if ž is a zero of φ̌_{n+1}, then

žΦ̌_n = Φ̌_n(Ȟ^{−1}_n + Γ̌_n).

This shows that ž is an eigenvalue of Ȟ^{−1}_n + Γ̌_n and that the corresponding left eigenvector is Φ̌_n(ž) = [φ̌_0(ž), . . . , φ̌_n(ž)]. This Ȟ^{−1}_n + Γ̌_n is the semi-separable-plus-diagonal matrix that we mentioned before. This is the relation given in [43, Theorem 2.7], where it was obtained in the context of an inverse eigenvalue problem. See also [21, 42], which also rely on this structure.

The matrix Ȟ^{−1}_n is a semi-separable matrix, being the inverse of an irreducible Hessenberg matrix, and it is as structured as the Hessenberg matrix itself. Indeed, if Ȟ_n can be written as a product of unitary G_k factors, except for the last one which is truncated, then the semi-separable Ȟ^{−1}_n is the product of the G^{−1}_k in reverse order. Obviously, this has the same computational complexity.

Since this approach allows only finite poles, the theory was later extended in [36], which basically uses the technique explained in the previous section to include ∞ in the algorithm. The Hessenberg matrix with bulges below the diagonal is then called an extended Hessenberg matrix, and the associated Krylov algorithms are called extended Krylov methods. A selection of poles in an arbitrary order is implemented in this way to solve inverse and direct eigenvalue problems, QR factorization of unitary (Hessenberg) matrices, rational Krylov type methods, rational Szegő quadrature, and rational approximation problems as in [3, 22, 26, 35, 36, 37, 38]. We hope that future publications will illustrate that with the approach from the first 10 sections of this paper we have provided a more elegant framework to solve all these problems.
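The semi-separable structure of Ȟ^{−1}_n invoked here is easy to observe on generic data (random numbers below, not the actual Ȟ_n): for an irreducible upper Hessenberg H, every 2 × 2 minor taken entirely from the lower triangular part (diagonal included) of H^{−1} vanishes, i.e., that part agrees with the corresponding part of a rank-one matrix.

import numpy as np

rng = np.random.default_rng(5)
n = 6
H = np.triu(rng.standard_normal((n, n)), -1)   # upper Hessenberg; subdiagonal generically nonzero
S = np.linalg.inv(H)

worst = 0.0
for i in range(n):
    for k in range(i + 1, n):
        for j in range(n):
            for l in range(j + 1, n):
                if l <= i:   # then (i,j), (i,l), (k,j), (k,l) all lie on or below the diagonal
                    worst = max(worst, abs(S[i, j] * S[k, l] - S[i, l] * S[k, j]))
print(worst)   # at roundoff level: the lower triangle of H^{-1} has semi-separability rank 1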
13 Conclusion

We have systematically developed the basics of orthogonal rational functions on the unit circle whose poles can be chosen freely anywhere in the complex plane, as long as they are not on the unit circle T = {z ∈ C : |z| = 1}. The traditional case of all poles outside the closed disk and the so-called balanced situation, where the poles alternate between inside and outside the disk, are included as special instances. The case where all the poles are inside the disk is somewhat less manageable, because it is not so easy to deal with poles at infinity in this formalism, but it has been included anyway. The link between the ORF with all poles outside, all poles inside, and the general case with arbitrary location of the poles is clearly outlined. It is important to note that previous attempts to consider the general situation assumed that once some pole 1/γ̄ is chosen, γ may not appear anywhere in the sequence of remaining poles, which would for example exclude the balanced situation. We have shown how the classical Christoffel–Darboux formula, the recurrence relation, and the relation to rational Szegő formulas can be generalized.

Finally we analyzed the matrix representation of the shift operator with respect to the general basis as a matrix Möbius transform of a generalization of a snake-shaped matrix, and showed how the latter can be factorized as a product of elementary unitary 2 × 2 blocks. It is shown that there is no theoretical or computational advantage in computing rational Szegő quadrature formulas for a general sequence of poles. It is, however, conceptually slightly simpler to consider the balanced situation, i.e., the generalization of the CMV representation.

In the last two sections we have made a link with the linear algebra literature. In Section 11 we gave an illustration by proving some properties of AMPD matrices that are directly inspired by the ORF recursion. A relation with direct and inverse eigenvalue problems for unitary matrices and the related rational Krylov methods is briefly discussed in Section 12, illustrating that the approach given in this paper is more general and might therefore be a more appropriate choice.

As a final note we remark that a very similar approach can be taken for ORF on the real line or a real interval, in which case the unitary Hessenberg matrices are replaced by tridiagonal Jacobi matrices and their generalizations when poles are introduced in arbitrary order, taken from anywhere in the complex plane excluding the real line or the interval.

Acknowledgements

We thank the anonymous referees for their careful reading of the manuscript and their suggestions for improvement.

References

[1] Ammar G., Gragg W., Reichel L., Constructing a unitary Hessenberg matrix from spectral data, in Numerical Linear Algebra, Digital Signal Processing and Parallel Algorithms (Leuven, 1988), NATO Adv. Sci. Inst. Ser. F Comput. Systems Sci., Vol. 70, Editors G. Golub, P. Van Dooren, Springer, Berlin, 1991, 385–395.

[2] Ammar G.S., He C.Y., On an inverse eigenvalue problem for unitary Hessenberg matrices, Linear Algebra Appl. 218 (1995), 263–271.

[3] Aurentz J.L., Mach T., Vandebril R., Watkins D.S., Fast and stable unitary QR algorithm, Electron. Trans. Numer. Anal. 44 (2015), 327–341.

[4] Beckermann B., Güttel S., Superlinear convergence of the rational Arnoldi method for the approximation of matrix functions, Numer. Math. 121 (2012), 205–236.

[5] Beckermann B., Güttel S., Vandebril R., On the convergence of rational Ritz values, SIAM J. Matrix Anal. Appl. 31 (2010), 1740–1774.

[6] Bultheel A., Cantero M.-J., A matricial computation of rational quadrature formulas on the unit circle, Numer. Algorithms 52 (2009), 47–68.

[7] Bultheel A., Cantero M.-J., Cruz-Barroso R., Matrix methods for quadrature formulas on the unit circle. A survey, J. Comput. Appl. Math. 284 (2015), 78–100.

[8] Bultheel A., Díaz-Mendoza C., González-Vera P., Orive R., On the convergence of certain Gauss-type quadrature formulas for unbounded intervals, Math. Comp. 69 (2000), 721–747.

[9] Bultheel A., González-Vera P., Hendriksen E., Njåstad O., Orthogonal rational functions, Cambridge Monographs on Applied and Computational Mathematics, Vol. 5, Cambridge University Press, Cambridge, 1999.

[10] Bultheel A., González-Vera P., Hendriksen E., Njåstad O., Computation of rational Szegő–Lobatto quadrature formulas, Appl. Numer. Math. 60 (2010), 1251–1263.

[11] Bultheel A., González-Vera P., Hendriksen E., Njåstad O., Rational quadrature formulas on the unit circle with prescribed nodes and maximal domain of validity, IMA J. Numer. Anal. 30 (2010), 940–963.
[12] Cantero M.J., Moral L., Velázquez L., Minimal representations of unitary operators and orthogonal polynomials on the unit circle, Linear Algebra Appl. 408 (2005), 40–65, math.CA/0405246.

[13] Chesnokov A., Deckers K., Van Barel M., A numerical solution of the constrained weighted energy problem, J. Comput. Appl. Math. 235 (2010), 950–965.

[14] Chu M.T., Golub G.H., Structured inverse eigenvalue problems, Acta Numer. 11 (2002), 1–71.

[15] Cruz-Barroso R., Daruis L., González-Vera P., Njåstad O., Sequences of orthogonal Laurent polynomials, bi-orthogonality and quadrature formulas on the unit circle, J. Comput. Appl. Math. 200 (2007), 424–440.

[16] Cruz-Barroso R., Delvaux S., Orthogonal Laurent polynomials on the unit circle and snake-shaped matrix factorizations, J. Approx. Theory 161 (2009), 65–87, arXiv:0712.2738.

[17] Cruz-Barroso R., Díaz Mendoza C., Perdomo-Pío F., Szegő-type quadrature formulas, J. Math. Anal. Appl. 455 (2017), 592–605.

[18] Cruz-Barroso R., González-Vera P., A Christoffel–Darboux formula and a Favard's theorem for orthogonal Laurent polynomials on the unit circle, J. Comput. Appl. Math. 179 (2005), 157–173.

[19] de la Calle Ysern B., González-Vera P., Rational quadrature formulae on the unit circle with arbitrary poles, Numer. Math. 107 (2007), 559–587.

[20] Díaz-Mendoza C., González-Vera P., Jiménez-Paiz M., Strong Stieltjes distributions and orthogonal Laurent polynomials with applications to quadratures and Padé approximation, Math. Comp. 74 (2005), 1843–1870.

[21] Fasino D., Gemignani L., Direct and inverse eigenvalue problems for diagonal-plus-semiseparable matrices, Numer. Algorithms 34 (2003), 313–324.

[22] Fasino D., Gemignani L., Structured eigenvalue problems for rational Gauss quadrature, Numer. Algorithms 45 (2007), 195–204.

[23] Fritzsche B., Kirstein B., Lasarow A., Orthogonal rational matrix-valued functions on the unit circle, Math. Nachr. 278 (2005), 525–553.

[24] Fritzsche B., Kirstein B., Lasarow A., Orthogonal rational matrix-valued functions on the unit circle: recurrence relations and a Favard-type theorem, Math. Nachr. 279 (2006), 513–542.

[25] Fritzsche B., Kirstein B., Lasarow A., Para-orthogonal rational matrix-valued functions on the unit circle, Oper. Matrices 6 (2012), 631–679.

[26] Gemignani L., A unitary Hessenberg QR-based algorithm via semiseparable matrices, J. Comput. Appl. Math. 184 (2005), 505–517.

[27] Golub G.H., Meurant G., Matrices, moments and quadrature with applications, Princeton Series in Applied Mathematics, Princeton University Press, Princeton, NJ, 2010.

[28] Gragg W.B., Positive definite Toeplitz matrices, the Arnoldi process for isometric operators, and Gaussian quadrature on the unit circle, J. Comput. Appl. Math. 46 (1993), 183–198.

[29] Güttel S., Rational Krylov approximation of matrix functions: numerical methods and optimal pole selection, GAMM-Mitt. 36 (2013), 8–31.
[30] Güttel S., Knizhnerman L., A black-box rational Arnoldi variant for Cauchy–Stieltjes matrix functions, BIT 53 (2013), 595–616.

[31] Jones W.B., Njåstad O., Thron W.J., Two-point Padé expansions for a family of analytic functions, J. Comput. Appl. Math. 9 (1983), 105–123.

[32] Jones W.B., Thron W.J., Njåstad O., Orthogonal Laurent polynomials and the strong Hamburger moment problem, J. Math. Anal. Appl. 98 (1984), 528–554.

[33] Knizhnerman L., Simoncini V., A new investigation of the extended Krylov subspace method for matrix function evaluations, Numer. Linear Algebra Appl. 17 (2010), 615–638.

[34] Lasarow A., Aufbau einer Szegő-Theorie für rationale Matrixfunktionen, Ph.D. Thesis, Universität Leipzig, Fak. Mathematik Informatik, 2000.

[35] Mach T., Van Barel M., Vandebril R., Inverse eigenvalue problems linked to rational Arnoldi, and rational, (non)symmetric Lanczos, Tech. Rep. TW629, Dept. Computer Science, KU Leuven, 2013.

[36] Mach T., Van Barel M., Vandebril R., Inverse eigenvalue problems for extended Hessenberg and extended tridiagonal matrices, J. Comput. Appl. Math. 272 (2014), 377–398.

[37] Mertens C., Short recurrence relations for (extended) Krylov subspaces, Ph.D. Thesis, Department of Computer Science, Faculty of Engineering Science, KU Leuven, 2016, available at https://lirias.kuleuven.be/handle/123456789/538706.

[38] Mertens C., Vandebril R., Short recurrences for computing extended Krylov bases for Hermitian and unitary matrices, Numer. Math. 131 (2015), 303–328.

[39] Njåstad O., Santos-León J.C., Domain of validity of Szegö quadrature formulas, J. Comput. Appl. Math. 202 (2007), 440–449.

[40] Thron W.J., L-polynomials orthogonal on the unit circle, in Nonlinear Numerical Methods and Rational Approximation (Wilrijk, 1987), Math. Appl., Vol. 43, Reidel, Dordrecht, 1988, 271–278.

[41] Van Barel M., Bultheel A., Discrete linearized least-squares rational approximation on the unit circle, J. Comput. Appl. Math. 50 (1994), 545–563.

[42] Van Barel M., Fasino D., Gemignani L., Mastronardi N., Orthogonal rational functions and diagonal-plus-semiseparable matrices, in Advanced Signal Processing Algorithms, Architectures, and Implementations XII, Proceedings of SPIE, Vol. 4791, Editor F. Luk, SPIE, 2002, 162–170.
[43] Van Barel M., Fasino D., Gemignani L., Mastronardi N., Orthogonal rational functions and structured matrices, SIAM J. Matrix Anal. Appl. 26 (2005), 810–829.

[44] Van Beeumen R., Meerbergen K., Michiels W., A rational Krylov method based on Hermite interpolation for nonlinear eigenvalue problems, SIAM J. Sci. Comput. 35 (2013), A327–A350.

[45] Vandebril R., Van Barel M., Mastronardi N., Matrix computations and semiseparable matrices, Vol. 1, Linear systems, Johns Hopkins University Press, Baltimore, MD, 2008.

[46] Vandebril R., Van Barel M., Mastronardi N., Matrix computations and semiseparable matrices, Vol. 2, Eigenvalue and singular value methods, Johns Hopkins University Press, Baltimore, MD, 2008.

[47] Velázquez L., Spectral methods for orthogonal rational functions, J. Funct. Anal. 254 (2008), 954–986, arXiv:0704.3456.