## CS 462/662: Open Problems

Here is a list of open problems. Solve any one of these, and get an automatic 100 for the course! Note that some problems listed here include an update indicating that they are no longer open. Naturally, solving any of those problems will not suffice.

### Problem 1

A nonempty word z is primitive if it cannot be written in the form xi for an integer i ≥ 2. For example, abac is primitive, but abab is not, since it can be written as (ab)2. Thus, a primitive word is essentially a non-power.

Let P2 = { x ∈ {0,1}* : x is primitive }. It is not hard, and makes a good exercise, to prove that language P2 is not regular. However, the following problem is open: Is P2 context-free?

Most people believe the answer is no, but no one has a proof. This very hard problem has been open for about 25 years. (There's even a new 500-page book about this problem, Context-free languages and primitive words, by Dömösi and Ito.) By contrast, it is possible to prove that the language of non-squares is context-free. This also makes a good exercise!

### Problem 2

In Section 2.6.2 we proved a result about infinite products where the constant Q = Πn ≥ 1 (2n/(2n+1))(-1) tn ≐ 1.6281601297 was involved. Here tn is the nth term of the Thue-Morse sequence.

Can Q be expressed in terms of known constants? Is it irrational? Transcendental?

### Problem 3

Consider x, the lexicographically least squarefree infinite word over the alphabet {0, 1, 2}. Then x begins 01020120210120102012 ... .

(a) Does every squarefree finite word appear as a subword (factor) of x?

(b) Do the letters occur with limiting frequencies in x, and if so, what are the frequencies? (Limiting frequencies are defined below in problem 10.)

### Problem 4

Describe the distribution of sizes of the minimal DFA equivalent to an NFA of n states, say over the binary alphabet.

### Problem 5

Describe the distribution of sizes of the minimal DFA obtained by minimizing the "reversed" automaton (as discussed in class), over all DFA's with n states. For example, can we get every number of states between n and 2n?

### Problem 6

Does every power of 16 greater than 164 have a digit 1, 2, 4, or 8 when written in base 10?

### Problem 7

What is the state complexity of the transformation that takes a language L to (c(L*))*? Here by c(A) I mean the complement of A. I am looking for good upper and lower bounds. It is easy to see that if L has state complexity n, then (c(L*))* has state complexity at most 22n, but can anything close to this occur?

Update, March 22 2012: solved! by Slovak computer scientist Galina Jirásková. The bound is 2Θ(n log n).

### Problem 8

Given CFL's L1 and L2 with L1L2 and L2 - L1 infinite, is there always a CFL L such that L1LL2 such that L2 - L and L - L1 are both infinite? This is not known, even in the case where L2 = Σ*.

Update, 2015: Solved!

### Problem 9

An old problem due to Pirillo & Varricchio and Halbeisen & Hungerbühler (independently): is there an infinite word over a finite subset of N, the natural numbers, having no two consecutive blocks of the same length and same sum? In other words, can we avoid all subwords of the form xy with |x| = |y| and Σ x = Σ y?

(It is known that it is possible to avoid three consecutive blocks of the same length and same sum. See this paper.)

### Problem 10

Consider the Oldenburger-Kolakoski word K = 1221121 ... defined by alternating runs of 1's and 2's, where the length of each run is the sequence K itself. Alternately K can be generated by running 12 over and over through a simple transducer with 2 states.

Prove or disprove that the limiting frequency of the symbol 1 in K exists and is equal to 1/2. The definition of limiting frequency is the limit limn → ∞ | K[1..n] |1/n.

### Problem 11

In Theorem 3.8.5 we showed how to find all the strings accepted by an n-state unary acyclic NFA in O(ne log n) time, where e is the optimal exponent for matrix multiplication (currently e = 2.373).

Is there a significantly faster algorithm? For example, can we find all the strings accepted by an n-state unary acyclic NFA in O(n2) time?

### Problem 12

Is there a prime of the form 8 · 134i + 183 for i ≥ 1? Resolving this would allow a classification of the minimal elements of the primes expressed in base 13.

Tentatively solved! Curtis Bright has found that for i = 8005, the number is a "probable prime". That is to say, it passed about 12 Miller-Rabin tests. It remains to complete a formal proof for this exponent.

### Problem 13

Let p be a polynomial with rational coefficients of degree ≥ 2 such that p(N) ⊆ N. Let k be an integer ≥ 2. Prove or disprove: for all such p and k, { (p(n))k : n ≥ 0 } is not a CFL. Here the notation (x)k means the string representing x in base k.

There is an old manuscript of Sándor Horváth that claims to solve this problem, but to my knowledge it has never been published, and there seem to be gaps in the proof.

### Problem 14

#### The "Curling Number" conjecture

Start with any finite string x made up of positive integers, such as 33232. Look at all the suffixes of x, and determine the largest repetition exponent t. For example, in 33232, there is the repetition 3232, which is y2 for y = 32, so it has repetition exponent 2. Now concatenate t this to the end of x, and continue.

In our example, we get 33232, then 332322, then 3323222, then 33232223, then 332322231.

Conjecture: this process always terminates with 1. For more information, see the paper here.

### Problem 15

Stepan Holub offers 200 Euros for a solution to the following question:

Is there a positive integer n ≥ 2 and nonempty words u1, u2, ..., un such that both of the following equalities hold:

(u1 u2 ... un)2 = u12 u22 ... un2
(u1 u2 ... un)3 = u13 u23 ... un3

and furthermore for at least one pair of indices i, j we have ui ujuj ui?

### Problem 16

Give good upper and lower bounds on the number of distinct languages accepted by NFA's of n states with input alphabet {0, 1} whose deterministic state complexity is 2n. (It is known there are exponentially many.)

### Problem 17

Improve the known upper and lower bounds on the number of distinct languages accepted by unary NFA's of n states. It is known there are constants c, d such that there are at least 2n + c √n/log n languages and at most (dn/log n)n. See here for the upper bound and here for the lower bound.

### Problem 18

Prove or disprove: if an infinite word w has the property that there exists an integer N such that every finite prefix of w can be written as the concatenation of at most N palindromes (counted with repetition), then w is ultimately periodic.

### Problem 19

Let Σk = {0, 1, 2, ..., k - 1}, and let L ⊆ (Σk × Σk)* be a language representing pairs of numbers in base k. Is the following problem decidable (i.e., is there an algorithm for it?): given a DFA M accepting such an L, representing a set S of pairs of integers in base k, does there exist a pair (m, n) in S for which m is divisible by n?

If the "does there exist a pair" is replaced by "is it true that for all pairs", then the answer is known to be decidable.

### Problem 20

The setup is the same as in the previous problem, except now L is a CFL, generated by a grammar G. Now the problem is, given G, with L(G) representing a set of pairs of integers S, is the quantity sup(m, n) ∈ S m/n computable? If L is accepted by a DFA instead, the answer is known to be "yes".

### Problem 21

Let 0 < x < 1 be a real number, and let b, k ≥ 2 be integers. We say x is (b, k)-automatic if there exists a DFA with outputs on the states that, having read the input n expressed in base k, reaches a state with output an, where x = 0. a1 a2 a3 ... in base b.

Is π or e or log 2 or γ (Euler's constant) (b, k)-automatic?

### Problem 25

What are the minimal elements for the positive squares, expressed in base 7? The first few terms are 1,4,22,562,533302,600552,6306532,333653302,3063666352,5335553532,33666603502,60063060352,300605550352,3336066003502,35050033500502,50500553555302,306036333363352,306335355500002,333365503550032,335000000055532,3030606663350302,3030666003005032,3033330063055552, 6303363360363502,... that is, the base-7 representation of the squares of the integers (in base 10) 1,2,4,17,304,318,871,4498,11246,14907,31762,41233,77248,220343,601073,703111,1455619,145 7642,1541012,1547431,3813856,3814073,3818594,5528954,...

Same question for bases 9 and 10. For base 10 this is sequence A130448 in the Online Encyclopedia of Integer Sequences.

### Problem 26

Let P be the language of primes expressed in base 2. It is known and not too hard to prove that P is not regular. How about P*? Is it regular? Context-free? Probably not either, but nobody seems to know how to prove it currently.