CS 462/662: Open Problems

Here is a list of open problems. Solve any one of these, and get an automatic 100 for the course!


Problem 1

A nonempty word z is primitive if it cannot be written in the form xi for an integer i ≥ 2. For example, abac is primitive, but abab is not, since it can be written as (ab)2. Thus, a primitive word is essentially a non-power.

Let P2 = { x ∈ {0,1}* : x is primitive }. It is not hard, and makes a good exercise, to prove that language P2 is not regular. However, the following problem is open: Is P2 context-free?

Most people believe the answer is no, but no one has a proof. This very hard problem has been open for about 25 years. By contrast, it is possible to prove that the language of non-squares is context-free. This also makes a good exercise!


Problem 2

In Section 2.6.2 we proved a result about infinite products where the constant Q = Πn ≥ 1 (2n/(2n+1))(-1) tn ≐ 1.6281601297 was involved. Here tn is the nth term of the Thue-Morse sequence.

Can Q be expressed in terms of known constants? Is it irrational? Transcendental?


Problem 3

Consider x, the lexicographically least squarefree infinite word over the alphabet {0, 1, 2}. Then x begins 01020120210120102012 ... .

(a) Does every squarefree finite word appear as a subword (factor) of x?

(b) Do the letters occur with limiting frequencies in x, and if so, what are the frequencies? (Limiting frequencies are defined below in problem 10.)


Problem 4

Describe the distribution of sizes of the minimal DFA equivalent to an NFA of n states, say over the binary alphabet.


Problem 5

Describe the distribution of sizes of the minimal DFA obtained by minimizing the "reversed" automaton (as discussed in class), over all DFA's with n states. For example, can we get every number of states between n and 2n?


Problem 6

Does every power of 16 greater than 164 have a digit 1, 2, 4, or 8 when written in base 10?


Problem 7

What is the state complexity of the transformation that takes a language L to (c(L*))*? Here by c(A) I mean the complement of A. I am looking for good upper and lower bounds. It is easy to see that if L has state complexity n, then (c(L*))* has state complexity at most 22n, but can anything close to this occur?

Update, March 22 2012: solved! by Slovak computer scientist Galina Jirásková. The bound is 2Θ(n log n).


Problem 8

Given CFL's L1 and L2 with L1L2 and L2 - L1 infinite, is there always a CFL L such that L1LL2 such that L2 - L and L - L1 are both infinite? This is not known, even in the case where L2 = Σ*.

Problem 9

An old problem due to Pirillo & Varricchio and Halbeisen & Hungerbühler (independently): is there an infinite word over a finite subset of N, the natural numbers, having no two consecutive blocks of the same length and same sum? In other words, can we avoid all subwords of the form xy with |x| = |y| and Σ x = Σ y?

(It is known that it is possible to avoid three consecutive blocks of the same length and same sum. See this paper.)


Problem 10

Consider the Oldenburger-Kolakoski word K = 1221121 ... defined by alternating runs of 1's and 2's, where the length of each run is the sequence K itself. Alternately K can be generated by running 12 over and over through a simple transducer with 2 states.

Prove or disprove that the limiting frequency of the symbol 1 in K exists and is equal to 1/2. The definition of limiting frequency is the limit limn → ∞ | K[1..n] |1/n.


Problem 11

In Theorem 3.8.5 we showed how to find all the strings accepted by an n-state unary acyclic NFA in O(ne log n) time, where e is the optimal exponent for matrix multiplication (currently e = 2.373).

Is there a significantly faster algorithm? For example, can we find all the strings accepted by an n-state unary acyclic NFA in O(n2) time?


Problem 12

Is there a prime of the form 8 · 134i + 183 for i ≥ 1? Resolving this would allow a classification of the minimal elements of the primes expressed in base 13.

Tentatively solved! Curtis Bright has found that for i = 8005, the number is a "probable prime". That is to say, it passed about 12 Miller-Rabin tests. It remains to complete a formal proof for this exponent.


Problem 13

Let p be a polynomial with rational coefficients of degree ≥ 2 such that p(N) ⊆ N. Let k be an integer ≥ 2. Prove or disprove: for all such p and k, { (p(n))k : n ≥ 0 } is not a CFL. Here the notation (x)k means the string representing x in base k.

There is an old manuscript of Sándor Horváth that claims to solve this problem, but to my knowledge it has never been published, and there seem to be gaps in the proof.


Problem 14

The "Curling Number" conjecture

Start with any finite string x made up of positive integers, such as 33232. Look at all the suffixes of x, and determine the largest repetition exponent t. For example, in 33232, there is the repetition 3232, which is y2 for y = 32, so it has repetition exponent 2. Now concatenate t this to the end of x, and continue.

In our example, we get 33232, then 332322, then 3323222, then 33232223, then 332322231.

Conjecture: this process always terminates with 1. For more information, see the paper here.


Problem 15

Stepan Holub offers 200 Euros for a solution to the following question:

Is there a positive integer n ≥ 2 and nonempty words u1, u2, ..., un such that both of the following equalities hold:

(u1 u2 ... un)2 = u12 u22 ... un2
(u1 u2 ... un)3 = u13 u23 ... un3

and furthermore for at least one pair of indices i, j we have ui ujuj ui?


Problem 16

Give good upper and lower bounds on the number of distinct languages accepted by NFA's of n states with input alphabet {0, 1} whose deterministic state complexity is 2n. (It is known there are exponentially many.)


Problem 17

Improve the known upper and lower bounds on the number of distinct languages accepted by unary NFA's of n states. It is known there are constants c, d such that there are at least 2n + c √n/log n languages and at most (dn/log n)n. See here for the upper bound and here for the lower bound.


Problem 18

Prove or disprove: if an infinite word w has the property that there exists an integer N such that every finite prefix of w can be written as the concatenation of at most N palindromes (counted with repetition), then w is ultimately periodic.


Problem 19

Let Σk = {0, 1, 2, ..., k - 1}, and let L ⊆ (Σk × Σk)* be a language representing pairs of numbers in base k. Is the following problem decidable (i.e., is there an algorithm for it?): given a DFA M accepting such an L, representing a set S of pairs of integers in base k, does there exist a pair (m, n) in S for which m is divisible by n?

If the "does there exist a pair" is replaced by "is it true that for all pairs", then the answer is known to be decidable.


Problem 20

The setup is the same as in the previous problem, except now L is a CFL, generated by a grammar G. Now the problem is, given G, with L(G) representing a set of pairs of integers S, is the quantity sup(m, n) ∈ S m/n computable? If L is accepted by a DFA instead, the answer is known to be "yes".


Problem 21

Let 0 < x < 1 be a real number, and let b, k ≥ 2 be integers. We say x is (b, k)-automatic if there exists a DFA with outputs on the states that, having read the input n expressed in base k, reaches a state with output an, where x = 0. a1 a2 a3 ... in base b.

Is π or e or log 2 or γ (Euler's constant) (b, k)-automatic?


Problem 22

The "distinguishing strings" problem -- one of the most fundamental problems in automata theory. Given two words w and x, both of length ≤ n, what is the size of the smallest DFA that accepts one and rejects the other? Robson proved that O(n2/5 (log n)3/5) states suffice, and there are examples known where Ω(log n) states are needed. I offer $100 for a significantly improved bound in either direction.


Problem 23

The Endrullis-Hendricks Thue-Morse transducer problem: Consider the Thue-Morse sequence t = t0 t1 t2 ... = 01101001 ... . We can apply a finite-state transducer to t that outputs the sequence c = c1 c2 c3 ... = 010001010 ... defined by ci = 1, if ti = ti-1 and 0 otherwise. Now we can apply another finite-state transducer to c = c1 c2 c3 ... that deletes every third letter, starting with the third, outputting d := c1 c2 c4 c5 c7 c8 ... = 0100011001 ... . Is there a finite-state transducer that, given d as input, outputs t? Nobody knows how to do this currently, but only a few people have thought about it, and it might be easy.

There is a more general question here, which I mentioned in class: suppose you apply a finite-state transducer to t and you get something, say x, that is not ultimately periodic. Can you always apply a finite-state transducer to x to get t back again? Nobody knows.


Problem 24

Consider the following decision problem: given an NFA M of n states, does there an integer r such that ΣrL(M)?

What is its complexity? I can prove it is PSPACE-hard, but is it in PSPACE? I offer $100 for this problem.

Problem 25

What are the minimal elements for the positive squares, expressed in base 7? The first few terms are 1,4,22,562,533302,600552,6306532,333653302,3063666352,5335553532,33666603502,60063060352,300605550352,3336066003502, ..., that is, the squares of the integers 1,2,4,17,304,318,871,4498,11246,14907,31762,41233,77248,220343, ...

Same question for bases 9 and 10. For base 10 this is sequence A130448 in the Online Encyclopedia of Integer Sequences.