CS 462/662: Open Problems

Here is a list of open problems. Solve any one of these, and get an automatic 100 for the course!

Problem 1

A nonempty word z is primitive if it cannot be written in the form xi for an integer i ≥ 2. For example, abac is primitive, but abab is not, since it can be written as (ab)2.

Thus a primitive word is essentially a non-power.

Let P2 = { x ∈ {0,1}* : x is primitive }. It is not hard, and makes a good exercise, to prove that language P2 is not regular.

However, the following problem is open: Is P2 context-free?

Most people believe the answer is no, but no one has a proof. The problem has been open for about 25 years.

By contrast, it is possible to prove that the language of non-squares is context-free. This also makes a good exercise!

Problem 2

In Section 2.6.2 we proved a result about infinite products where the constant Q = Πn ≥ 1 (2n/(2n+1))(-1) tn was involved. Here tn is the nth term of the Thue-Morse sequence.

Can Q be expressed in terms of known constants? Is it transcendental?

Problem 3

Consider x, the lexicographically least squarefree infinite word over the alphabet {0, 1, 2}. Then x begins 01020120210120102012 ... .

(a) Does every squarefree finite word appear as a subword (factor) of x?

(b) Do the letters occur with limiting frequencies in x, and if so, what are the frequencies?

Problem 4

Describe the distribution of sizes of the minimal DFA equivalent to an NFA of n states, say over the binary alphabet.

Problem 5

Describe the distribution of sizes of the minimal DFA obtained by minimizing the "reversed" automaton (as discussed in class), over all DFA's with n states. For example, can we get every number of states between n and 2n?

Problem 6

Does every power of 16 greater than 164 have a digit 1, 2, 4, or 8 when written in base 10?

Problem 7

What is the state complexity of the transformation that takes a language L to (c(L*))*? Here by c(A) I mean the complement of A. I am looking for good upper and lower bounds. It is easy to see that if L has state complexity n, then (c(L*))* has state complexity at most 22n, but can anything close to this occur?

Update, March 22 2012: solved! by Slovakian mathematician Galina Jirásková. The bound is 2Θ(n log n).

Problem 8

Given CFL's L1 and L2 with L1L2 and L2 - L1 infinite, is there always a CFL L such that L1LL2 such that L2 - L and L - L1 are both infinite? This is not known, even in the case where L2 = Σ*.

Problem 9

An old problem due to Pirillo & Varricchio and Halbeisen & Hungerbühler (independently): is there an infinite word over a finite subset of N, the natural numbers, having no two consecutive blocks of the same length and same sum? In other words, can we avoid all subwords of the form xy with |x| = |y| and Σ x = Σ y?

It is known that it is possible to avoid three consecutive blocks of the same length and same sum. See this paper.

Problem 10

Consider the Kolakoski word K = 1221121 ... defined by alternating runs of 1's and 2's, where the length of each run is the sequence K itself. Alternately K can be generated by running 12 over and over through a simple transducer with 2 states.

Prove or disprove that the limiting frequency of the symbol 1 in K exists and is equal to 1/2. The definition of limiting frequency is the limit limn → ∞ | K[1..n] |1/n.

Problem 11

In Theorem 3.8.5 we showed how to find all the strings accepted by an n-state unary acyclic NFA in O(ne log n) time, where e is the optimal exponent for matrix multiplication (currently e = 2.373).

Is there an algorithm to find all the strings accepted by an n-state unary acyclic NFA in O(n2) time?

Problem 12

Is there a prime of the form 8 · 134i + 183 for i ≥ 1? Resolving this would allow a classification of the minimal elements of the primes expressed in base 13.

Tentatively solved! Curtis Bright has bound that for i = 8005, the number is a "probable prime". That is to say, it passed about 12 Miller-Rabin tests. It remains to complete a formal proof for this exponent.

Problem 13

Let p be a polynomial of degree ≥ 2 such that p(N) ⊆ N. Let k be an integer ≥ 2. Prove or disprove: for all such p and k, { (p(n))k : n ≥ 0 } is not a CFL.