CS 462/662: Open Problems

Here is a list of open problems. Come up with an original solution for any of these, and get an automatic 100 for the course!

Note that some problems listed here include an update indicating that they are no longer open. Naturally, solving any of those problems will not suffice.

Problem 1

A nonempty word z is primitive if it cannot be written in the form xi for an integer i ≥ 2. For example, abac is primitive, but abab is not, since it can be written as (ab)2. Thus, a primitive word is essentially a non-power.

Let P2 = { x ∈ {0,1}* : x is primitive }. It is not hard, and makes a good exercise, to prove that language P2 is not regular. However, the following problem is open: Is P2 context-free?

Most people believe the answer is no, but no one has a proof. This very hard problem has been open for about 30 years. (There's even a 500-page book, published in 2014, about this problem, Context-free languages and primitive words, by Dömösi and Ito.) By contrast, it is possible to prove that the language of non-squares is context-free. This also makes a good exercise!

Problem 2

In Section 2.6.2 we proved a result about infinite products where the constant Q = Πn ≥ 1 (2n/(2n+1))(-1) tn ≐ 1.6281601297 was involved. Here tn is the nth term of the Thue-Morse sequence.

Can Q be expressed in terms of known constants? Is it irrational? Transcendental?

Problem 3

Consider x, the lexicographically least squarefree infinite word over the alphabet {0, 1, 2}. Then x begins 01020120210120102012 ... .

(a) Does every squarefree finite word appear as a subword (factor) of x?

(b) Do the letters occur with limiting frequencies in x, and if so, what are the frequencies? (Limiting frequencies are defined below in problem 10.)

Problem 4

Describe the distribution of sizes of the minimal DFA equivalent to an NFA of n states, say over the binary alphabet.

Problem 5

Describe the distribution of sizes of the minimal DFA obtained by minimizing the "reversed" automaton (as discussed in class), over all DFA's with n states. For example, can we get every number of states between n and 2n?

Problem 6

Does every power of 16 greater than 164 have a digit 1, 2, 4, or 8 when written in base 10? A positive answer to this question would resolve the problem of determining the set of "minimal elements" for the language of powers of 2 expressed in base 10. See section 3.12 of the course text.

Problem 7

What is the state complexity of the transformation that takes a language L to (c(L*))*? Here by c(A) I mean the complement of A. I am looking for good upper and lower bounds. It is easy to see that if L has state complexity n, then (c(L*))* has state complexity at most 22n, but can anything close to this occur?

Update, March 22 2012: solved! by Slovak computer scientist Galina Jirásková. The bound is 2Θ(n log n).

Problem 8

Given CFL's L1 and L2 with L1L2 and L2 - L1 infinite, is there always a CFL L such that L1LL2 such that L2 - L and L - L1 are both infinite? This is not known, even in the case where L2 = Σ*.

Problem 9

An old problem due to T. Brown, and Pirillo & Varricchio, and Halbeisen & Hungerbühler, independently: is there an infinite word over a finite subset of N, the natural numbers, having no two consecutive blocks of the same length and same sum? In other words, can we avoid all subwords of the form xy with |x| = |y| and Σ x = Σ y?

(It is known that it is possible to avoid three consecutive blocks of the same length and same sum. See this paper.)

Problem 10

Consider the Oldenburger-Kolakoski word K = 1221121 ... defined by alternating runs of 1's and 2's, where the length of each run is the sequence K itself. Alternately K can be generated by running 12 over and over through a simple transducer with 2 states.

Prove or disprove that the limiting frequency of the symbol 1 in K exists and is equal to 1/2. The definition of limiting frequency is the limit limn → ∞ | K[1..n] |1/n.

Problem 11

In Theorem 3.8.5 we showed how to find all the strings accepted by an n-state unary acyclic NFA in O(ne log n) time, where e is the optimal exponent for matrix multiplication (currently e = 2.373).

Is there a significantly faster algorithm? For example, can we find all the strings accepted by an n-state unary acyclic NFA in O(n2) time?

Solution! (sort of) by Aaron Potechin, February 2017: he shows that the problem of determining whether a graph on n vertices has a triangle reduces to this problem. Since no algorithm faster than ne is known for the triangle problem, and it is well-studied, any solution for the NFA problem would improve the running time for a very well-studied problem.

Problem 12

Is there a prime of the form 8 · 134i + 183 for i ≥ 1? Resolving this would allow a classification of the minimal elements of the primes expressed in base 13.

Tentatively solved! Curtis Bright has found that for i = 8005, the number is a "probable prime". That is to say, it passed about 12 Miller-Rabin tests. It remains to complete a formal proof for this exponent.

Problem 13

Let p be a polynomial with rational coefficients of degree ≥ 2 such that p(N) ⊆ N. Let k be an integer ≥ 2. Prove or disprove: for all such p and k, { (p(n))k : n ≥ 0 } is not a CFL. Here the notation (x)k means the string representing x in base k.

There is an old manuscript of Sándor Horváth that claims to solve this problem, but to my knowledge it has never been published, and there seem to be gaps in the proof.

New! There is another claimed solution by Pálvölgyi Dömötör, posted here. I have not been able to understand it, and I am not completely convinced it is correct. If you read the proof and find an error, please let me know (you will get some extra marks). If you read the proof and are convinced by it, write it down in your own words, again for extra marks.

Problem 14

The "Curling Number" conjecture

Start with any finite string x made up of positive integers, such as 33232. Look at all the suffixes of x, and determine the largest repetition exponent t. For example, in 33232, there is the repetition 3232, which is y2 for y = 32, so it has repetition exponent 2. Now concatenate t this to the end of x, and continue.

In our example, we get 33232, then 332322, then 3323222, then 33232223, then 332322231.

Conjecture: this process always terminates with 1. For more information, see the paper here.

Problem 15

Stepan Holub offers 200 Euros for a solution to the following question:

Is there a positive integer n ≥ 2 and nonempty words u1, u2, ..., un such that both of the following equalities hold:

(u1 u2 ... un)2 = u12 u22 ... un2
(u1 u2 ... un)3 = u13 u23 ... un3

and furthermore for at least one pair of indices i, j we have ui ujuj ui?

Update, April 2017: The problem has been solved (negatively) by Aleksi Saarela.

Problem 16

Give good upper and lower bounds on the number of distinct languages accepted by NFA's of n states with input alphabet {0, 1} whose deterministic state complexity is 2n. (It is known there are exponentially many.)

Problem 17

Improve the known upper and lower bounds on the number of distinct languages accepted by unary NFA's of n states. It is known there are constants c, d such that there are at least 2n + c √n/log n languages and at most (dn/log n)n. See here for the upper bound and here for the lower bound.

Problem 18

Prove or disprove: if an infinite word w has the property that there exists an integer N such that every finite prefix of w can be written as the concatenation of at most N palindromes (counted with repetition), then w is ultimately periodic.

Problem 19

Let Σk = {0, 1, 2, ..., k - 1}, and let L ⊆ (Σk × Σk)* be a language representing pairs of numbers in base k. Is the following problem decidable (i.e., is there an algorithm for it?): given a DFA M accepting such an L, representing a set S of pairs of integers in base k, does there exist a pair (m, n) in S for which m is divisible by n?

If the "does there exist a pair" is replaced by "is it true that for all pairs", then the answer is known to be decidable.

Problem 20

The setup is the same as in the previous problem, except now L is a CFL, generated by a grammar G. Now the problem is, given G, with L(G) representing a set of pairs of integers S, is the quantity sup(m, n) ∈ S m/n computable? If L is accepted by a DFA instead, the answer is known to be "yes".

Problem 21

Let 0 < x < 1 be a real number, and let b, k ≥ 2 be integers. We say x is (b, k)-automatic if there exists a DFA with outputs on the states that, having read the input n expressed in base k, reaches a state with output an, where x = 0. a1 a2 a3 ... in base b.

Is π or e or log 2 or γ (Euler's constant) (b, k)-automatic for some (b, k)? Probably not, but nobody currently knows how to prove it.

Problem 22

The "distinguishing strings" problem -- one of the most fundamental problems in automata theory. Given two words w and x, both of length ≤ n, what is the size of the smallest DFA that accepts one and rejects the other? Robson proved that O(n2/5 (log n)3/5) states suffice, and there are examples known where Ω(log n) states are needed. I offer $100 for a significantly improved bound in either direction. You can read a survey about the problem here.

Problem 23

The Endrullis-Hendriks Thue-Morse transducer problem: Consider the Thue-Morse sequence t = t0 t1 t2 ... = 01101001 ... . We can apply a finite-state transducer to t that outputs the sequence c = c1 c2 c3 ... = 010001010 ... defined by ci = 1, if ti = ti-1 and 0 otherwise. Now we can apply another finite-state transducer to c = c1 c2 c3 ... that deletes every third letter, starting with the third, outputting d := c1 c2 c4 c5 c7 c8 ... = 0100011001 ... . Is there a finite-state transducer that, given d as input, outputs t? Nobody knows how to do this currently, but only a few people have thought about it, and it might be easy.

There is a more general question here, which I mentioned in class: suppose you apply a finite-state transducer to t and you get something, say x, that is not ultimately periodic. Can you always apply a finite-state transducer to x to get t back again? Nobody knows.

Problem 24

Consider the following decision problem: given an NFA M of n states, does there an integer r such that ΣrL(M)?

What is its complexity? I can prove it is PSPACE-hard, but is it in PSPACE? I offer $100 for this problem.

Update, 2017: Solved! By Marek Szykula and Pawel Gawrychowski.

Problem 25

What are the minimal elements for the positive squares, expressed in base 7? From Section 3.12 of the course text, we know that the set is finite. The first few terms are 1,4,22,562,533302,600552,6306532,333653302,3063666352,5335553532,33666603502,60063060352,300605550352,3336066003502,35050033500502,50500553555302,306036333363352,306335355500002,333365503550032,335000000055532,3030606663350302,3030666003005032,3033330063055552, 6303363360363502,... that is, the base-7 representation of the squares of the integers (in base 10) 1,2,4,17,304,318,871,4498,11246,14907,31762,41233,77248,220343,601073,703111,1455619,1457642,1541012,1547431,3813856,3814073,3818594,5528954,... What I want is a complete description of them.

Same question for bases 9 and 10. For base 10 this is sequence A130448 in the Online Encyclopedia of Integer Sequences.

Problem 26

Let P be the language of primes expressed in base 2. It is known and not too hard to prove that P is not regular. How about P*? Is it regular? Context-free? Probably not either, but nobody seems to know how to prove it currently.

Problem 27

Let x be a string of length 2n, with n zeroes and n ones. The Lyngsø-Pedersen conjecture states that every such x has a conjugate (cyclic shift) possessing a (scattered) subsequence y that is an antipalindrome, and |y| ≥ 4n/3. Note that this bound is essentially tight, as exhibited by the string 12t-1 0t (10)t-1 0t-1. (The correct tight bound seems to be 2ceil((2n+1)/3).)

Problem 28

Consider a "tag system" that acts on strings w as follows: if w begins with 0, first replace w with w00. If w begins with 1, first replace w with w1101. In both cases, next delete the first 3 symbols of w. Is the following problem algorithmically solvable: given an input w, does the orbit of w under repeated applications of this operation consist of infinitely many distict strings? See this paper for more info about the problem.

Problem 29

Consider the morphism μ defined by μ(0) = 01 and μ(1) = 10. What is the length of a longest common subsequence between the strings μn(0) and μn(1) ? I am looking for a formula or simple-to-calculate description. This is an old problem of Jean Berstel.

Update: Partially solved in Fall 2018 by Joakim Blikstad; he proved that this LCS has length 2n(1-o(1)).

Problem 30

How many binary strings of length 2n can be obtained by taking any binary string of length n, say x, and shuffling it with itself? (Not "perfect shuffle", but shuffle as described in Example 3.3.8 of the course text.) Develop an efficient algorithm to compute this (one that, for example, would allow you to compute this number for n = 100). Or give good upper and lower bounds on the number of such strings. See sequence A191755 in the OEIS.

Problem 31

What is the worst-case state complexity of the shuffle operation (not perfect shuffle)? Some results are known already -- see this arxiv paper or the published version. Here the problem is to find the exact complexity, for all pairs of languages with m and n states, and all alphabet sizes k.