C. Genetic engineering

Codeforces
IDCF86C
Time2000ms
Memory256MB
Difficulty
dpstring suffix structurestrees
English · Original
Chinese · Translation
Formal · Original
"Multidimensional spaces are completely out of style these days, unlike genetics problems" — thought physicist Woll and changed his subject of study to bioinformatics. Analysing results of sequencing he faced the following problem concerning DNA sequences. We will further think of a DNA sequence as an arbitrary string of uppercase letters "_A_", "_C_", "_G_" and "_T_" (of course, this is a simplified interpretation). Let _w_ be a long DNA sequence and _s_1, _s_2, ..., _s__m_ — collection of short DNA sequences. Let us say that the collection _filters_ _w_ iff _w_ can be covered with the sequences from the collection. Certainly, substrings corresponding to the different positions of the string may intersect or even cover each other. More formally: denote by |_w_| the length of _w_, let symbols of _w_ be numbered from 1 to |_w_|. Then for each position _i_ in _w_ there exist pair of indices _l_, _r_ (1 ≤ _l_ ≤ _i_ ≤ _r_ ≤ |_w_|) such that the substring _w_\[_l_ ... _r_\] equals one of the elements _s_1, _s_2, ..., _s__m_ of the collection. Woll wants to calculate the number of DNA sequences of a given length filtered by a given collection, but he doesn't know how to deal with it. Help him! Your task is to find the number of different DNA sequences of length _n_ filtered by the collection {_s__i_}. Answer may appear very large, so output it modulo 1000000009. ## Input First line contains two integer numbers _n_ and _m_ (1 ≤ _n_ ≤ 1000, 1 ≤ _m_ ≤ 10) — the length of the string and the number of sequences in the collection correspondently. Next _m_ lines contain the collection sequences _s__i_, one per line. Each _s__i_ is a nonempty string of length not greater than 10. All the strings consist of uppercase letters "_A_", "_C_", "_G_", "_T_". The collection may contain identical strings. ## Output Output should contain a single integer — the number of strings filtered by the collection modulo 1000000009 (109 + 9). [samples] ## Note In the first sample, a string has to be filtered by "_A_". Clearly, there is only one such string: "_AA_". In the second sample, there exist exactly two different strings satisfying the condition (see the pictures below). <center>![image](https://espresso.codeforces.com/0997f9113feb12725b64c8e7a90c4bc157fc38b3.png)</center><center>![image](https://espresso.codeforces.com/13eebf02536f5a540ecde18b1eb2d4526be3ec50.png)</center>
"多维空间如今已完全过时,不像遗传学问题那样热门"——物理学家Woll如此思考后,转而研究生物信息学。在分析测序结果时,他遇到了一个关于DNA序列的问题。我们将DNA序列视为由大写字母"_A_"、"_C_"、"_G_"和"_T_"组成的任意字符串(当然,这是简化后的解释)。 设 #cf_span[w] 为一个长DNA序列,#cf_span[s1, s2, ..., sm] 为一组短DNA序列。若 #cf_span[w] 可被该集合中的序列覆盖,则称该集合 _过滤_ #cf_span[w]。显然,对应字符串不同位置的子串可以相交甚至相互覆盖。更形式化地:记 #cf_span[|w|] 为 #cf_span[w] 的长度,将 #cf_span[w] 的字符编号从 #cf_span[1] 到 #cf_span[|w|]。那么对于 #cf_span[w] 中的每个位置 #cf_span[i],都存在一对索引 #cf_span[l, r](满足 #cf_span[1 ≤ l ≤ i ≤ r ≤ |w|]),使得子串 #cf_span[w[l ... r]] 等于集合 #cf_span[s1, s2, ..., sm] 中的某个元素。 Woll希望计算给定长度被给定集合过滤的DNA序列的数量,但他不知道如何处理。请帮助他!你的任务是找出被集合 #cf_span[{si}] 过滤的长度为 #cf_span[n] 的不同DNA序列的数量。 答案可能非常大,因此请对 #cf_span[1000000009] 取模输出。 第一行包含两个整数 #cf_span[n] 和 #cf_span[m](#cf_span[1 ≤ n ≤ 1000, 1 ≤ m ≤ 10]),分别表示字符串长度和集合中序列的数量。 接下来 #cf_span[m] 行,每行包含一个集合中的序列 #cf_span[si]。每个 #cf_span[si] 是一个非空字符串,长度不超过 #cf_span[10]。所有字符串均由大写字母 "_A_"、"_C_"、"_G_"、"_T_" 组成。集合中可能包含相同的字符串。 输出应包含一个整数——被该集合过滤的字符串数量对 #cf_span[1000000009](即 #cf_span[109 + 9])取模的结果。 在第一个样例中,字符串必须被 "_A_" 过滤。显然,仅存在一个这样的字符串:"_AA_"。 在第二个样例中,恰好存在两个满足条件的不同字符串(见下图)。 ## Input 第一行包含两个整数 #cf_span[n] 和 #cf_span[m](#cf_span[1 ≤ n ≤ 1000, 1 ≤ m ≤ 10]),分别表示字符串长度和集合中序列的数量。接下来 #cf_span[m] 行,每行包含一个集合中的序列 #cf_span[si]。每个 #cf_span[si] 是一个非空字符串,长度不超过 #cf_span[10]。所有字符串均由大写字母 "_A_"、"_C_"、"_G_"、"_T_" 组成。集合中可能包含相同的字符串。 ## Output 输出应包含一个整数——被该集合过滤的字符串数量对 #cf_span[1000000009](即 #cf_span[109 + 9])取模的结果。 [samples] ## Note 在第一个样例中,字符串必须被 "_A_" 过滤。显然,仅存在一个这样的字符串:"_AA_"。 在第二个样例中,恰好存在两个满足条件的不同字符串(见下图)。
**Definitions** Let $ n \in \mathbb{Z}^+ $ be the length of the target DNA sequence. Let $ \mathcal{S} = \{s_1, s_2, \dots, s_m\} $ be a collection of $ m \in \mathbb{Z}^+ $ nonempty pattern strings over the alphabet $ \Sigma = \{A, C, G, T\} $, each of length at most 10. A string $ w \in \Sigma^n $ is *filtered* by $ \mathcal{S} $ if for every position $ i \in \{1, 2, \dots, n\} $, there exists a substring $ w[l:r] $ (with $ 1 \leq l \leq i \leq r \leq n $) such that $ w[l:r] \in \mathcal{S} $. **Constraints** 1. $ 1 \leq n \leq 1000 $ 2. $ 1 \leq m \leq 10 $ 3. For each $ s_j \in \mathcal{S} $, $ 1 \leq |s_j| \leq 10 $ 4. All $ s_j $ consist only of characters from $ \Sigma = \{A, C, G, T\} $ 5. $ \mathcal{S} $ may contain duplicate strings (but treated as a set for coverage condition) **Objective** Compute the number of strings $ w \in \Sigma^n $ that are filtered by $ \mathcal{S} $, modulo $ 1000000009 $.
Samples
Input #1
2 1
A
Output #1
1
Input #2
6 2
CAT
TACT
Output #2
2
API Response (JSON)
{
  "problem": {
    "name": "C. Genetic engineering",
    "description": {
      "content": "\"Multidimensional spaces are completely out of style these days, unlike genetics problems\" — thought physicist Woll and changed his subject of study to bioinformatics. Analysing results of sequencing ",
      "description_type": "Markdown"
    },
    "platform": "Codeforces",
    "limit": {
      "time_limit": 2000,
      "memory_limit": 262144
    },
    "difficulty": "None",
    "is_remote": true,
    "is_sync": true,
    "sync_url": null,
    "sign": "CF86C"
  },
  "statements": [
    {
      "statement_type": "Markdown",
      "content": "\"Multidimensional spaces are completely out of style these days, unlike genetics problems\" — thought physicist Woll and changed his subject of study to bioinformatics. Analysing results of sequencing ...",
      "is_translate": false,
      "language": "English"
    },
    {
      "statement_type": "Markdown",
      "content": "\"多维空间如今已完全过时,不像遗传学问题那样热门\"——物理学家Woll如此思考后,转而研究生物信息学。在分析测序结果时,他遇到了一个关于DNA序列的问题。我们将DNA序列视为由大写字母\"_A_\"、\"_C_\"、\"_G_\"和\"_T_\"组成的任意字符串(当然,这是简化后的解释)。\n\n设 #cf_span[w] 为一个长DNA序列,#cf_span[s1, s2, ..., sm] 为一组短DNA序列。若...",
      "is_translate": true,
      "language": "Chinese"
    },
    {
      "statement_type": "Markdown",
      "content": "**Definitions**  \nLet $ n \\in \\mathbb{Z}^+ $ be the length of the target DNA sequence.  \nLet $ \\mathcal{S} = \\{s_1, s_2, \\dots, s_m\\} $ be a collection of $ m \\in \\mathbb{Z}^+ $ nonempty pattern strin...",
      "is_translate": false,
      "language": "Formal"
    }
  ]
}
Full JSON Raw Segments