I'm trying to search a very large string for the number of occurrences of each combination and see which combination occurs the most often.

There's another way to do what you want. Instead of iterating over every possible combination, walk over the substrings of length k that actually occur in the large string, and count only those made of the allowed characters:

    import collections

    def substrings(vlarge, k):
        # every length-k window of vlarge, overlapping windows included
        return (vlarge[idx:idx + k] for idx in range(len(vlarge) - k + 1))

    def uses_only(s, chars):
        # assumed reading of "combination": s is built only from characters in chars
        return all(c in chars for c in s)

    def most_common_substring(vlarge, chars, k):  # wrapper name is illustrative
        return collections.Counter(s for s in substrings(vlarge, k) if uses_only(s, chars)).most_common(1)

Compared with this approach, the "obvious" idea (iterate over all combinations, count how many times each one appears as a substring, and keep track of the best so far) uses less memory but is much slower, because it makes one full pass over the very large string per combination.

If I have misunderstood what you mean by "combinations", that is to say if my `uses_only` function is incorrect, then you'd have to adjust my idea accordingly. The point is: count the substrings of the form you want that actually occur, because there are fewer of them than there are hypothetical substrings of the correct form.

You can then make this basic idea more efficient. For example, if you encounter an 'x' character in vlarge, you know that none of the substrings that include it are combinations of 'abcd'. So you can skip ahead to the substring that starts one place after the x, using a generator `generate_substrings(vlarge, chars, k)` that yields only the qualifying substrings, and count them the same way: `collections.Counter(generate_substrings(vlarge, chars, k)).most_common(1)`.

My answer will only be a theoretical analysis of what you are doing. I will denote by C(k,n) the binomial coefficient giving the number of subsets containing k elements of a set containing n elements, and I will assume all the characters in your string are distinct.

Suppose you have a string of length n ∈ ℕ* and k ∈ ℕ, k ⩾ n. I understand that you are trying to build a string of k characters extracted from your input string. A combination of your string characters can be seen as a permutation of ⟦1, n⟧, so for k = n there are n! possible output strings.

When k > n, things get much worse. Write k = pn + r with p ∈ ℕ* and 0 ⩽ r < n. Your output string can then be decomposed into p "complete" substrings, each a permutation of your n input characters, and one substring made of only r characters of your input string. To build such an "incomplete" substring, you first have to choose a subset of r characters of your input string and then a permutation of those characters. Finally, the number $s_{r,n}$ of such possible substrings is:

$$s_{r,n} = C(r,n) \cdot r! = \frac{n!}{(n-r)!}$$
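As a sanity check on the counting in the theoretical analysis, here is a small Python sketch. The function names are hypothetical. It compares the closed form C(r,n) · r! = n!/(n−r)! with a brute-force count of ordered selections of r distinct characters out of n, for small n. It then multiplies in the p complete blocks, assuming the decomposition k = pn + r with 0 ⩽ r < n, which gives a total count for a given k ⩾ n.

```python
import itertools
import math


def s_rn(r, n):
    """Closed form for the incomplete block: C(r, n) * r! = n! / (n - r)!."""
    return math.comb(n, r) * math.factorial(r)


def count_by_enumeration(r, n):
    """Brute-force count of ordered selections of r distinct characters out of n."""
    chars = range(n)  # n distinct characters, labelled 0..n-1
    return sum(1 for _ in itertools.permutations(chars, r))


def total_outputs(k, n):
    """Hypothetical total for k >= n: p complete blocks times one incomplete block."""
    p, r = divmod(k, n)  # k = p*n + r with 0 <= r < n
    return math.factorial(n) ** p * s_rn(r, n)


# Check the closed form against enumeration for small sizes.
for n in range(1, 6):
    for r in range(n):
        assert s_rn(r, n) == count_by_enumeration(r, n) == math.factorial(n) // math.factorial(n - r)

print(total_outputs(10, 4))  # n = 4, k = 10, so p = 2 and r = 2
```

With n = 4 and k = 10 the script prints 6912, which is 24² · 12. Even tiny inputs therefore produce far more hypothetical strings than a single large text can contain.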
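The skip-ahead idea from the substring-counting answer can be sketched as follows. This is a hypothetical implementation of `generate_substrings`, not the answer's original code. It does not jump indices explicitly. Instead, it scans vlarge once and records where the current run of allowed characters started, which has the same effect as restarting just after each disallowed character. It assumes k ⩾ 1 and treats a "combination" as a string built only from characters in chars, the same reading as the `uses_only` check.

```python
import collections


def generate_substrings(vlarge, chars, k):
    """Yield every length-k substring of vlarge made only of characters in chars.

    Hypothetical sketch (assumes k >= 1): a character outside chars rules out
    every window containing it, so the scan restarts just after it.
    """
    allowed = set(chars)
    run_start = 0  # index where the current run of allowed characters began
    for idx, c in enumerate(vlarge):
        if c not in allowed:
            # No window that includes this character can qualify.
            run_start = idx + 1
        elif idx - run_start + 1 >= k:
            # The window ending at idx lies entirely inside the current run.
            yield vlarge[idx - k + 1 : idx + 1]


def most_common_substring_skipping(vlarge, chars, k):
    # Same counting step as before, fed by the skip-ahead generator.
    return collections.Counter(generate_substrings(vlarge, chars, k)).most_common(1)
```

For example, `most_common_substring_skipping("abxab", "ab", 2)` returns `[('ab', 2)]`. The window that straddles the 'x' is never built.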
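For comparison, the "obvious" approach mentioned in the substring-counting answer might look like the sketch below. The function names are illustrative. It assumes a combination is any length-k string over chars, with repetition allowed, and it counts overlapping occurrences so that its counts agree with a sliding-window count, although ties may be broken differently. There are len(chars)**k candidates and each one triggers a full scan of vlarge, which is why this approach is slow on a very large string.

```python
import itertools


def count_overlapping(haystack, needle):
    """Count occurrences of a non-empty needle in haystack, overlaps included."""
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        start = haystack.find(needle, start + 1)
    return count


def most_common_by_enumeration(vlarge, chars, k):
    """Hypothetical sketch of the 'obvious' approach: try every candidate string.

    Assumes k >= 1 and that a combination is any length-k string over chars.
    Returns [(best, count)] like Counter.most_common(1), or [] if nothing occurs.
    """
    best, best_count = None, 0
    for combo in itertools.product(chars, repeat=k):
        candidate = "".join(combo)
        hits = count_overlapping(vlarge, candidate)  # one full pass per candidate
        if hits > best_count:
            best, best_count = candidate, hits
    return [(best, best_count)] if best is not None else []
```

For example, `most_common_by_enumeration("abxab", "ab", 2)` returns `[('ab', 2)]`, the same answer as the window-counting version, but it reaches that answer by scanning the string once for each of the four candidates.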