Algorytm przekształcania jednego słowa w drugie za pomocą poprawnych słów

Question

Algorytm przekształcania jednego słowa w drugie za pomocą poprawnych słów

Natknąłem się na tę odmianę edit-distance problem:

Zaprojektuj algorytm, który przekształca słowo źródłowe w słowo docelowe. na przykład: od głowy do ogona, w każdym kroku, po prostu można zastąpić jeden znak, a słowo musi być ważne. Dostaniesz słownik.

Jest to oczywiście odmiana problemuedit distance , ale w edit distance nie obchodzi mnie, czy słowo jest poprawne, czy nie. Jak więc dodać ten wymóg, aby edytować odległość.

37

algorithm string transform

Author: Community, 2010-02-05

Source

8 answers

Podejście do grafu Codaddict jest poprawne, chociaż zbudowanie każdego grafu zajmuje O (N^2) czasu, gdzie n jest liczbą słów o danej długości. Jeśli jest to problem, możesz zbudować BK-tree znacznie efektywniej, dzięki czemu można znaleźć wszystkie słowa o określonej odległości edycji (w tym przypadku, 1) słowa docelowego.

10

Author: Nick Johnson,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2010-02-08 10:38:38

To chyba nie jest edit distance.

Myślę, że można to zrobić za pomocą wykresu. Wystarczy skonstruować wykres ze słownika i spróbować poruszać się za pomocą ulubionego algorytmu przechodzenia grafu do miejsca docelowego.

2

Author: Yuliy,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2010-02-05 07:05:30

Utwórz Wykres z każdym węzłem reprezentującym słowo w słowniku. Dodaj krawędź między dwoma węzłami słowa, Jeśli odpowiadające im słowa znajdują się w odległości edycji 1. Wtedy minimalna liczba wymaganych przekształceń oznaczałaby długość najkrótszej ścieżki między węzłem źródłowym a węzłem docelowym.

2

Author: prasadvk,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2013-04-16 00:30:34

Można po prostu użyć rekurencyjnego back-trackingu, ale nie jest to najbardziej optymalne rozwiązanie.

# Given two words of equal length that are in a dictionary, write a method to transform one word into another word by changing only
# one letter at a time.  The new word you get in each step must be in the
# dictionary.

# def transform(english_words, start, end):

# transform(english_words, 'damp', 'like')
# ['damp', 'lamp', 'limp', 'lime', 'like']
# ['damp', 'camp', 'came', 'lame', 'lime', 'like']


def is_diff_one(str1, str2):
    if len(str1) != len(str2):
        return False

    count = 0
    for i in range(0, len(str1)):
        if str1[i] != str2[i]:
            count = count + 1

    if count == 1:
        return True

    return False


potential_ans = []


def transform(english_words, start, end, count):
    global potential_ans
    if count == 0:
        count = count + 1
        potential_ans = [start]

    if start == end:
        print potential_ans
        return potential_ans

    for w in english_words:
        if is_diff_one(w, start) and w not in potential_ans:
            potential_ans.append(w)
            transform(english_words, w, end, count)
            potential_ans[:-1]

    return None


english_words = set(['damp', 'camp', 'came', 'lame', 'lime', 'like'])
transform(english_words, 'damp', 'lame', 0)

1

Author: Pradeep Vairamani,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2015-10-16 21:12:21

Jest to kod C# do rozwiązania problemu przy użyciu BFS:

//use a hash set for a fast check if a word is already in the dictionary
    static HashSet<string> Dictionary = new HashSet<string>();
    //dictionary used to find the parent in every node in the graph and to avoid traversing an already traversed node
    static Dictionary<string, string> parents = new Dictionary<string, string>();

    public static List<string> FindPath(List<string> input, string start, string end)
    {
        char[] allcharacters = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};

        foreach (string s in input)
            Dictionary.Add(s);
        List<string> currentFrontier = new List<string>();
        List<string> nextFrontier = new List<string>();
        currentFrontier.Add(start);
        while (currentFrontier.Count > 0)
        {
            foreach (string s in currentFrontier)
            {
                for (int i = 0; i < s.Length; i++)
                {
                    foreach (char c in allcharacters)
                    {
                        StringBuilder newWordBuilder = new StringBuilder(s);
                        newWordBuilder[i] = c;
                        string newWord = newWordBuilder.ToString();
                        if (Dictionary.Contains(newWord))
                        {
                            //avoid traversing a previously traversed node
                            if (!parents.Keys.Contains(newWord))
                            {
                                parents.Add(newWord.ToString(), s);
                                nextFrontier.Add(newWord);
                            }

                        }
                        if (newWord.ToString() == end)
                        {
                            return ExtractPath(start, end);

                        }
                    }
                }
            }
            currentFrontier.Clear();
            currentFrontier.Concat(nextFrontier);
            nextFrontier.Clear();
        }
        throw new ArgumentException("The given dictionary cannot be used to get a path from start to end");
    }

    private static List<string> ExtractPath(string start,string end)
    {
        List<string> path = new List<string>();
        string current = end;
        path.Add(end);
        while (current != start)
        {
            current = parents[current];
            path.Add(current);
        }
         path.Reverse();
         return path;
    }

0

Author: Muhammad Adel,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2014-04-10 20:18:09

Nie potrzebujemy wykresu ani innej skomplikowanej struktury danych. Moim pomysłem jest załadowanie słownika jako HashSet i użycie metody contains(), aby dowiedzieć się, czy słowo istnieje w słowniku, czy nie.

Proszę sprawdzić ten pseudokod {[7] } aby zobaczyć mój pomysł:

Two words are given: START and STOP. 
//List is our "way" from words START to STOP, so, we add the original word to it first.
    list.add(START);
//Finish to change the word when START equals STOP.
    while(!START.equals(STOP))
//Change each letter at START to the letter to STOP one by one and check if such word exists.
    for (int i = 0, i<STOP.length, i++){
        char temp = START[i];
        START[i] = STOP[i];
//If the word exists add a new word to the list of results. 
//And change another letter in the new word with the next pass of the loop.
        if dictionary.contains(START)
           list.add(START)
//If the word doesn't exist, leave it like it was and try to change another letter with the next pass of the loop.
        else START[i] = temp;}
    return list;

Jak rozumiem mój kod powinien tak działać:

Input :]}

Wyjście : wilgotne, lampowe, wiotkie, wapienne, podobne

Input : BACK, PICK

Wyjście : BACK, PACK, PICK

0

Author: Boris,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2017-12-05 08:02:26

Jest to oczywiście problem permutacji. Używanie Wykresu to przesada. Problemu brakuje jednego ważnego ograniczenia; że każdą pozycję można zmienić tylko raz. To sprawia, że dorozumiane jest, że rozwiązanie jest w 4 krokach. Teraz wszystko, co wymaga decyzji, to sekwencja operacji zamiany:

Operacja1 = Zmień "H " na"T"
Operacja2 = Zmień "E " na"A"
Operation3 = Zmień "A " na"I"
Operacja4 = Zmień " D na "L"

Rozwiązaniem, sekwencją operacji, jest pewna permutacja ciągu "1234", gdzie każda cyfra reprezentuje pozycję zastępowanego znaku. np. "3124" wskazuje, że najpierw zastosujemy operację3, następnie operację1, następnie operację2, a następnie operację 4. Na każdym kroku, jeśli wynikowy wyraz nie znajduje się w słowniku, przejdź do następnej permutacji. Dość trywialne. Ktoś koduje?

-2

Author: ChampCoda,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2012-02-14 19:58:18

score 39 · Accepted Answer

Można to modelować jako problem grafu. Słowa można traktować jako węzły wykresu, a dwa węzły są połączone wtedy i tylko wtedy, gdy są tej samej długości i różnią się jednym znakiem.

Możesz wstępnie przetworzyć słownik i utworzyć ten wykres, powinien wyglądać następująco:

   stack  jack
    |      |
    |      |
   smack  back -- pack -- pick

Możesz następnie mapować ze słowa do węzła reprezentującego słowo, do tego możesz użyć tabeli hash, wysokość balanced BST ...

Gdy już masz powyższe mapowanie, wszystko co musisz sprawdź, czy istnieje ścieżka między dwoma węzłami wykresu, co można łatwo zrobić za pomocą BFS lub DFS.

Więc można podsumować algorytm jako:

preprocess the dictionary and create the graph.
Given the two inputs words w1 and w2
if length(w1) != length(w2)
 Not possible to convert
else
 n1 = get_node(w1)
 n2 = get_node(w2)

 if(path_exists(n1,n2))
   Possible and nodes in the path represent intermediary words
 else
   Not possible