How to parse strings so they look like sys.argv

Question

I would like to parse a string like this:

-o 1  --long "Some long string"

Into this:

["-o", "1", "--long", 'Some long string']

Or something similar.

This is DIFFERENT from getopt or optparse, which start from input that has already been split into sys.argv form (like the output I show above). Is there a standard way to do this? Essentially, it is "splitting" while keeping quoted strings together.

My best function so far:

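import csv
import io

def split_quote(string, quotechar='"'):
    '''
    Split a string on spaces, keeping quoted substrings together.

    >>> split_quote('--blah "Some argument" here')
    ['--blah', 'Some argument', 'here']

    >>> split_quote("--blah 'Some argument' here", quotechar="'")
    ['--blah', 'Some argument', 'here']
    '''
    # Use io.StringIO directly; csv.StringIO is an undocumented
    # implementation detail of the csv module.
    s = io.StringIO(string)
    C = csv.reader(s, delimiter=" ", quotechar=quotechar)
    return list(C)[0]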
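
One caveat with the csv approach, sketched here under the assumption of Python 3 (hence io.StringIO): csv.reader treats every single space as a field separator, so a run of spaces produces empty fields. The sample string from the question has two spaces after "1", which makes this visible.

```python
import csv
import io

# Two spaces after "1", as in the sample string from the question.
line = '-o 1  --long "Some long string"'

# Same reader settings as split_quote above.
fields = next(csv.reader(io.StringIO(line), delimiter=" ", quotechar='"'))
print(fields)
```

The result contains an empty string between '1' and '--long'. Filtering out empty strings afterwards would hide this, but it would also drop arguments that were deliberately empty, such as "".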

45

python parsing argv

Author: Georgy, 2009-05-22

2 answers

Before I became aware of shlex.split, I wrote the following:

import sys

_WORD_DIVIDERS = set((' ', '\t', '\r', '\n'))

_QUOTE_CHARS_DICT = {
    '\\':   '\\',
    ' ':    ' ',
    '"':    '"',
    'r':    '\r',
    'n':    '\n',
    't':    '\t',
}

def _raise_type_error():
    raise TypeError("Bytes must be decoded to Unicode first")

def parse_to_argv_gen(instring):
    is_in_quotes = False
    instring_iter = iter(instring)
    join_string = instring[0:0]

    c_list = []
    c = ' '
    while True:
        # Skip whitespace
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if c not in _WORD_DIVIDERS:
                    break
                c = next(instring_iter)
        except StopIteration:
            break
        # Read word
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if not is_in_quotes and c in _WORD_DIVIDERS:
                    break
                if c == '"':
                    is_in_quotes = not is_in_quotes
                    c = None
                elif c == '\\':
                    c = next(instring_iter)
                    c = _QUOTE_CHARS_DICT.get(c)
                if c is not None:
                    c_list.append(c)
                c = next(instring_iter)
            yield join_string.join(c_list)
            c_list = []
        except StopIteration:
            yield join_string.join(c_list)
            break

def parse_to_argv(instring):
    return list(parse_to_argv_gen(instring))

This works with Python 2.x and 3.x. On Python 2.x, it works directly with both byte strings and Unicode strings. On Python 3.x, it accepts only [Unicode] strings, not bytes objects.

This does not behave exactly like shell argv splitting: it also lets you write CR, LF, and TAB characters as \r, \n, and \t, and converts them to real CR, LF, and TAB characters (shlex.split does not do that). So writing my own function was useful for my needs. I think shlex.split is the better choice if you just want plain shell-style argv splitting. I am sharing this code in case it is useful as a basis for doing something slightly different.
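
To illustrate the difference in escape handling, here is a small sketch of what shlex.split does with \t in its default POSIX mode. The variable names are only for illustration.

```python
import shlex

# Outside quotes, POSIX-mode shlex drops the backslash and keeps the
# character that follows it, so "\t" becomes a plain "t".
unquoted = shlex.split(r'one\ttwo')

# Inside double quotes, a backslash that does not precede a quote or
# another backslash is kept literally.
quoted = shlex.split(r'"one\ttwo"')

print(unquoted)
print(quoted)
```

In neither case does the result contain a real TAB character, which is the gap the function above fills.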

3

Author: Craig McQueen,
2013-05-15 03:56:56

score 88 · Accepted Answer

I believe you want the shlex module.

>>> import shlex
>>> shlex.split('-o 1 --long "Some long string"')
['-o', '1', '--long', 'Some long string']
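
The list returned by shlex.split can then be handed to an option parser in place of sys.argv[1:]. Below is a minimal sketch using argparse; the parse_line helper and the dest names are assumptions made for illustration, with the options mirroring the example in the question.

```python
import argparse
import shlex


def parse_line(line):
    """Split a command-line-like string and parse it with argparse.

    Hypothetical helper; the options mirror the question's example.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-o", dest="o", type=int)
    parser.add_argument("--long", dest="long")
    # shlex.split keeps the quoted string together as one argument.
    return parser.parse_args(shlex.split(line))


args = parse_line('-o 1  --long "Some long string"')
print(args.o, args.long)
```

Passing the list explicitly to parse_args means the parser never touches the real sys.argv, which is handy for testing or for parsing commands read from a file or a prompt.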

88

Author: Jacob Gabrielson,
2009-05-22 18:33:07