Czy istnieje wbudowana funkcja dla strunowego sortowania naturalnego?

Question

Czy istnieje wbudowana funkcja dla strunowego sortowania naturalnego?

Mam listę ciągów, dla których chciałbym wykonać naturalny sortowanie alfabetyczne .

Na przykład poniższa lista jest naturalnie posortowana (co chcę):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

A oto "posortowana" wersja powyższej listy (co dostaję za pomocą sorted()):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

Szukam funkcji sort, która zachowuje się jak pierwsza.

313

python sorting

Author: Boris, 2011-01-29

Source

19 answers

Spróbuj tego:

import re

def natural_sort(l): 
    convert = lambda text: int(text) if text.isdigit() else text.lower() 
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(l, key = alphanum_key)

Wyjście:

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Kod zaadaptowany stąd: sortowanie dla ludzi: naturalny porządek sortowania .

202

Author: Mark Byers,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2019-08-06 14:54:32

Oto znacznie bardziej pythoniczna Wersja odpowiedzi Marka Byera:

import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

Teraz ta funkcja może być używana jako klucz w każdej funkcji, która jej używa, jak list.sort, sorted, max, itd.

Jako lambda:

lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(\d+)', s)]

108

Author: Claudiu,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2018-10-05 12:42:48

data = ['elm13', 'elm9', 'elm0', 'elm1', 'Elm11', 'Elm2', 'elm10']

Przeanalizujmy dane. Pojemność cyfr wszystkich elementów wynosi 2. I są 3 litery w wspólnej części literalnej 'elm'.

Więc maksymalna długość elementu wynosi 5. Możemy zwiększyć tę wartość, aby upewnić się (na przykład do 8).

Mając to na uwadze, mamy rozwiązanie jednoliniowe:

data.sort(key=lambda x: '{0:0>8}'.format(x).lower())

Bez wyrażeń regularnych i zewnętrznych bibliotek!

print(data)

>>> ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'elm13']

Wyjaśnienie:

for elm in data:
    print('{0:0>8}'.format(elm).lower())

>>>
0000elm0
0000elm1
0000elm2
0000elm9
000elm10
000elm11
000elm13

21

Author: SergO,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2016-04-19 11:56:58

Napisałem funkcję opartą na http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html, który dodaje możliwość nadal przejść w parametrze 'klucz'. Potrzebuję tego, aby wykonać naturalny rodzaj list, które zawierają bardziej złożone obiekty(nie tylko łańcuchy).

import re

def natural_sort(list, key=lambda s:s):
    """
    Sort the list into natural alphanumeric order.
    """
    def get_alphanum_key_func(key):
        convert = lambda text: int(text) if text.isdigit() else text 
        return lambda s: [convert(c) for c in re.split('([0-9]+)', key(s))]
    sort_key = get_alphanum_key_func(key)
    list.sort(key=sort_key)

Na przykład:

my_list = [{'name':'b'}, {'name':'10'}, {'name':'a'}, {'name':'1'}, {'name':'9'}]
natural_sort(my_list, key=lambda x: x['name'])
print my_list
[{'name': '1'}, {'name': '9'}, {'name': '10'}, {'name': 'a'}, {'name': 'b'}]

19

Author: beauburrier,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2012-01-20 10:53:53

Podane:

data=['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

Podobnie jak rozwiązanie SergO, 1-liner bez zewnętrznych bibliotek byłby :

data.sort(key=lambda x : int(x[3:]))

Lub

sorted_data=sorted(data, key=lambda x : int(x[3:]))

Wyjaśnienie:

To rozwiązanie wykorzystuje funkcję klucz z sort do zdefiniowania funkcji, która będzie używana do sortowania. Ponieważ wiemy, że każdy wpis danych jest poprzedzony 'elm', funkcja sortująca zamienia na liczbę całkowitą część łańcucha po trzecim znaku (tzn. int (x[3:])). Jeśli liczba część danych znajduje się w innym miejscu, wtedy ta część funkcji musiałaby się zmienić.

Cheers

16

Author: Camilo,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2017-03-14 15:16:56

Wartość Tego Postu

Moim celem jest zaoferowanie rozwiązania non regex, które można zastosować ogólnie.
Stworzę trzy funkcje:

find_first_digit które pożyczyłem od @AnuragUniyal. Znajdzie pozycję pierwszej cyfry lub niecyfrowej w łańcuchu.
split_digits który jest generatorem, który rozdziela ciąg na kawałki cyfr i niecyfrów. Będzie również yield liczbami całkowitymi, gdy jest cyfrą.
natural_key po prostu zawija {[7] } w tuple. To jest to, czego używamy jako klucza do sorted, max, min.

Funkcje

def find_first_digit(s, non=False):
    for i, x in enumerate(s):
        if x.isdigit() ^ non:
            return i
    return -1

def split_digits(s, case=False):
    non = True
    while s:
        i = find_first_digit(s, non)
        if i == 0:
            non = not non
        elif i == -1:
            yield int(s) if s.isdigit() else s if case else s.lower()
            s = ''
        else:
            x, s = s[:i], s[i:]
            yield int(x) if x.isdigit() else x if case else x.lower()

def natural_key(s, *args, **kwargs):
    return tuple(split_digits(s, *args, **kwargs))

Widzimy, że jest to ogólne w tym, że możemy mieć wiele kawałków cyfr:

# Note that the key has lower case letters
natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh')

('asl;dkfdfkj:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

Lub pozostawić jako rozróżnienie wielkości liter:

natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh', True)

('asl;dkfDFKJ:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

Widzimy, że sortuje listę OP w odpowiedniej kolejności

sorted(
    ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13'],
    key=natural_key
)

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Ale może obsługiwać również bardziej skomplikowane listy:

sorted(
    ['f_1', 'e_1', 'a_2', 'g_0', 'd_0_12:2', 'd_0_1_:2'],
    key=natural_key
)

['a_2', 'd_0_1_:2', 'd_0_12:2', 'e_1', 'f_1', 'g_0']

Moim odpowiednikiem regex będzie

def int_maybe(x):
    return int(x) if str(x).isdigit() else x

def split_digits_re(s, case=False):
    parts = re.findall('\d+|\D+', s)
    if not case:
        return map(int_maybe, (x.lower() for x in parts))
    else:
        return map(int_maybe, parts)
    
def natural_key_re(s, *args, **kwargs):
    return tuple(split_digits_re(s, *args, **kwargs))

7

Author: piRSquared,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2020-06-20 09:12:55

a teraz coś więcej* elegant (pythonic) -just a touch

Istnieje wiele implementacji i chociaż niektóre są blisko, żadna nie uchwyciła elegancji, jaką zapewnia współczesny python.

testowane przy użyciu Pythona (3.5.1)
dołączono dodatkową listę, aby wykazać, że działa, gdy liczby to średni ciąg
nie testowałem, jednak zakładam, że jeśli Twoja lista byłaby spora, bardziej wydajne byłoby skompilowanie regex wcześniej
- jestem pewien, że ktoś mnie poprawi, jeśli jest to błędne założenie

Quicky

from re import compile, split    
dre = compile(r'(\d+)')
mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

Pełny Kod

#!/usr/bin/python3
# coding=utf-8
"""
Natural-Sort Test
"""

from re import compile, split

dre = compile(r'(\d+)')
mylist = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13', 'elm']
mylist2 = ['e0lm', 'e1lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm', 'e01lm']

mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
mylist2.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

print(mylist)  
  # ['elm', 'elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(mylist2)  
  # ['e0lm', 'e1lm', 'e01lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm']

Uwaga podczas stosowania

from os.path import split
- będziesz musiał różnicować import

Inspiracja z

dokumentacja Pythona-sortowanie jak
sortowanie dla ludzi: sortowanie Naturalne Order
Sortowanie Ludzi
autorzy / komentatorzy do tego i odsyłacze do postów

6

Author: JerodG,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2016-05-04 19:30:47

Jedną z opcji jest przekształcenie ciągu znaków w krotkę i zastąpienie cyfr za pomocą rozwiniętego formularza http://wiki.answers.com/Q/What_does_expanded_form_mean

W ten sposób a90 stałoby się ("a",90,0), A a1 stałoby się ("a",1)

Poniżej znajduje się przykładowy kod (który nie jest zbyt wydajny ze względu na sposób, w jaki usuwa początkowe 0 z liczb)

alist=["something1",
    "something12",
    "something17",
    "something2",
    "something25and_then_33",
    "something25and_then_34",
    "something29",
    "beta1.1",
    "beta2.3.0",
    "beta2.33.1",
    "a001",
    "a2",
    "z002",
    "z1"]

def key(k):
    nums=set(list("0123456789"))
        chars=set(list(k))
    chars=chars-nums
    for i in range(len(k)):
        for c in chars:
            k=k.replace(c+"0",c)
    l=list(k)
    base=10
    j=0
    for i in range(len(l)-1,-1,-1):
        try:
            l[i]=int(l[i])*base**j
            j+=1
        except:
            j=0
    l=tuple(l)
    print l
    return l

print sorted(alist,key=key)

Wyjście:

('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 1)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 7)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 3)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 4)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 9)
('b', 'e', 't', 'a', 1, '.', 1)
('b', 'e', 't', 'a', 2, '.', 3, '.')
('b', 'e', 't', 'a', 2, '.', 30, 3, '.', 1)
('a', 1)
('a', 2)
('z', 2)
('z', 1)
['a001', 'a2', 'beta1.1', 'beta2.3.0', 'beta2.33.1', 'something1', 'something2', 'something12', 'something17', 'something25and_then_33', 'something25and_then_34', 'something29', 'z1', 'z002']

4

Author: robert king,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2016-10-25 15:39:40

Na podstawie odpowiedzi tutaj napisałem natural_sorted funkcję, która zachowuje się jak wbudowana funkcja sorted:

# Copyright (C) 2018, Benjamin Drung <[email protected]>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import re

def natural_sorted(iterable, key=None, reverse=False):
    """Return a new naturally sorted list from the items in *iterable*.

    The returned list is in natural sort order. The string is ordered
    lexicographically (using the Unicode code point number to order individual
    characters), except that multi-digit numbers are ordered as a single
    character.

    Has two optional arguments which must be specified as keyword arguments.

    *key* specifies a function of one argument that is used to extract a
    comparison key from each list element: ``key=str.lower``.  The default value
    is ``None`` (compare the elements directly).

    *reverse* is a boolean value.  If set to ``True``, then the list elements are
    sorted as if each comparison were reversed.

    The :func:`natural_sorted` function is guaranteed to be stable. A sort is
    stable if it guarantees not to change the relative order of elements that
    compare equal --- this is helpful for sorting in multiple passes (for
    example, sort by department, then by salary grade).
    """
    prog = re.compile(r"(\d+)")

    def alphanum_key(element):
        """Split given key in list of strings and digits"""
        return [int(c) if c.isdigit() else c for c in prog.split(key(element)
                if key else element)]

    return sorted(iterable, key=alphanum_key, reverse=reverse)

Kod źródłowy jest również dostępny w repozytorium GitHub snippets: https://github.com/bdrung/snippets/blob/master/natural_sorted.py

4

Author: Benjamin Drung,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2018-02-08 14:56:28

Powyższe odpowiedzi są dobre dla konkretnego przykładu , który został pokazany, ale brakuje kilku przydatnych przypadków dla bardziej ogólnego pytania rodzaju naturalnego. Właśnie ugryzł mnie jeden z tych przypadków, więc stworzyłem bardziej dokładne rozwiązanie:

def natural_sort_key(string_or_number):
    """
    by Scott S. Lawton <[email protected]> 2014-12-11; public domain and/or CC0 license

    handles cases where simple 'int' approach fails, e.g.
        ['0.501', '0.55'] floating point with different number of significant digits
        [0.01, 0.1, 1]    already numeric so regex and other string functions won't work (and aren't required)
        ['elm1', 'Elm2']  ASCII vs. letters (not case sensitive)
    """

    def try_float(astring):
        try:
            return float(astring)
        except:
            return astring

    if isinstance(string_or_number, basestring):
        string_or_number = string_or_number.lower()

        if len(re.findall('[.]\d', string_or_number)) <= 1:
            # assume a floating point value, e.g. to correctly sort ['0.501', '0.55']
            # '.' for decimal is locale-specific, e.g. correct for the Anglosphere and Asia but not continental Europe
            return [try_float(s) for s in re.split(r'([\d.]+)', string_or_number)]
        else:
            # assume distinct fields, e.g. IP address, phone number with '.', etc.
            # caveat: might want to first split by whitespace
            # TBD: for unicode, replace isdigit with isdecimal
            return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_or_number)]
    else:
        # consider: add code to recurse for lists/tuples and perhaps other iterables
        return string_or_number

Kod testowy i kilka linków (włączonych i wyłączonych ze StackOverflow) są tutaj: http://productarchitect.com/code/better-natural-sort.py

Opinie mile widziane. To nie ma być ostateczne rozwiązanie, tylko krok do przodu.

2

Author: Scott Lawton,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2014-12-11 18:42:56

Najprawdopodobniej functools.cmp_to_key() jest ściśle związana z podstawową implementacją Pythona sort. Poza tym parametr CMP jest dziedziczony. Nowoczesnym sposobem jest przekształcenie elementów wejściowych w obiekty, które obsługują pożądane bogate operacje porównywania.

Pod CPython 2.x, obiekty różnych typów mogą być uporządkowane nawet jeśli odpowiednie operatory porównania rich nie zostały zaimplementowane. Pod CPython 3.x, obiekty różnych typów muszą jawnie wspierać porównanie. Zobacz też Jak Python porównuje string i int? który łączy się z oficjalną dokumentacją. Większość odpowiedzi zależy od tego ukrytego zamówienia. Przełączam na Python 3.x będzie wymagał nowego typu do zaimplementowania i ujednolicenia porównań między liczbami i łańcuchami.

Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
>>> (0,"foo") < ("foo",0)
True

Python 3.5.2 (default, Oct 14 2016, 12:54:53) 
>>> (0,"foo") < ("foo",0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  TypeError: unorderable types: int() < str()

Istnieją trzy różne podejścia. Pierwszy używa klas zagnieżdżonych, aby skorzystać z algorytmu porównawczego Iterable Pythona. Drugi rozwija to zagnieżdżanie w jedną klasę. Trzeci foreges subclassing str, aby skupić się na wydajności. Wszystkie są czasowe; drugi jest dwa razy szybszy, podczas gdy trzeci prawie sześć razy szybszy. Podklasowanie str nie jest wymagane i prawdopodobnie było złym pomysłem, ale ma pewne udogodnienia.

Znaki sortowania są zduplikowane, aby wymusić kolejność według wielkości liter, i zamieniane na małe litery, aby najpierw sortować; jest to typowa definicja "sortowania naturalnego". Nie mogłem się zdecydować na rodzaj zgrupowania; niektóre może preferować następujące, co również przynosi znaczące korzyści w wydajności:

d = lambda s: s.lower()+s.swapcase()

Gdzie używane, operatory porównania są ustawione na object, więc nie będą ignorowane przez functools.total_ordering.

import functools
import itertools


@functools.total_ordering
class NaturalStringA(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda c, s: [ c.NaturalStringPart("".join(v))
                        for k,v in
                       itertools.groupby(s, c.isdigit)
                     ]
    d = classmethod(d)
    @functools.total_ordering
    class NaturalStringPart(str):
        d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
        d = staticmethod(d)
        def __lt__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) < int(other)
            except ValueError:
                if self.isdigit():
                    return True
                elif other.isdigit():
                    return False
                else:
                    return self.d(self) < self.d(other)
        def __eq__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) == int(other)
            except ValueError:
                if self.isdigit() or other.isdigit():
                    return False
                else:
                    return self.d(self) == self.d(other)
        __le__ = object.__le__
        __ne__ = object.__ne__
        __gt__ = object.__gt__
        __ge__ = object.__ge__
    def __lt__(self, other):
        return self.d(self) < self.d(other)
    def __eq__(self, other):
        return self.d(self) == self.d(other)
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__

import functools
import itertools


@functools.total_ordering
class NaturalStringB(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
    d = staticmethod(d)
    def __lt__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
        return False
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None or o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return False
        return True
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__

import functools
import itertools
import enum


class OrderingType(enum.Enum):
    PerWordSwapCase         = lambda s: s.lower()+s.swapcase()
    PerCharacterSwapCase    = lambda s: "".join(c.lower()+c.swapcase() for c in s)


class NaturalOrdering:
    @classmethod
    def by(cls, ordering):
        def wrapper(string):
            return cls(string, ordering)
        return wrapper
    def __init__(self, string, ordering=OrderingType.PerCharacterSwapCase):
        self.string = string
        self.groups = [ (k,int("".join(v)))
                            if k else
                        (k,ordering("".join(v)))
                            for k,v in
                        itertools.groupby(string, str.isdigit)
                      ]
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , self.string
            )
    def __lesser(self, other, default):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return s_v < o_v
        return default
    def __lt__(self, other):
        return self.__lesser(other, default=False)
    def __le__(self, other):
        return self.__lesser(other, default=True)
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None or o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return False
        return True
    # functools.total_ordering doesn't create single-call wrappers if both
    # __le__ and __lt__ exist, so do it manually.
    def __gt__(self, other):
        op_result = self.__le__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    def __ge__(self, other):
        op_result = self.__lt__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    # __ne__ is the only implied ordering relationship, it automatically
    # delegates to __eq__

>>> import natsort
>>> import timeit
>>> l1 = ['Apple', 'corn', 'apPlE', 'arbour', 'Corn', 'Banana', 'apple', 'banana']
>>> l2 = list(map(str, range(30)))
>>> l3 = ["{} {}".format(x,y) for x in l1 for y in l2]
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringA)', number=10000, globals=globals()))
362.4729259099986
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringB)', number=10000, globals=globals()))
189.7340817489967
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalOrdering.by(OrderingType.PerCharacterSwapCase))', number=10000, globals=globals()))
69.34636392899847
>>> print(timeit.timeit('natsort.natsorted(l3+["0"], alg=natsort.ns.GROUPLETTERS | natsort.ns.LOWERCASEFIRST)', number=10000, globals=globals()))
98.2531585780016

Sortowanie naturalne jest zarówno dość skomplikowane, jak i niejasno zdefiniowane jako problem. Nie zapomnij uruchomić unicodedata.normalize(...) wcześniej i rozważyć użycie str.casefold() zamiast str.lower(). Są prawdopodobnie subtelne problemy z kodowaniem, których nie brałem pod uwagę. Więc Ja wstępnie Poleć bibliotekę natsort . Rzuciłem okiem na repozytorium github; obsługa kodu była świetna.

Wszystkie algorytmy, które widziałem, zależą od sztuczek, takich jak powielanie i opuszczanie znaków i zamiana liter. Podczas gdy to podwaja czas pracy, alternatywa wymagałaby całkowitej naturalnej kolejności na wejściowym zestawie znaków. Nie sądzę, że jest to część specyfikacji unicode, a ponieważ jest o wiele więcej cyfr unicode niż [0-9], tworzenie takiego sortowania byłoby równie trudne. Jeśli chcesz porównywać z ustawieniami lokalnymi, przygotuj Ciągi z locale.strxfrm według Pythona sortowanie jak .

2

Author: user19087,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2017-05-23 11:47:31

Poprawa w odpowiedzi Marka Byersa; -)

import re

def natural_sort_key(s, _re=re.compile(r'(\d+)')):
    return [int(t) if i & 1 else t.lower() for i, t in enumerate(_re.split(s))]

...
my_naturally_sorted_list = sorted(my_list, key=natural_sort_key)

BTW, może nie każdy pamięta, że wartości domyślne argumentów funkcji są obliczane w def czasie

2

Author: Walter Tross,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2020-08-12 14:49:49

Po odpowiedzi @Mark Byers, tutaj jest adaptacja, która akceptuje parametr key i jest bardziej zgodna z PEP8.

def natsorted(seq, key=None):
    def convert(text):
        return int(text) if text.isdigit() else text

    def alphanum(obj):
        if key is not None:
            return [convert(c) for c in re.split(r'([0-9]+)', key(obj))]
        return [convert(c) for c in re.split(r'([0-9]+)', obj)]

    return sorted(seq, key=alphanum)

Zrobiłem też Gist

1

Author: edouardtheron,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2019-08-16 15:46:27

Pozwól, że przedstawię własne podejście do tej potrzeby:

from typing import Tuple, Union, Optional, Generator


StrOrInt = Union[str, int]


# On Python 3.6, string concatenation is REALLY fast
# Tested myself, and this fella also tested:
# https://blog.ganssle.io/articles/2019/11/string-concat.html
def griter(s: str) -> Generator[StrOrInt, None, None]:
    last_was_digit: Optional[bool] = None
    cluster: str = ""
    for c in s:
        if last_was_digit is None:
            last_was_digit = c.isdigit()
            cluster += c
            continue
        if c.isdigit() != last_was_digit:
            if last_was_digit:
                yield int(cluster)
            else:
                yield cluster
            last_was_digit = c.isdigit()
            cluster = ""
        cluster += c
    if last_was_digit:
        yield int(cluster)
    else:
        yield cluster
    return


def grouper(s: str) -> Tuple[StrOrInt, ...]:
    return tuple(griter(s))

Teraz jeśli mamy taką listę:

filelist = [
    'File3', 'File007', 'File3a', 'File10', 'File11', 'File1', 'File4', 'File5',
    'File9', 'File8', 'File8b1', 'File8b2', 'File8b11', 'File6'
]

Możemy po prostu użyć key= kwarg do zrobienia naturalnego sortowania:

>>> sorted(filelist, key=grouper)
['File1', 'File3', 'File3a', 'File4', 'File5', 'File6', 'File007', 'File8', 
'File8b1', 'File8b2', 'File8b11', 'File9', 'File10', 'File11']

Wadą jest oczywiście to, że tak jak teraz, funkcja sortuje wielkie litery przed małymi literami.

Implementację case-insensive grouper zostawię czytelnikowi : -)

1

Author: pepoluan,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2020-05-29 16:49:17

Proponuję po prostu użyć argumentu key słowa kluczowego sorted, aby uzyskać pożądaną listę
Na przykład:

to_order= [e2,E1,e5,E4,e3]
ordered= sorted(to_order, key= lambda x: x.lower())
    # ordered should be [E1,e2,e3,E4,e5]

0

Author: Johny Vaknin,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2017-08-14 14:49:34

Algorytm, którego używam to padzero_with_lower zdefiniowany jako:

import re

def padzero_with_lower(s):
    return re.sub(r'\d+', lambda m: m.group(0).rjust(10, '0'), s).lower()

Algorytm znajduje:

znajduje i umieszcza liczby o dowolnej długości, do wystarczająco dużej długości, np. 10
następnie zamienia łańcuch na małe litery

Oto przykładowe użycie:

print(padzero_with_lower('file1.txt'))   # file0000000001.txt
print(padzero_with_lower('file12.txt'))  # file0000000012.txt
print(padzero_with_lower('file23.txt'))  # file0000000023.txt
print(padzero_with_lower('file123.txt')) # file0000000123.txt
print(padzero_with_lower('file301.txt')) # file0000000301.txt
print(padzero_with_lower('Dir2/file15.txt'))  # dir0000000002/file0000000015.txt
print(padzero_with_lower('dir2/file123.txt')) # dir0000000002/file0000000123.txt
print(padzero_with_lower('dir15/file2.txt'))  # dir0000000015/file0000000002.txt
print(padzero_with_lower('Dir15/file15.txt')) # dir0000000015/file0000000015.txt
print(padzero_with_lower('elm0'))  # elm0000000000
print(padzero_with_lower('elm1'))  # elm0000000001
print(padzero_with_lower('Elm2'))  # elm0000000002
print(padzero_with_lower('elm9'))  # elm0000000009
print(padzero_with_lower('elm10')) # elm0000000010
print(padzero_with_lower('Elm11')) # elm0000000011 
print(padzero_with_lower('Elm12')) # elm0000000012
print(padzero_with_lower('elm13')) # elm0000000013

Po przetestowaniu tej funkcji możemy teraz użyć jej jako naszego klucza, tzn.

lis = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
lis.sort(key=padzero_with_lower)
print(lis)
# Output: ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

0

Author: Stephen Quan,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2020-12-15 00:15:20

a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

def bubble(bad_list):#bubble sort method
        length = len(bad_list) - 1
        sorted = False

        while not sorted:
                sorted = True
                for i in range(length):
                        if bad_list[i] > bad_list[i+1]:
                                sorted = False
                                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i] #sort the integer list 
                                a[i], a[i+1] = a[i+1], a[i] #sort the main list based on the integer list index value

for a_string in a: #extract the number in the string character by character
        for letter in a_string:
                if letter.isdigit():
                        #print letter
                        b += letter
        c.append(b)
        b = ''

print 'Before sorting....'
print a
c = map(int, c) #converting string list into number list
print c
bubble(c)

print 'After sorting....'
print c
print a

Podziękowania :

Bubble Sort Homework

Jak czytać ciąg znaków jedna litera na raz w Pythonie

-1

Author: Varadaraju G,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2018-03-12 12:10:14

>>> import re
>>> sorted(lst, key=lambda x: int(re.findall(r'\d+$', x)[0]))
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

-2

Author: SilentGhost,
Warning: date(): Invalid date.timezone value 'Europe/Kyiv', we selected the timezone 'UTC' for now. in /var/www/agent_stack/data/www/doraprojects.net/template/agent.layouts/content.php on line 54
2011-01-29 12:01:33

score 261 · Accepted Answer

Na PyPI istnieje Biblioteka innej firmy o nazwie natsort (pełne ujawnienie, jestem autorem pakietu). W Twoim przypadku możesz wykonać jedną z następujących czynności:

>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsorted(x, alg=ns.IGNORECASE)  # or alg=ns.IC
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Powinieneś zauważyć, że natsort używa ogólnego algorytmu, więc powinien działać dla prawie każdego wejścia, które rzucisz na niego. Jeśli chcesz dowiedzieć się więcej na temat tego, dlaczego możesz wybrać bibliotekę, aby to zrobić, a nie zwijać własną funkcję, sprawdź stronę natsort documentation ' s How It Works, w szczególne specjalne Przypadki wszędzie! sekcja.

Jeśli potrzebujesz klucza sortowania zamiast funkcji sortowania, użyj jednej z poniższych formuł.

>>> from natsort import natsort_keygen, ns
>>> l1 = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> l2 = l1[:]
>>> natsort_key1 = natsort_keygen(key=lambda y: y.lower())
>>> l1.sort(key=natsort_key1)
>>> l1
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
>>> l2.sort(key=natsort_key2)
>>> l2
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Aktualizacja Listopad 2020

Biorąc pod uwagę, że popularne zapytanie / pytanie brzmi "jak sortować jak Windows Explorer?"(lub czymkolwiek jest przeglądarka systemu plików Twojego systemu operacyjnego), od wersji natsort 7.1.0 istnieje funkcja o nazwie os_sorted aby zrobić dokładnie to. W systemie Windows będzie sortować w w tej samej kolejności co Eksplorator Windows, a na innych systemach operacyjnych powinien sortować jak to, co jest lokalną przeglądarką systemu plików.

>>> from natsort import os_sorted
>>> os_sorted(list_of_paths)
# your paths sorted like your file system browser

Dla tych, którzy potrzebują klucza sortowania, możesz użyć os_sort_keygen (lub os_sort_key, jeśli potrzebujesz tylko domyślnych wartości).

Caveat - proszę przeczytać dokumentację API dla tej funkcji przed użyciem, aby zrozumieć ograniczenia i jak uzyskać najlepsze wyniki.