Python hash tables and parsing

[TOC]

Hello, everyone!It's been nearly a month and a half since my last article was published. It happened that this month I encountered a job adjustment and had to take care of my 10-month-old child. That's not easy, so I haven't updated it for such a long time.No, I try to finish this article a little bit every day,'Dripping water into the river'.

1. Packaging and Structure

1.1 Packaging

Description: Multiple values on the right side of the equal sign (=) are encapsulated into a tuple, called encapsulated packing, simply by commas.

# Example:
x = 1,
y = 1,2
print(type(x), x)
print(type(y), y)

# The output is as follows:
<class 'tuple'> (1,)
<class 'tuple'> (1, 2)

Note: If there is only one numeric value on the right and there is no comma, it is an integer type, please note.In addition, the right side of the equal sign must run before it is assigned to the left.

1.2 Deconstruction

Description: Elements of the container type on the right of the equal sign (=) correspond to variables separated by commas on the left, called deconstruction unpacking.

x,y = (1,2)  # [1,2] {1,2} {'a':1,'b':2}
print(x)
print(y)

# The output is as follows:
1
2

*Remarks: * The container on the right can be tuples, lists, dictionaries, collections, etc. It must be an iterative object.

Error demonstration:

x,y = (1,2,3)
print(x)
print(y)

# The output is as follows:
ValueError: too many values to unpack (expected 2)

* Note: * The number of left and right sides must be the same, otherwise'ValueError'error will be thrown.

Remaining Variable Deconstruction

* Note: *python3 introduces residual variable deconstruction,'gather as much data as possible'to form a list.

x, *rest = [1,2,3,4,5,6]
print(type(x), x)
print(type(rest), rest)  # All that remains unassigned is rest

# The output is as follows:
<class 'int'> 1
<class 'list'> [2, 3, 4, 5, 6]
*rest, y = [1,2,3,4,5,6]
print(type(rest), rest)
print(type(y), y)

# The output is as follows:
<class 'list'> [1, 2, 3, 4, 5]
<class 'int'> 6

Error example:

  • Can't be used alone

    * Note: * There is only one identifier to the left of the equal sign and it cannot be deconstructed.

    *rest = [1,2,3,4,5,6]
    print(rest)
    
    # The output is as follows:
     #Syntax error
     SyntaxError: starred assignment target must be in a list or tuple
    
  • Can't be used multiple times at the same time

    x, *rest1, *rest2, y = [1,2,3,4,5,6]
    print(rest1)
    print(rest2)
    
    # The output is as follows:
     #Syntax error, one rest takes away the remaining elements, and the other rest how?
     SyntaxError: two starred expressions in assignment
    

Another discard variable underline:''

Description:''is a valid identifier, and most scenarios indicate that they do not care about the value.

x, *_, y = [1,2,3,4,5,6]
print(x)
print(_)
print(y)
# The output is as follows:
1
[2, 3, 4, 5]
6
_, *rest, _ = [1,2,3,4,5,6]
print(_)  # ''is the last output value
print(rest)
# The output is as follows:
6
[2, 3, 4, 5]

2. Collection Set

* Description: * Collection is a'variable, disordered, non-repeating'collection of elements.

Being a collection element is conditional:'The element must be hash, iterative'

Hashable objects are as follows (immutable):

  • Numeric types: int (integer), float (floating point), complex (complex)
  • Boolean: True (yes), False (no)
  • String: string (string), bytes (bytes)
  • Tuple (tuple)
  • None (empty)

The built-in hash function can be used to determine whether hash is acceptable:

s1 = [1,2,3]
print(hash(s1))

# The output is as follows:
TypeError: unhashable type: 'list'  # Lists are not hash-able

2.1 Initialization

Explain:

  • Set() -> new empty set object, new empty set
  • Set (iterable) -> new set object, elements must be iterative
s = {}  # Note that this is an empty dictionary, not an empty collection
s1 = set()  # Empty Set
s2 = set([1,2,3])  # Notice that the elements in the list iterate over integers, hash
s3 = set("abcd")
print(s1)
print(s2)
print(s3)

# The output is as follows:
set()
{1, 2, 3}
{'c', 'd', 'a', 'b'}

Error example:

s = set([[1]]) # List set list, iterated out is a list, not hash
print(s)

# The output is as follows:
TypeError: unhashable type: 'list'

2.2 Increase

  • s.add(element)

    * Description: * Add an element to the collection and do not operate if the element already exists.

s1 = set([1,2,3])
s1.add(4)
print(s1)

# The output is as follows:
{1, 2, 3, 4}
  • s.update(*element))

    *Description: * Merge one or more elements into a set, elements must be iterative (merge iterated elements into a set), just like the later set.

s1 = set([1,2,3])
s1.update((4,5,6),[7,8,9])
print(s1)

# The output is as follows:
{1, 2, 3, 4, 5, 6, 7, 8, 9}

2.3 Delete

  • remove(element)

    * Description: * Remove an element from the collection if the element does not have a throw'KeyError'error.

    s1 = {1,2,3,4,5,6}
    s1.remove(6)
    print(s1)
    
    # The output is as follows:
    {1, 2, 3, 4, 5}
    
  • discard(element)

    * Note: * also removes an element from the collection, and does nothing if the element does not exist without an exception.

    s1 = {1,2,3,4,5,6}
    s1.discard(6)
    print(s1)
    
    # The output is as follows:
    {1, 2, 3, 4, 5}
    
  • pop()

    * Description: * Delete'any'element because the set is out of order and throw a'KeyError' error if it is empty.

    s1 = {1,2,3,4,5,6}
    print(s1.pop())  # Random (because of disorder)
    print(s1)
    
    # The output is as follows:
    1
    {2, 3, 4, 5, 6}
    
  • clear()

    * Instructions: * Remove all elements, not recommended.

    s1 = {1,2,3,4,5,6}
    s1.clear()
    print(s1)
    
    # The output is as follows:
    set()
    

2.4 traversal

* Note: * The collection is a container and traversable, but the efficiency is O(n).

s1 = {1,2,3}
for s in s1:
    print(s)
# The output is as follows:
1
2
3

So, which of the set and list traversals do you find more efficient?

The answer is set, because the elements of set are hash values as keys (the dictionary described below is also hash values), query time complexity is O(1), list is a linear data structure, and time complexity is O(n).

You can verify as follows, as the size of the data increases, it is clear which is more efficient.

2.5 Union-Intersection-Difference-Symmetric Difference

  • Union

    Description: Combines all elements from multiple collections to form a new collection.

    s1 = {1,2,3}
    s2 = {3,4,5}
    print(s1.union(s2))
    
    # The output is as follows:
    {1, 2, 3, 4, 5}
    

    Note: Operators'|','update(element)','|='can also be used.

  • intersection

    Description: Take common (intersecting) elements from multiple sets

    s1 = {1,2,3}
    s2 = {3,4,5}
    print(s1.intersection(s2))
    
    # The output is as follows:
    {3}
    

    Note: You can also use'&','s.intersection_update(element)','&='.

  • Difference set

    *Description: *A collection of elements belonging to one set but not another.

    s1 = {1,2,3}
    s2 = {3,4,5}
    print(s1.difference(s2))
    
    # The output is as follows:
    {1, 2}
    

    Note: You can also use'-','s.difference_update(element)','-='.

  • Symmetric difference set

    * Description: * In multiple collections, does not belong to the collection of intersecting elements.

    s1 = {1,2,3}
    s2 = {3,4,5}
    print(s1.symmetric_difference(s2))
    
    # The output is as follows:
    {1, 2, 4, 5}
    

    Note: You can also use'^','s1.symmetric_difference_update(s2)','^='.

3. Dictionaries

* Note: * A dictionary is a collection of any item (element), and an item is a tuple of key:value pairs.

  • Dictionaries are'variable': support add-delete and alter-check;
  • Dictionaries are'out of order': key storage is out of order, non-linear data structure (don't let the surface fool you);
  • The dictionary is'key does not repeat': key is unique and must be'hash';

3.1 Initialization

# Empty Dictionary
d1 = {}
d2 = dict()

# Example:
d3 = dict(a=1,b=2,c=3)
d4 = dict(d3)
d5 = dict([('a',1),('b',2),('c',3)])  # Elements must be iterative
d6 = {'a':1,'b':2,'c':3}

# The output is:
{'a': 1, 'b': 2, 'c': 3}

3.2 Examination of additions, deletions and alterations

  • Add & Modify Elements

    1) By'd[key] = value':

    Note: If a key does not exist, it is added, and if it exists, it is directly overwritten (modifying elements).

    # Add & Modify
    d = {'a':1,'b':2,'c':3}
    d['d'] = 4 # increase
    d['a'] = 11 # modify
    print(d)
    
    # The output is as follows:
    {'a': 11, 'b': 2, 'c': 3, 'd': 4}
    

    2) via d.update ([E,]**F) -> None

    # Add & Modify
    d = {'a':1,'b':2,'c':3}
    d.update(d=4)
    print(d)
    
    # The output is as follows:
    {'a': 1, 'b': 2, 'c': 3, 'd': 4}
    
  • Delete element

    1)d.pop()

    • If the key exists, it is removed and the corresponding value is returned.
    • key does not exist, returns the given default value, otherwise throws KeyError.
    d = {'a':1,'b':2,'c':3}
    print(d.pop('c',None))
    print(d)
    
    # The output is as follows:
    3
    {'a': 1, 'b': 2}
    

    2)d.popitem()

    • Delete and return an arbitrary item(key:value).
    • If it is an empty dictionary, throw KeyError.
    d = {'a':1,'b':2,'c':3}
    print(d.popitem())
    print(d)
    
    # The output is as follows:
    ('c', 3)
    {'a': 1, 'b': 2}
    

    3)d.clear()

    • Delete all item s, not recommended.
    d = {'a':1,'b':2,'c':3}
    d.clear()
    print(d)
    
  • Find Elements

    • The key is used to quickly find the value.
    • Time complexity is O(1), and efficiency does not decrease with the size of the data.

    Normal access elements:

    d = {'a':1,'b':2,'c':3}
    print(d['a'])
    print(d.get('b'))
    
    # The output is as follows:
    1
    2
    

    How key does not exist:

    d = {'a':1,'b':2,'c':3}
    print(d.get('d',None))  # Returns None by default if key does not exist
    print(d.setdefault('d',100))  # If key does not exist, add a new key:value pair
    print(d)
    
    # The output is as follows:
    None
    100
    {'a': 1, 'b': 2, 'c': 3, 'd': 100}
    

3.3 Traversal

  • Traversal key:

    d = {'a':1,'b':2,'c':3}
    # Method 1:
    for k in d:  # The default is traversal key
        print(k)
    
    # Method 2:
    for k in d.keys():
        print(k)
    
    # Method 3:
    for k, _ in d.items():
        print(k)
    
    # The output is as follows:
    a
    b
    c
    
  • Traversal value:value

    d = {'a':1,'b':2,'c':3}
    # Method 1:
    for v in d.values():
        print(v)
    
    # Method 2:
    for k in d:
        # print(d[k])  # Also available
        print(d.get(k)) 
    
    # Method 3:
    for _, v in d.items():
        print(v)
    
    # The output is as follows:
    1
    2
    3
    
  • Traverse item:key-value

    d = {'a':1,'b':2,'c':3}
    for item in d.items():
        print(item)
    
    # The output is as follows:
    ('a', 1)
    ('b', 2)
    ('c', 3)
    
  • Other questions

    In this case, you cannot delete elements or change the size of the dictionary while traversing.

    d = {'a':1,'b':2,'c':3}
    for k in d:
        print(d.pop(k))
    
    # The output is as follows:
    RuntimeError: dictionary changed size during iteration
    

    Elegant way to delete:

    d = {'a':1,'b':2,'c':3}
    key_list = []
    for k in d:
        key_list.append(k)
    for k in key_list:
        print('Deleted key:', d.pop(k))
    

    And if you want to get rid of it, just use clear().

4. Parser and generator expressions

4.1 List Parser

grammar

  • [Return value for element in iterative object if condition]
  • List parsing is represented by square brackets'[]'
  • Return to a new list

Advantage

  • Increase of efficiency
  • Code lightweight
  • High readability

Example requirement: Extract elements from a given interval that can be divided by 2.

Common writing:

list = []
for i in range(10):
    if i % 2 == 0:
        list.append(i)
print(list)

# The output is as follows:
[0, 2, 4, 6, 8]

Feel the simple and elegant way of writing again:

print([i for i in range(10) if i % 2 == 0])

# The output is as follows:
[0, 2, 4, 6, 8]

This is the list parsing, also known as list pushing.

4.2 Generator Expression

grammar

  • (the return value for element in iterates over the object if condition)
  • Generator expressions are denoted by square brackets'()'
  • Returns a generator object

Characteristic:

  • On-demand calculation is when a value is needed (while list resolution is a one-time calculation that returns all results immediately)
  • It doesn't take up much memory in the early days, and it ends up taking more values just like a list parser.
  • The calculation takes a very short time and does not return the result itself; it returns the generator object.

See what the generator object looks like (don't think of it as a tuple parser, haha):

x = (i for i in range(10) if i % 2 == 0)
print(type(x))
print(x)

# The output is as follows:
<class 'generator'>  # generator
<generator object <genexpr> at 0x000001A143ACBA98> # Generator object

How the generator object computes the result:

import time
x = (i for i in range(10) if i % 2 == 0)
for i in range(6):  # Value only once in a loop
    time.sleep(0.5)
    print(next(x))
 time.sleep(1)
print(next(x))  # The for loop has calculated all the results and cannot take values, so it throws an exception

# The output is as follows:
0
2
4
6
8
StopIteration  # Exception thrown out of Iterable range

Note: Generator expressions can only be iterated once.

4.3 Set Resolution

Collection parsing is similar to list parsing without much parsing.

Grammar:

  • {Return value for element in iterative object if condition}
  • Set parsing is represented by curly brackets'{}'
  • Return a collection

Example:

print({i for i in range(10) if i % 2 == 0})

# The output is as follows:
{0, 2, 4, 6, 8}

4.4 Dictionary Parsing

Dictionary parsing is similar to set parsing without much parsing.

Grammar:

  • {key:value for element in iterative object if condition}
  • Dictionary parsing is indicated by curly brackets'{}'
  • Return a dictionary

Example:

print({i:(i+1) for i in range(10) if i % 2 == 0})

# The output is as follows:
{0: 1, 2: 3, 4: 5, 6: 7, 8: 9}

Overall, parsing is highly recommended if it is easy to understand and efficient.

But there are some scenarios that are complex to write, and that still have to be written with for...in loop splitting.

If you like my article, please pay attention to my public number: drip technology, scanner attention, share it regularly

Tags: Programming REST

Posted on Wed, 13 May 2020 10:52:05 -0700 by jeff21