The most comprehensive Python summary you've ever seen

Because this covers so much material, the article is fairly long; it is the result of a long process of patching things together piece by piece.

Py2 VS Py3

  • print becomes a function; in Python 2 it was a keyword

  • No more unicode type; the default str is Unicode

  • In Python 3, division (/) returns a float

  • No long type (int is unbounded)

  • xrange is gone; range replaces it (and is now lazy)

  • Function and variable names can be written in Chinese (any Unicode identifier)

  • Advanced unpacking and * unpacking

  • Keyword-only parameters: arguments after * must be passed as name=value

  • raise from

  • iteritems is removed; use items() (which now returns a view)

  • yield from delegates to a sub-generator

  • asyncio and native async/await coroutines support asynchronous programming

  • New modules: enum, mock, ipaddress, concurrent.futures, asyncio, urllib, selectors

    • Members of different enum classes cannot be compared
    • Members of the same enum class support only equality comparison
    • Using enum classes (automatic numbering defaults to starting at 1)
    • To prevent duplicate values in an enum class, decorate it with @unique
#Notes on Enum
from enum import Enum

class COLOR(Enum):
    YELLOW = 1
    #YELLOW = 2  # error: duplicate member name
    GREEN = 1  # no error: GREEN becomes an alias for YELLOW
    BLACK = 3
    RED = 4
print(COLOR.GREEN)  # output: COLOR.YELLOW (the alias resolves to YELLOW)
for i in COLOR:  # iterating over COLOR skips aliases such as GREEN
    print(i)
# output: COLOR.YELLOW\nCOLOR.BLACK\nCOLOR.RED
# To traverse aliases as well, iterate over __members__
for i in COLOR.__members__.items():
    print(i)
# output:('YELLOW', <COLOR.YELLOW: 1>)\n('GREEN', <COLOR.YELLOW: 1>)\n('BLACK', <COLOR.BLACK: 3>)\n('RED', <COLOR.RED: 4>)
for i in COLOR.__members__:
    print(i)
# output:YELLOW\nGREEN\nBLACK\nRED

#Enum conversion
#Prefer storing enum values rather than name strings in the data store,
#and use the enum class in code
a=1
print(COLOR(a))# output:COLOR.YELLOW
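A few of the Python-3-only features listed above (advanced unpacking, keyword-only arguments, yield from) can be sketched together; the names here are purely illustrative:

```python
# Advanced unpacking with *
first, *middle, last = [1, 2, 3, 4, 5]

# Keyword-only parameters: arguments after * must be passed as name=value
def connect(host, *, port=80):
    return f"{host}:{port}"

# yield from delegates to a sub-generator
def inner():
    yield 1
    yield 2

def outer():
    yield 0
    yield from inner()

print(first, middle, last)                # 1 [2, 3, 4] 5
print(connect("example.com", port=8080))  # example.com:8080
print(list(outer()))                      # [0, 1, 2]
```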

py2/3 conversion tools

  • six module: a compatibility layer for Python 2 and Python 3
  • 2to3 tool: converts code syntax between versions
  • future: use next-version features in the current version

Frequently used libraries

  • collections to Know

    https://segmentfault.com/a/1190000017385799

  • python sorting operation and heapq module

    https://segmentfault.com/a/1190000017383322

  • Super-practical method of itertools module

    https://segmentfault.com/a/1190000017416590

Less commonly used but important Libraries

  • Dis (code bytecode analysis)

  • Inspect (generator state)

  • CProfile (Performance Analysis)

  • Bisect (maintain ordered list)

  • fnmatch

    • fnmatch(string, "*.txt") is case-insensitive on Windows
    • fnmatch's case sensitivity follows the operating system
    • fnmatchcase is always case-sensitive
  • Timeit (code execution time)

    def isLen(strString):
        # uses a ternary expression; which version is faster?
        return True if len(strString)>6 else False

    def isLen1(strString):
        # note the positions of False and True: the boolean comparison indexes the list
        return [False,True][len(strString)>6]
    import timeit
    print(timeit.timeit('isLen1("5fsdfsdfsaf")',setup="from __main__ import isLen1"))

    print(timeit.timeit('isLen("5fsdfsdfsaf")',setup="from __main__ import isLen"))
  • contextlib

    • @contextlib.contextmanager turns a generator function into a context manager
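A minimal sketch of @contextlib.contextmanager: the code before yield plays the role of __enter__, and the finally block plays the role of __exit__ (the tag helper is made up for illustration):

```python
import contextlib

@contextlib.contextmanager
def tag(name, out):
    out.append(f"<{name}>")       # runs on entering the with block, like __enter__
    try:
        yield
    finally:
        out.append(f"</{name}>")  # runs on leaving the with block, like __exit__

parts = []
with tag("h1", parts):
    parts.append("title")
print(parts)  # ['<h1>', 'title', '</h1>']
```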
  • types (contains the type objects for the types defined by the standard interpreter; can also turn a generator function into a coroutine)

    import types
    types.coroutine  # wraps a generator-based function so it can be awaited (much like implementing __await__)
  • html (escapes and unescapes HTML)
    import html
    html.escape("<h1>I'm Jim</h1>") # output:'&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;'
    html.unescape('&lt;h1&gt;I&#x27;m Jim&lt;/h1&gt;') # <h1>I'm Jim</h1>
  • Mock (mocks out test dependencies)
  • concurrent.futures (creates process pools and thread pools)
from concurrent.futures import ThreadPoolExecutor, as_completed, wait

pool = ThreadPoolExecutor()
task = pool.submit(func, args)  # does not block; returns a Future immediately
task.done()  # whether the task has finished
task.result()  # blocks and returns the task's return value
task.cancel()  # cancels a task that has not started; returns True on success, else False
task.add_done_callback(fn)  # callback invoked when the task completes
task.running()  # whether the task is currently running; task is a Future object

for data in pool.map(func, args_list):  # yields results of completed tasks in argument order
    print(data)  # the return value of each completed task

as_completed(task_list)  # yields tasks as they complete, one at a time

wait(task_list, return_when=condition)  # blocks the calling thread; there are four possible conditions
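A small runnable version of the ThreadPoolExecutor notes above (square is a stand-in task):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(square, n) for n in range(5)]         # submit returns Futures immediately
    results = sorted(f.result() for f in as_completed(futures))  # as_completed yields tasks as they finish
print(results)  # [0, 1, 4, 9, 16]
```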
  • selectors (wraps select; used for multiplexed I/O programming)
  • asyncio
future = asyncio.ensure_future(coroutine) is equivalent to future = loop.create_task(coroutine)
future.add_done_callback() adds a completion callback
loop.run_until_complete(future)
future.result() returns the result

asyncio.wait() accepts an iterable of coroutine objects
asyncio.gather(*iterable, *iterable) gives the same result, but gather supports batch cancellation: gather_obj.cancel()

There is only one event loop per thread

loop.stop() must be called while loop.run_forever() is running, or an error is raised
loop.run_forever() can run non-coroutine callbacks
Execute loop.close() in the finally block

asyncio.Task.all_tasks() returns all tasks; iterate over them and cancel each with task.cancel()

functools.partial(function, args) wraps a function into a new callable; the pre-bound arguments must come before the function's remaining parameters

loop.call_soon(function, args)
call_soon_threadsafe() is the thread-safe variant
loop.call_later(delay, function, args)
In the same block of code, call_soon callbacks run first, then the call_later callbacks run in ascending order of their delays

If you must run blocking code:
wrap it with loop.run_in_executor(executor, function, args) into a thread pool, put the futures into a task list, and run them with wait(task_list)

Fetching http through asyncio:
reader, writer = await asyncio.open_connection(host, port)
writer.write() sends the request
async for data in reader:
    data = data.decode("utf-8")
    lines.append(data)
The html then sits in the list

as_completed(tasks) yields each task as it completes, returning an iterator

Coroutine lock:
async with Lock():
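The asyncio notes above, condensed into a minimal runnable sketch (fetch stands in for real I/O; asyncio.run is the modern wrapper around loop.run_until_complete):

```python
import asyncio

async def fetch(n):
    await asyncio.sleep(0)  # stands in for real I/O such as a socket read
    return n * 2

async def main():
    # gather runs the coroutines concurrently and preserves argument order
    return await asyncio.gather(*(fetch(n) for n in range(3)))

results = asyncio.run(main())
print(results)  # [0, 2, 4]
```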

Python advanced

  • Interprocess communication:

    • Manager (provides many built-in data structures for sharing memory between processes)
from multiprocessing import Manager,Process
def add_data(p_dict, key, value):
    p_dict[key] = value

if __name__ == "__main__":
    progress_dict = Manager().dict()

    first_progress = Process(target=add_data, args=(progress_dict, "bobby1", 22))
    second_progress = Process(target=add_data, args=(progress_dict, "bobby2", 23))

    first_progress.start()
    second_progress.start()
    first_progress.join()
    second_progress.join()

    print(progress_dict)
    • Pipe (for two processes)
from multiprocessing import Pipe,Process
#pipe performs better than queue
def producer(pipe):
    pipe.send("bobby")

def consumer(pipe):
    print(pipe.recv())

if __name__ == "__main__":
    receive_pipe, send_pipe = Pipe()
    #a Pipe can only connect two processes
    my_producer= Process(target=producer, args=(send_pipe, ))
    my_consumer = Process(target=consumer, args=(receive_pipe,))

    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
    • Queue (cannot be used with a process pool; use Manager().Queue() for communication between process-pool workers)
import time
from multiprocessing import Queue,Process
def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
    • Process pool
import time
from multiprocessing import Manager,Pool

def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    queue = Manager().Queue(10)
    pool = Pool(2)

    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))

    pool.close()
    pool.join()
  • Several Common Methods of sys Module

    • argv: the command-line argument list; the first element is the path of the program itself
    • path: the module search path
    • modules.keys(): the names of all imported modules
    • exit(0): exits the program
  • Shorthand for a in s or b in s or c in s

    • Using any(): note that all() returns True for an empty iterable, while any() returns False
    # Method 1
    True in [i in s for i in [a,b,c]]
    # Method 2
    any(i in s for i in [a,b,c])
    # Method 3
    list(filter(lambda x:x in s,[a,b,c]))
  • set Set Application

    • {1,2}.issubset({1,2,3})  # whether the left set is a subset of the right
    • {1,2,3}.issuperset({1,2})
    • {1,2}.isdisjoint({3,4})  # True if the intersection of the two sets is empty
  • Chinese Matching in Code

    • [\u4E00-\u9FA5] matches the Chinese character range (一 to 龥)
  • View System Default Encoding Format

    import sys
    sys.getdefaultencoding()    # sys.setdefaultencoding() set the encoding in Python 2; it no longer exists in Python 3
  • __getattr__ VS __getattribute__
class A(dict):
    def __getattr__(self,value):  # called only when normal attribute lookup fails
        return 2
    def __getattribute__(self,item):  # intercepts every attribute access
        return item
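A quick demonstration of the difference (the class names here are made up):

```python
class OnlyGetattr(dict):
    def __getattr__(self, name):       # called only when normal lookup fails
        return 2

class AllAccess:
    def __getattribute__(self, item):  # intercepts every attribute access
        return item

o = OnlyGetattr()
p = AllAccess()
print(o.missing)   # 2
print(p.whatever)  # whatever
```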
  • Class variables are not stored in the instance's __dict__, only in the class's __dict__

  • Globals / locals

    • globals holds all the variable names and values of the current module
    • locals holds all the variable names and values of the current scope
  • Resolution mechanism for python variable names (LEGB)

    • Local Scope
    • Enclosing scope: the locals of any enclosing function (Enclosing locals)
    • Global/Modular Scope
    • Built-in scope
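The LEGB lookup order can be seen in a small sketch:

```python
x = "global"

def outer():
    x = "enclosing"
    def inner():
        x = "local"      # found first: the Local scope
        return x
    return inner(), x    # inner sees its local x; outer sees the enclosing x

print(outer(), x)  # ('local', 'enclosing') global
```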
  • Realize grouping from 1 to 100 in groups of three

    print([list(range(1, 101))[i:i + 3] for i in range(0, 100, 3)])
  • What are metaclasses?

    • That is, a class is created by its metaclass: declare it with metaclass=YourMeta. A metaclass must inherit from type, not object, because type is the metaclass.
type.__bases__  #(<class 'object'>,)
object.__bases__    #()
type(object)    #<class 'type'>
    class Yuan(type):
        def __new__(cls,name,base,attr,*args,**kwargs):
            return type(name,base,attr,*args,**kwargs)
    class MyClass(metaclass=Yuan):
        pass
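Since type is the metaclass, calling type directly creates a class dynamically, which is what a metaclass's __new__ does under the hood (Point is an illustrative name):

```python
# type(name, bases, attrs) builds a class object on the fly
Point = type("Point", (object,), {"x": 1})

p = Point()
print(isinstance(Point, type), p.x)  # True 1
```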
  • What is duck typing (i.e. polymorphism)?

    • Python does not check parameter types when a value is passed in; as long as the object supports the operations performed on it, the call executes.
  • Deep copy and shallow copy

    • A deep copy copies content; a shallow copy copies the address (increasing the reference count)
    • The copy module implements deep copy (copy.deepcopy)
  • unit testing

    • Test classes generally inherit from unittest.TestCase
    • The pytest module for quick tests (methods start with test_, test files start with test_, test classes start with Test and must not have an __init__ method)
    • Coverage statistical test coverage
    class MyTest(unittest.TestCase):
        def setUp(self):  # runs before each test case
            print('Setting up for a test')

        def tearDown(self):  # runs after each test case
            print('This test method has finished')

        @classmethod
        def setUpClass(cls):  # must use the @classmethod decorator; runs once before all tests
            print('Start of testing')

        @classmethod
        def tearDownClass(cls):  # must use the @classmethod decorator; runs once after all tests
            print('End of testing')

        def test_a_run(self):
            self.assertEqual(1, 1)  # test case
  • The GIL is released based on the number of bytecodes executed and a time slice; it is also released proactively when an I/O operation is encountered

  • What is monkey patch?

    • A monkey patch replaces attributes at run time, e.g. swapping blocking calls for non-blocking ones
  • What is Introspection?

    • The ability to determine an object's type at run time: id, type, isinstance
  • Is Python pass-by-value or pass-by-reference?

    • Neither; Python passes by sharing (call by object reference). Note that default parameter values are evaluated only once.
  • The Difference between Else and finally in try-except-else-final

    • else runs when no exception occurs; finally runs whether or not an exception occurs.
    • A single except can catch several exceptions at once, but to handle different exceptions differently we usually catch and handle them separately.
  • GIL Global Interpreter Lock

    • Only one thread can execute bytecode at a time; a CPython feature (IPython runs on CPython too) that other interpreters may not have.
    • cpu intensive: multi-process + process pool
    • io-intensive: multi-threading/coroutines
  • What is Cython

    • A tool that compiles Python code to C
  • Generator and Iterator

    • An iterable only needs to implement the __iter__ method

      • An object that implements both __next__ and __iter__ is an iterator
    • Generators come from generator expressions or from yield in a function (a generator is a special iterator)

  • What is a coroutine?

    • yield

    • async/await

      • A lighter-weight way of multitasking than threads
      • Two implementation approaches (yield-based and async/await-based)
  • dict underlying structure

    • Hash tables are used as the underlying structure to support fast lookup
    • The average search time complexity of hash table is o(1)
    • The CPython interpreter resolves hash collisions with open-addressing probing
  • Hash Expansion and Hash Conflict Resolution

    • Chaining

    • Open addressing (probing): used by Python

      • Expansion: entries are copied one by one into a new, larger table
      • Collision resolution: probe for another free slot
    from gevent import monkey
    monkey.patch_all()  # patches all blocking calls in the code; use the specific patch_* methods to patch selectively
  • Determine whether a function is a generator or a coroutine
    co_flags = func.__code__.co_flags

    # check whether it is a coroutine (CO_COROUTINE | CO_ITERABLE_COROUTINE)
    if co_flags & 0x180:
        return func

    # check whether it is a generator (CO_GENERATOR)
    if co_flags & 0x20:
        return func
  • Problems and Deformations Solved by Fibonacci
#A frog can jump one or two steps at a time. How many distinct ways can it jump up an n-step staircase?
#Equivalently: how many ways are there to tile a 2*n rectangle with n 2*1 rectangles without overlapping?
#Method 1:
fib = lambda n: n if n <= 2 else fib(n - 1) + fib(n - 2)
#Method 2:
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return b

#A frog can jump one or two steps at a time... or any number of steps, up to n, at once. How many ways can it jump up an n-step staircase?
fib = lambda n: n if n < 2 else 2 * fib(n - 1)
  • Get environment variables for computer settings
    import os
    os.getenv(env_name, None)  # gets the environment variable; returns None if it does not exist
  • Garbage Recycling Mechanism

    • Reference counting
    • Mark and sweep
    • Generational collection
    #View the generational-collection trigger thresholds
    import gc
    gc.get_threshold()  #output:(700, 10, 10)
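Reference counting, the first mechanism above, can be observed with sys.getrefcount (which itself adds one temporary reference):

```python
import sys

a = []
rc1 = sys.getrefcount(a)  # includes the temporary reference created by the call itself
b = a                     # bind a second name to the same list
rc2 = sys.getrefcount(a)
print(rc2 - rc1)  # 1
```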
  • True and False are fully equivalent to 1 and 0 in code and can be used directly in arithmetic; float('inf') represents infinity.

  • C10M/C10K

    • C10M: 10 million concurrent connections on an 8-core CPU, 64 GB of memory, and a 10 Gbps network
    • C10K: serve FTP to 10,000 clients on a 1 GHz CPU, 2 GB of memory, and a 1 Gbps network
  • The difference between yield from and yield:

    • yield from must be followed by an iterable; there is no such restriction after yield
    • GeneratorExit is raised when the generator is stopped
  • Several Use of Single Underline

    • Marks a variable as private by convention when defining it
    • Marks discarded values when unpacking
    • Holds the result of the last expression in interactive mode
    • Groups digits in numeric literals (111_222_333)
  • A loop's else clause is not executed if the loop exits via break

  • Decimal-to-binary conversion

    def conver_bin(num):
        if num == 0:
            return "0"
        re = []
        while num:
            num, rem = divmod(num, 2)
            re.append(str(rem))
        return "".join(reversed(re))
    conver_bin(10)  # '1010'
  • How can LIST1 = ['A','B','C','D'] get a new list named after the elements in the list A=[],B=[],C=[],D=[]?
    list1 = ['A', 'B', 'C', 'D']

    # Method 1
    for i in list1:
        globals()[i] = []   # Can be used to implement python version reflection

    # Method 2
    for i in list1:
        exec(f'{i} = []')   # exec executes string statements
  • memoryview and bytearray (not often used; just noting them for the record)
    # bytearray is mutable, bytes is immutable; slicing a memoryview creates no new object
    a = b'aaaaaa'
    ma = memoryview(a)
    ma.readonly  # True: a read-only memory view
    mb = ma[:2]  # no new bytes object is created

    a = bytearray(b'aaaaaa')
    ma = memoryview(a)
    ma.readonly  # False: a writable memory view
    mb = ma[:2]      # no new bytearray is created
    mb[:2] = b'bb'   # changes to mb are changes to a
  • Ellipsis type
# The literal ... in code is the Ellipsis object; the [...] printed below is just how
# Python displays a list that contains itself, not an Ellipsis
L = [1,2,3]
L.append(L)
print(L)    # output:[1, 2, 3, [...]]
  • lazy (lazy evaluation)
    class lazy(object):
        def __init__(self, func):
            self.func = func

        def __get__(self, instance, cls):
            val = self.func(instance)    # equivalent to calling area(c), where c is the Circle instance below
            setattr(instance, self.func.__name__, val)
            return val

    class Circle(object):
        def __init__(self, radius):
            self.radius = radius

        @lazy
        def area(self):
            print('evalute')
            return 3.14 * self.radius ** 2
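Usage of the lazy descriptor above: the first access computes the value and caches it on the instance, so the descriptor is bypassed afterwards (the block repeats the classes so it runs standalone):

```python
class lazy:
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, cls):
        val = self.func(instance)                   # compute once
        setattr(instance, self.func.__name__, val)  # cache on the instance, shadowing the descriptor
        return val

class Circle:
    def __init__(self, radius):
        self.radius = radius

    @lazy
    def area(self):
        return 3.14 * self.radius ** 2

c = Circle(2)
first = c.area                 # computed on first access
cached = 'area' in c.__dict__  # True: now stored on the instance
print(first, cached)  # 12.56 True
```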
  • Traverse a folder and print the paths of all the files in it (recursively)
import os

all_files = []
def getAllFiles(directory_path):
    for sChild in os.listdir(directory_path):
        sChildPath = os.path.join(directory_path,sChild)
        if os.path.isdir(sChildPath):
            getAllFiles(sChildPath)
        else:
            all_files.append(sChildPath)
    return all_files
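An equivalent sketch using os.walk, which does the recursion for you (the function name here is ours):

```python
import os

def get_all_files(directory_path):
    result = []
    for root, _dirs, files in os.walk(directory_path):  # walk visits each directory once
        for name in files:
            result.append(os.path.join(root, name))
    return result
```

Unlike the version above, this one keeps its results local instead of accumulating into a module-level list.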
  • File Name Processing in File Storage
#secure_filename converts a string into a safe filename
from werkzeug.utils import secure_filename
secure_filename("My cool movie.mov") # output:My_cool_movie.mov
secure_filename("../../../etc/passwd") # output:etc_passwd
secure_filename(u'i contain cool \xfcml\xe4uts.txt') # output:i_contain_cool_umlauts.txt
  • Date formatting
from datetime import datetime

datetime.now().strftime("%Y-%m-%d")

import time
#time.strftime can only format a struct_time (e.g. time.localtime()), not a raw timestamp
time.strftime("%Y-%m-%d",time.localtime())
  • The oddity of += on a tuple element
# A TypeError is raised, yet the value inside the tuple changes, because the id of t[1] does not change
t=(1,[2,3])
t[1]+=[4,5]
# Using the append/extend methods on t[1] instead raises no error and succeeds
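Running the snippet above shows both effects at once: the exception and the mutation:

```python
t = (1, [2, 3])
raised = False
try:
    t[1] += [4, 5]   # list.__iadd__ extends in place, then the tuple item assignment fails
except TypeError:
    raised = True
print(raised, t)  # True (1, [2, 3, 4, 5])
```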
  • __missing__ you should know
class Mydict(dict):
    def __missing__(self,key):  # called by dict when subscript access does not find the key
        return key
  • + VS +=
#+ cannot concatenate a list with a tuple, but += can (it goes through __iadd__, which is implemented with extend(), so a tuple can be appended); + creates a new object.
# Immutable objects have no __iadd__ method, so += uses the __add__ method directly, which is why tuple += tuple works (by creating a new tuple).
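The identity difference between + and += can be checked directly:

```python
t = (1, 2)
tid = id(t)
t += (3,)                       # no __iadd__ on tuple, so += falls back to __add__
tuple_rebound = id(t) != tid    # a brand-new tuple was created

lst = [1, 2]
lid = id(lst)
lst += (3,)                     # list.__iadd__ extends in place, even with a tuple
list_in_place = id(lst) == lid  # still the same object
print(tuple_rebound, list_in_place, lst)  # True True [1, 2, 3]
```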
  • How to turn every element of an iterative object into all keys of a dictionary?
dict.fromkeys(['jim','han'],21) # output:{'jim': 21, 'han': 21}
  • Wireshark packet-capture software

Network knowledge

  • What is HTTPS?

    • HTTP over a secure layer; https requires a CA certificate, encrypts data, and uses port 443; for the same site, https gets a higher SEO ranking
  • Common Response State Codes

    204 No Content //request processed successfully; no entity body returned, often used for a successful delete
    206 Partial Content //a range request was processed successfully
    303 See Other //redirect; the client is expected to fetch the target with GET
    304 Not Modified //conditional (negotiated) cache hit
    307 Temporary Redirect //temporary redirect; POST is not turned into GET
    401 Unauthorized //authentication failed
    403 Forbidden //resource request refused
    400 Bad Request //bad request parameters
    201 Created //resource created or changed successfully
    503 Service Unavailable //server under maintenance or overloaded
  • Idempotency and safety of HTTP request methods
  • WSGI
    # environ: a dict object that contains all the HTTP request information
    # start_response: a function that sends the HTTP response status and headers
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/html')])
        return [b'<h1>Hello, web!</h1>']  # a WSGI app returns an iterable of bytes
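The WSGI app can be exercised without a real server by building a test environ with the standard library's wsgiref helpers:

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']

environ = {}
setup_testing_defaults(environ)  # fills in a minimal CGI-style request environment

captured = {}
def start_response(status, headers):  # plays the role a real server would
    captured['status'] = status
    captured['headers'] = headers

body = b''.join(application(environ, start_response))
print(captured['status'], body)  # 200 OK b'<h1>Hello, web!</h1>'
```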
  • RPC

  • CDN

  • SSL (Secure Sockets Layer) and its successor TLS (Transport Layer Security) are security protocols that provide security and data integrity for network communications.

  • SSH (Secure Shell) is a security protocol defined by the IETF Network Working Group that operates at the application layer. It provides reliable, secure remote login sessions and other network services, and effectively prevents information leakage during remote administration. Originally a UNIX program, SSH later spread to other platforms; clients exist for nearly all UNIX platforms, including HP-UX, Linux, AIX, Solaris, Digital UNIX, Irix, and others.

  • TCP/IP

    • TCP: Connection-Oriented/Reliable/Byte Stream-Based

    • UDP: connectionless/unreliable/message-oriented

    • Shake hands three times and wave hands four times

      • Three Handshakes (SYN/SYN+ACK/ACK)
      • Four waves (FIN/ACK/FIN/ACK)
    • Why does connecting take three handshakes but closing take four waves?

      • When the server receives the client's SYN, it can send SYN+ACK in a single segment (the ACK acknowledges, the SYN synchronizes). When closing, however, receiving a FIN does not mean the server's socket can close immediately; it can only reply with an ACK that says "I received your FIN". Only after the server has finished sending all of its own data can it send its FIN, so ACK and FIN cannot be combined, and closing takes four steps.
    • Why does TIME_WAIT state take 2MSL (maximum message segment lifetime) to return to CLOSE state?

      • Although it would seem reasonable to enter the CLOSE state directly once all four segments are sent, the network must be assumed unreliable, and the final ACK can be lost. The TIME_WAIT state exists so that a possibly lost final ACK can be retransmitted.
  • XSS/CSRF

    • HttpOnly forbids JS scripts from reading or manipulating the cookie, which effectively mitigates XSS.

Mysql

  • Index improvement process

    • Linear structure -> binary search -> hash -> binary search tree -> balanced binary tree -> multiway search tree -> multiway balanced search tree (B-Tree)
  • Mysql Interview Summary Foundation

    https://segmentfault.com/a/1190000018371218

  • Summary of Mysql Interview

    https://segmentfault.com/a/1190000018380324

  • Deep and shallow Mysql

    http://ningning.today/2017/02/13/database/in-depth shallow mysql/

  • When emptying an entire table, InnoDB deletes row by row, while MyISAM drops the table and recreates it.

  • text/blob columns cannot have default values, and no case conversion happens when querying them

  • When does the index fail

    • like fuzzy queries that begin with %

    • Implicit type conversion

    • The leftmost-prefix principle is not satisfied

      • For a multi-column index, the index is not used unless its first column is used
    • Failure scenario:

      • Avoid the != and <> operators in the where clause as much as possible; otherwise the engine abandons the index for a full table scan
      • Avoid joining conditions with or in the where clause; otherwise the engine abandons the index and scans the whole table, even if some conditions are indexed. This is why or should be used as little as possible.
      • If the column type is a string, the data in the condition must be quoted; otherwise the index is not used.
      • Avoid applying functions to fields in the where clause as far as possible; this causes the engine to abandon the index and scan the whole table.
For example:
select id from t where substring(name,1,3) = 'abc' -- rows whose name begins with abc
//should be changed to:
select id from t where name like 'abc%'
//Another example:
select id from t where datediff(day, createdate, '2005-11-30') = 0 -- rows created on '2005-11-30'
//should be changed to:
select id from t where createdate >= '2005-11-30' and createdate < '2005-12-01'
      • Do not perform function, arithmetic, or other expression operations on the left of "=" in the where clause; otherwise the system may not use the index correctly.
      • Avoid expression operations on fields in the where clause as far as possible; they cause the engine to abandon the index and scan the whole table.
For example:
select id from t where num/2 = 100
//should be changed to:
select id from t where num = 100*2;
      • set and enum columns are a poor fit (an enum column can be null and its default value automatically filters out blanks; a set is similar to an enum but can hold at most 64 values)

      • If MySQL estimates that full table scanning is faster than indexing, no index is used

  • What is clustered index

    • B+Tree leaf nodes store either the data itself or pointers to it
    • MyISAM keeps index and data separate: a non-clustered index
    • InnoDB's data file is itself the index file; the primary key index is the clustered index.

Summary of Redis commands

  • Why so fast?

    • Based on memory, written in C language

    • Using multiplexing I/O model, non-blocking IO

    • Use single thread to reduce inter-thread switching

      • Because Redis operates in memory, the CPU is not its bottleneck; the most likely bottlenecks are the machine's memory size and network bandwidth. Since single-threading is easy to implement and the CPU will not become a bottleneck, the single-threaded design is the natural choice (after all, multithreading brings a lot of trouble!).
    • Data structure is simple.

    • The VM mechanism is constructed to reduce the time of calling system functions.

  • advantage

    • High Performance - Redis reads 110,000 times per second and writes 81,000 times per second.
    • Rich data types
    • Atomic - All operations of Redis are atomic, and Redis also supports atomic execution after several operations are merged.
    • Rich features - Redis also supports publish/subscribe (publish/subscribe), notification, key expiration, etc.
  • What is redis transaction?

    • A mechanism that packages multiple commands so they are executed at one time and in order
    • Transactions are implemented with the multi, exec, watch commands
    • In Python's redis-py: pipeline = conn.pipeline(transaction=True)
  • Persistence approach

    • RDB (snapshot)

      • save (synchronous; guarantees data consistency)
      • bgsave (asynchronous; the default on shutdown when AOF is not enabled)
    • AOF (append-only log)

  • How to implement queues

    • lpush
    • rpop
  • Common data types (Bitmaps,Hyperloglogs, Range Queries, etc.)

    • String: Counter

      • Integer or sds(Simple Dynamic String)
    • List: User's attention, fan list

      • Ziplist or double linked list
    • Hash (Hash):

    • Set: User's followers

      • intset or hashtable
    • Zset (Ordered Set): Real-time Information Ranking List

      • Skiplist
  • Differentiation from Memcached

    • Memcached can only store string keys
    • Memcached users can only append data to the end of an existing string with APPEND and use that string as a list. When deleting elements, Memcached hides them behind a blacklist instead of removing them, to avoid the cost of reading, updating, and rewriting the list.
    • Both Redis and Memcached store data in memory, both memory databases. But Memcached can also be used to cache other things, such as pictures, videos, and so on.
    • Virtual memory - when physical memory runs out, Redis can swap values that have not been used for a long time out to disk
    • Storage Data Security - When Memcached hangs up, the data is gone; Redis can be saved to disk regularly (persistence)
    • Application scenarios are different: Redis can be used as NoSQL database, as well as message queue, data stack and data cache; Memcached is suitable for caching SQL statements, data sets, user temporary data, delayed query data and Session, etc.
  • Redis Implements Distributed Locks

    • Lock with setnx, and add a timeout with expire at the same time
    • The lock's value can be a random uuid or a specific name
    • When releasing, check the uuid to confirm it is our own lock, then delete it to release.
  • Common problem

    • Cache avalanche

      • A large amount of cached data expires within a short period, and a flood of requests hits the database
    • Cache penetration

      • A request queries data that exists neither in the cache nor in the database.
    • Cache preheating

      • Initialize the project and add some common data to the cache
    • Cache update

      • Data expiration, update cached data
    • Cache degradation

      • When the volume of access increases dramatically, service problems (such as slow response time or non-response) or non-core services affect the performance of core processes, it is still necessary to ensure that services are still available, even if services are compromised. The system can automatically degrade according to some key data, or can configure switches to achieve manual degrade.
  • Consistency Hash algorithm

    • Ensure data consistency when using clusters
  • A distributed lock based on redis requires a timeout parameter

    • setnx
  • virtual memory

  • memory thrashing

Linux

  • Five Unix i/o Models

    • Blocking io

    • Non blocking io

    • Multiplexing io (in Python, use the selectors module to implement I/O multiplexing)

      • select

        • When concurrency is not high but the connections are very active
      • poll

        • Not much better than select
      • epoll

        • Suitable for situations where the number of links is large but the number of active links is small.
    • Signal driven io

    • Asynchronous io(Gevent/Asyncio implements asynchrony)

  • A command manual that is friendlier than man

    • tldr: a manual with usage examples for each command
  • Differences between kill -9 and kill -15

    • -15 (SIGTERM): asks the program to stop and release its resources; the program may, however, keep running
    • -9 (SIGKILL): because of the uncertainty of -15, use -9 to kill the process immediately
  • Paging mechanism (a memory-management scheme that separates logical from physical addresses):

    • Lets the operating system manage memory efficiently and reduce fragmentation
    • The program's logical address space is divided into fixed-size pages
    • Physical memory is divided into frames of the same size
    • A page table maps logical addresses to physical addresses
  • Segmentation mechanism

    • Meets the logical requirements of the code
    • Data sharing / data protection / dynamic linking
    • Memory within each segment is allocated contiguously; segments are allocated discretely
  • How to check CPU and memory usage?

    • top
    • free shows available memory and helps troubleshoot memory leaks
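The Python `selectors` module used for I/O multiplexing above picks the best mechanism the OS offers (epoll on Linux, kqueue on BSD/macOS, otherwise poll/select). A minimal echo sketch over a `socketpair`, which stands in for a real accepted connection:

```python
import selectors
import socket

# DefaultSelector chooses epoll/kqueue/poll/select automatically
sel = selectors.DefaultSelector()
server, client = socket.socketpair()  # stand-in for an accept()ed connection
server.setblocking(False)

sel.register(server, selectors.EVENT_READ)

client.sendall(b"ping")
events = sel.select(timeout=1)        # wait until a registered fd is readable
for key, mask in events:
    data = key.fileobj.recv(1024)     # guaranteed not to block now
    key.fileobj.sendall(data)         # echo it back

reply = client.recv(1024)             # b"ping"
sel.unregister(server)
server.close()
client.close()
```

A real server would register many sockets and dispatch on each ready file object; the loop shape stays the same.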

Design pattern

Singleton pattern

    # Way 1: a decorator that caches one instance per class
    def single(cls):
        instances = {}
        def get_instance(*args, **kwargs):
            if cls not in instances:
                instances[cls] = cls(*args, **kwargs)
            return instances[cls]
        return get_instance

    @single
    class B:
        pass

    # Way 2: a module-level instance
    class Single:
        def __init__(self):
            print("Singleton implementation, way 2.")

    single = Single()
    del Single  # from now on, import and reuse the `single` instance everywhere
    # Way 3 (the most commonly used): override __new__
    class Single:
        def __new__(cls, *args, **kwargs):
            if not hasattr(cls, '_instance'):
                # object.__new__ takes no extra arguments
                cls._instance = super().__new__(cls)
            return cls._instance
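Way 3 is not thread-safe: two threads can both pass the `hasattr` check before either assigns `_instance`. A common fix (a sketch; the class name is illustrative) guards `__new__` with a lock, using double-checked locking so the lock is only taken on first creation:

```python
import threading

class ThreadSafeSingle:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:            # fast path: skip the lock once created
            with cls._lock:
                if cls._instance is None:    # double-checked locking
                    cls._instance = super().__new__(cls)
        return cls._instance

print(ThreadSafeSingle() is ThreadSafeSingle())  # True
```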

Factory pattern

    class Dog:
        def __init__(self):
            print("Woof woof")
    class Cat:
        def __init__(self):
            print("Meow meow")


    def fac(animal):
        if animal.lower() == "dog":
            return Dog()
        if animal.lower() == "cat":
            return Cat()
        print("Sorry, it must be one of: dog, cat")
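The if-chain above grows with every new class. A registry-based variant (a sketch; class and dict names are illustrative) keeps the factory flat, since Python classes are first-class objects:

```python
class Dog:
    def speak(self):
        return "Woof"

class Cat:
    def speak(self):
        return "Meow"

# map a lookup key to the class itself; adding a new animal
# is one dict entry instead of another if-branch
ANIMALS = {"dog": Dog, "cat": Cat}

def fac(animal):
    try:
        return ANIMALS[animal.lower()]()
    except KeyError:
        raise ValueError("Sorry, it must be one of: " + ", ".join(ANIMALS))

print(fac("Dog").speak())  # Woof
```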

Builder pattern

    class Computer:
        def __init__(self, serial_number):
            self.serial_number = serial_number
            self.memory = None
            self.hdd = None
            self.gpu = None
        def __str__(self):
            return (f'Memory:{self.memory}GB '
                    f'Hard Disk:{self.hdd}GB '
                    f'Graphics Card:{self.gpu}')
    class ComputerBuilder:
        def __init__(self):
            self.computer = Computer('Jim1996')
        def configure_memory(self, amount):
            self.computer.memory = amount
            return self  # returning self enables chained calls
        def configure_hdd(self, amount):
            self.computer.hdd = amount
            return self
        def configure_gpu(self, gpu_model):
            self.computer.gpu = gpu_model
            return self
    class HardwareEngineer:
        def __init__(self):
            self.builder = None
        def construct_computer(self, memory, hdd, gpu):
            self.builder = ComputerBuilder()
            self.builder.configure_memory(memory).configure_hdd(hdd).configure_gpu(gpu)
        @property
        def computer(self):
            return self.builder.computer
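A usage sketch of the builder chain. The classes are condensed and corrected here so the demo is self-contained; the serial number and parts are illustrative values:

```python
class Computer:
    def __init__(self, serial_number):
        self.serial_number = serial_number
        self.memory = self.hdd = self.gpu = None

    def __str__(self):
        return f'Memory:{self.memory}GB Hard Disk:{self.hdd}GB Graphics Card:{self.gpu}'

class ComputerBuilder:
    def __init__(self):
        self.computer = Computer('Jim1996')

    def configure_memory(self, amount):
        self.computer.memory = amount
        return self  # returning self is what makes the chained calls work

    def configure_hdd(self, amount):
        self.computer.hdd = amount
        return self

    def configure_gpu(self, gpu_model):
        self.computer.gpu = gpu_model
        return self

computer = (ComputerBuilder()
            .configure_memory(8)
            .configure_hdd(500)
            .configure_gpu('GTX 1080')
            .computer)
print(computer)  # Memory:8GB Hard Disk:500GB Graphics Card:GTX 1080
```

If any `configure_*` method forgot to `return self`, the chain would break with an `AttributeError` on `None`.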

Data Structures and Algorithms

Implementing common data structures and algorithms in Python

Quick sort

    def quick_sort(_list):
        if len(_list) < 2:
            return _list
        pivot = _list[0]
        left_list = [i for i in _list[1:] if i <= pivot]
        right_list = [i for i in _list[1:] if i > pivot]
        return quick_sort(left_list) + [pivot] + quick_sort(right_list)

Selection sort

    def select_sort(seq):
        n = len(seq)
        for i in range(n - 1):
            min_idx = i
            for j in range(i + 1, n):
                if seq[j] < seq[min_idx]:
                    min_idx = j
            if min_idx != i:
                seq[i], seq[min_idx] = seq[min_idx], seq[i]

Insertion sort

    def insertion_sort(_list):
        n = len(_list)
        for i in range(1, n):
            value = _list[i]
            pos = i
            while pos > 0 and value < _list[pos - 1]:
                _list[pos] = _list[pos - 1]
                pos -= 1
            _list[pos] = value

Merge sort

    def merge_sorted_list(_list1, _list2):   # merge two ordered lists
        len_a, len_b = len(_list1), len(_list2)
        a = b = 0
        sort = []
        while len_a > a and len_b > b:
            if _list1[a] > _list2[b]:
                sort.append(_list2[b])
                b += 1
            else:
                sort.append(_list1[a])
                a += 1
        if len_a > a:
            sort.extend(_list1[a:])   # extend, not append: append would nest the tail as a list
        if len_b > b:
            sort.extend(_list2[b:])
        return sort

    def merge_sort(_list):
        if len(_list) < 2:
            return _list
        mid = len(_list) // 2
        left = merge_sort(_list[:mid])
        right = merge_sort(_list[mid:])
        return merge_sorted_list(left, right)

Heap sort (heapq module)

    from heapq import nsmallest
    def heap_sort(_list):
        return nsmallest(len(_list),_list)
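`nsmallest` sorts by way of a heap internally; the textbook heap sort can also be written directly with `heapify` and `heappop` (a sketch):

```python
import heapq

def heap_sort(_list):
    heap = list(_list)   # copy so the input is left untouched
    heapq.heapify(heap)  # O(n) build of a min-heap in place
    return [heapq.heappop(heap) for _ in range(len(heap))]  # each pop is O(log n)

print(heap_sort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]
```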

Stack

    from collections import deque
    class Stack:
        def __init__(self):
            self.s = deque()
        def peek(self):
            p = self.pop()
            self.push(p)
            return p
        def push(self, el):
            self.s.append(el)
        def pop(self):
            return self.s.pop()

Queue

    from collections import deque
    class Queue:
        def __init__(self):
            self.s = deque()
        def push(self, el):
            self.s.append(el)
        def pop(self):
            return self.s.popleft()

Binary search

    def binary_search(_list, num):
        if len(_list) < 1:
            return False
        mid = len(_list) // 2
        if num > _list[mid]:
            return binary_search(_list[mid + 1:], num)
        elif num < _list[mid]:
            return binary_search(_list[:mid], num)
        else:
            return True  # slicing loses the original index, so return a found flag
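The recursive version slices the list (which copies memory) and cannot report a position in the original list. An iterative variant (a sketch) avoids both and returns the index, or -1 when the value is absent:

```python
def binary_search_index(_list, num):
    low, high = 0, len(_list) - 1
    while low <= high:
        mid = (low + high) // 2
        if _list[mid] < num:
            low = mid + 1     # target can only be in the right half
        elif _list[mid] > num:
            high = mid - 1    # target can only be in the left half
        else:
            return mid        # index in the original list
    return -1                 # not found

print(binary_search_index([1, 3, 5, 7, 9], 7))  # 3
```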

Interview Enhancement Questions:

Database optimization and design

  • How to implement a queue using two stacks

  • Reverse a linked list

  • Merge two sorted linked lists

  • Delete a linked-list node

  • Invert a binary tree

  • Design a short-URL service? base-62 implementation

  • Design a flash-sale system (feed stream)?

  • Why is it better to use auto-increment integers as primary keys in a MySQL database? Is a UUID acceptable? Why?

    • In an InnoDB table, access is most efficient when rows are written in the same order as the leaf nodes of the B+ tree index, so an auto-increment id as the primary key gives the best storage and query performance.
    • InnoDB's primary (clustered) index sorts data by primary key. Because UUIDs are unordered, they put enormous I/O pressure on InnoDB, so a UUID is unsuitable as the physical primary key. It can be used as a logical primary key while the physical primary key remains an auto-increment id; for global uniqueness, use the UUID as an index to associate other tables or as a foreign key.
  • In a distributed system, how can we generate the database's auto-increment id?

    • Use Redis (its INCR command is atomic)
  • A distributed lock based on redis requires a timeout parameter

    • setnx
    • setnx + expire (better: a single SET with NX and EX, which is atomic)
  • If the single Redis node goes down, how do you deal with it? Are there other industry schemes for implementing distributed locks?

    • Use a consistent-hash algorithm across multiple nodes; Redis's own Redlock algorithm is the best-known industry scheme
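The first question in the list above, a queue built from two stacks, can be sketched as follows. Pushes go onto an in-stack; the out-stack serves pops, refilled by reversing the in-stack whenever it runs dry, which turns LIFO into FIFO:

```python
class QueueWithTwoStacks:
    def __init__(self):
        self.in_stack = []   # receives pushes
        self.out_stack = []  # serves pops in FIFO order

    def push(self, el):
        self.in_stack.append(el)

    def pop(self):
        if not self.out_stack:
            # reversing the in-stack once turns LIFO into FIFO;
            # each element moves at most twice, so pop is amortized O(1)
            while self.in_stack:
                self.out_stack.append(self.in_stack.pop())
        return self.out_stack.pop()

q = QueueWithTwoStacks()
q.push(1); q.push(2); q.push(3)
print(q.pop(), q.pop(), q.pop())  # 1 2 3
```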

Cache algorithm

  • LRU (least recently used): evict the object that has gone unused for the longest time
  • LFU (least frequently used): evict the object used least often; if data has rarely been used recently, it is unlikely to be used in the future
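An LRU cache like the one described above is straightforward with `collections.OrderedDict`, whose insertion order can track recency (a sketch; the capacity and keys are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key not in self.cache:
            return None
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')         # 'a' is now the most recently used
cache.put('c', 3)      # capacity exceeded: evicts 'b'
print(cache.get('b'))  # None
```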

Server-side performance optimization directions

  • Use better data structures and algorithms

  • Database

    • Index optimization

    • Eliminate slow queries

      • Enable slow_query_log and inspect slow_query_log_file to find slow queries
      • Use explain to diagnose how a query uses indexes
      • Adjust the data or modify the index accordingly
    • Batch operations to reduce I/O

    • Use NoSQL: Redis, for example

  • Network I/O

    • Batch operations
    • pipeline
  • Cache

    • Redis
  • Asynchrony

    • Asyncio implements asynchronous operations
    • Use Celery to reduce I/O blocking
  • Concurrency

    • Multithreading
    • Gevent

Tags: Python Redis network ssh

Posted on Sat, 12 Oct 2019 05:47:22 -0700 by abhi201090