python: serialization / deserialization and deep / shallow copy of objects

python: serialization / deserialization and vulnerability thinking

1, Serialization / deserialization

Python ships with several built-in serialization/deserialization modules; the most commonly used are json, pickle and marshal. Example usage:

import json
import pickle
import marshal
 
author1 = {"name": "Shanshanyuan", "blog": "https://editor.csdn.net/md?articleId=104850849", "title": "architect", "pets": ["dog", "cat"]}
 
# json serialization
json_str = json.dumps(author1)
print("json=>\n", json_str)
 
# json string deserialization
author2 = json.loads(json_str)
 
# pickle serialization
pickle_str = pickle.dumps(author1)
print("pickle=>\n", pickle_str)
 
# pickle deserialization (pickle.dumps returns bytes, not str)
author3 = pickle.loads(pickle_str)
 
# marshal serialization
marshal_str = marshal.dumps(author1)
print("marshal=>\n", marshal_str)
 
# marshal deserialization
author4 = marshal.loads(marshal_str)
 
print("\n",
      id(author1), "\n",
      id(author2), "\n",
      id(author3), "\n",
      id(author4), "\n",
      author1, "\n",
      author2, "\n",
      author3, "\n",
      author4)
 
with open("json.txt", "w") as file1:
    json.dump(author1, file1)
 
with open("pickle.txt", "wb") as file2:
    pickle.dump(author1, file2)
 
with open("marshal.txt", "wb") as file3:
    marshal.dump(author1, file3) 

Output:

json=>
 {"name": "\u83e9\u63d0\u6811\u4e0b\u7684\u6768\u8fc7", "blog": "http://yjmyzz.cnblogs.com/", "title": "\u67b6\u6784\u5e08", "pets": ["dog", "cat"]}
pickle=>
 b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x15\x00\x00\x00\xe8\x8f\xa9\xe6\x8f\x90\xe6\xa0\x91\xe4\xb8\x8b\xe7\x9a\x84\xe6\x9d\xa8\xe8\xbf\x87q\x02X\x04\x00\x00\x00blogq\x03X\x1a\x00\x00\x00http://yjmyzz.cnblogs.com/q\x04X\x05\x00\x00\x00titleq\x05X\t\x00\x00\x00\xe6\x9e\xb6\xe6\x9e\x84\xe5\xb8\x88q\x06X\x04\x00\x00\x00petsq\x07]q\x08(X\x03\x00\x00\x00dogq\tX\x03\x00\x00\x00catq\neu.'
marshal=>
 b'\xfb\xda\x04name\xf5\x15\x00\x00\x00\xe8\x8f\xa9\xe6\x8f\x90\xe6\xa0\x91\xe4\xb8\x8b\xe7\x9a\x84\xe6\x9d\xa8\xe8\xbf\x87\xda\x04blog\xfa\x1ahttp://yjmyzz.cnblogs.com/\xda\x05title\xf5\t\x00\x00\x00\xe6\x9e\xb6\xe6\x9e\x84\xe5\xb8\x88\xda\x04pets[\x02\x00\x00\x00\xda\x03dog\xda\x03cat0'
 
 4307564944
 4309277360
 4307565016
 4309277432
 {'name': 'Shanshanyuan', 'blog': 'https://editor.csdn.net/md?articleId=104850849', 'title': 'architect', 'pets': ['dog', 'cat']}
 {'name': 'Shanshanyuan', 'blog': 'https://editor.csdn.net/md?articleId=104850849', 'title': 'architect', 'pets': ['dog', 'cat']}
 {'name': 'Shanshanyuan', 'blog': 'https://editor.csdn.net/md?articleId=104850849', 'title': 'architect', 'pets': ['dog', 'cat']}
 {'name': 'Shanshanyuan', 'blog': 'https://editor.csdn.net/md?articleId=104850849', 'title': 'architect', 'pets': ['dog', 'cat']}
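The example writes json.txt, pickle.txt and marshal.txt but never reads them back. Below is a minimal sketch of the reverse direction with json.load, pickle.load and marshal.load. To keep it self-contained it recreates the files first. It places them in a temporary directory (an assumption, so nothing is left behind) and opens the pickle and marshal files in binary mode, because both formats produce bytes.

```python
import json
import marshal
import os
import pickle
import tempfile

# Same data as author1 in the example above.
author1 = {"name": "Shanshanyuan", "blog": "https://editor.csdn.net/md?articleId=104850849",
           "title": "architect", "pets": ["dog", "cat"]}

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "json.txt")
    pickle_path = os.path.join(tmp, "pickle.txt")
    marshal_path = os.path.join(tmp, "marshal.txt")

    # Serialize to files: json is text, pickle and marshal are binary.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(author1, f)
    with open(pickle_path, "wb") as f:
        pickle.dump(author1, f)
    with open(marshal_path, "wb") as f:
        marshal.dump(author1, f)

    # Deserialize from the same files.
    with open(json_path, encoding="utf-8") as f:
        from_json = json.load(f)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)
    with open(marshal_path, "rb") as f:
        from_marshal = marshal.load(f)

# Each load builds a new dict that compares equal to the original.
print(from_json == from_pickle == from_marshal == author1)
```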

Note: the API names are easy to remember. dump/dumps means "pour out": the object is dumped into a file (dump) or a string (dumps), which completes serialization. Conversely, load/loads restores the object from a file (load) or a string (loads). It is worth stressing that pickle and marshal are not safe for untrusted data: if a pickled string or file contains carefully crafted malicious data, loading it can execute arbitrary code (search online for "Python deserialization vulnerability" for many write-ups), and the marshal documentation likewise warns that it is not secure against erroneous or maliciously constructed data. As for size, the length of the serialized output depends on the data and on the protocol used, so measure it for your own payload rather than assuming one format is always the smallest. In short, json is the recommended choice for serialization/deserialization.
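To check the size comparison for your own data instead of relying on a rule of thumb, you can simply measure each serialized form with len(). A minimal sketch, reusing the author1 values under the local name sample_author. json.dumps returns a str, so it is encoded to UTF-8 here so that all three sizes are counted in bytes.

```python
import json
import marshal
import pickle

# Same data as author1 in the example above.
sample_author = {
    "name": "Shanshanyuan",
    "blog": "https://editor.csdn.net/md?articleId=104850849",
    "title": "architect",
    "pets": ["dog", "cat"],
}

# json.dumps returns str; encode it so every size is measured in bytes.
serialized = {
    "json": json.dumps(sample_author).encode("utf-8"),
    "pickle": pickle.dumps(sample_author),
    "marshal": marshal.dumps(sample_author),
}

for fmt, payload in serialized.items():
    print(f"{fmt:8} {len(payload):4d} bytes")

# All three formats round-trip this dict to an equal (but distinct) object.
assert json.loads(serialized["json"]) == sample_author
assert pickle.loads(serialized["pickle"]) == sample_author
assert marshal.loads(serialized["marshal"]) == sample_author
```

The numbers change with the data, the Python version and the pickle protocol, so treat them as a measurement rather than a constant.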

2, Differences between pickle, marshal and json

1. What's the difference between pickle and marshal

  • pickle keeps track of the objects it has already serialized, so later references to the same object are not serialized again (this matters for shared and recursive objects). According to the pickle documentation, marshal does not do this.
  • marshal cannot serialize instances of user-defined classes.
  • The marshal format is not guaranteed to be compatible across Python versions.
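A short sketch of the first two points: pickle's memo means a list referenced twice comes back as one shared list, and marshal rejects an instance of a user-defined class. The Author class here is hypothetical, defined only for this illustration.

```python
import marshal
import pickle

# One list referenced twice inside the same container.
pets = ["dog", "cat"]
household = {"first": pets, "second": pets}

# pickle memoizes objects it has already written, so the shared list is
# stored once and both keys point to the same list after loading.
restored = pickle.loads(pickle.dumps(household))
print(restored["first"] is restored["second"])  # True


class Author:
    """Hypothetical user-defined class, used only for this demonstration."""

    def __init__(self, name):
        self.name = name


# marshal only supports core built-in types; a class instance raises ValueError.
marshal_error = None
try:
    marshal.dumps(Author("Shanshanyuan"))
except ValueError as exc:
    marshal_error = exc
    print("marshal failed:", exc)
```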

2. What are the differences between pickle and json

  • json's serialized output is text (str, i.e. Unicode), while pickle's serialized output is binary (bytes). If the difference between the two is unclear, see my previous blog post on how Python 2 and Python 3 differ in string encoding handling.
  • json is human-readable.
  • json is widely compatible and can be used in many places outside Python, whereas pickle is Python-specific.
  • json can only represent a subset of Python's built-in types, not instances of user-defined classes; trying to serialize a custom class instance raises TypeError. pickle can represent most objects, including (most) instances of user-defined classes.
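The json/pickle differences above can be seen in a few lines: the output types differ, json raises TypeError for a class instance, and pickle restores it. The Author class is again a hypothetical example. Serializing vars(author) is shown as one common way to get a json-friendly dict, not as a general solution.

```python
import json
import pickle


class Author:
    """Hypothetical user-defined class, used only for this illustration."""

    def __init__(self, name, pets):
        self.name = name
        self.pets = pets


author = Author("Shanshanyuan", ["dog", "cat"])

# json produces text (str); pickle produces binary data (bytes).
print(type(json.dumps({"name": author.name})))  # <class 'str'>
print(type(pickle.dumps(author)))               # <class 'bytes'>

# json cannot encode arbitrary class instances.
json_error = None
try:
    json.dumps(author)
except TypeError as exc:
    json_error = exc
    print("json failed:", exc)

# pickle rebuilds a new Author instance with the same attributes.
restored_author = pickle.loads(pickle.dumps(author))
print(restored_author.name, restored_author.pets)  # Shanshanyuan ['dog', 'cat']

# Workaround for json: serialize the instance's attribute dict instead.
as_json = json.dumps(vars(author))
print(as_json)
```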

3, What types pickle can serialize and deserialize (Python 3)

  • None, True, and False
  • Integers, floating-point numbers, complex numbers
  • Strings, bytes, bytearrays
  • Tuples, lists, sets, and dictionaries containing only picklable objects
  • Functions defined at the top level of a module (using def, not lambda)
  • Built-in functions defined at the top level of a module
  • Classes defined at the top level of a module
  • Instances of such classes whose __dict__ or the result of calling __getstate__() is picklable
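The "def, not lambda" rule follows from how pickle stores functions: by reference (module plus name), not as code. A minimal sketch, assuming the code runs as a script so its functions live in __main__. The names double and triple are hypothetical.

```python
import pickle


def double(x):
    """Defined with def at module top level, so pickle can find it by name."""
    return 2 * x


triple = lambda x: 3 * x  # its __qualname__ is "<lambda>", which is not importable

# Top-level functions are pickled as a reference, so loading returns
# the very same function object.
restored = pickle.loads(pickle.dumps(double))
print(restored is double, restored(21))  # True 42

# A lambda cannot be looked up by name, so pickling it fails.
lambda_error = None
try:
    pickle.dumps(triple)
except (pickle.PicklingError, AttributeError) as exc:
    lambda_error = exc
    print("lambda failed:", type(exc).__name__)
```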

4, Vulnerability analysis

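The vulnerability stems from pickle's ability to serialize and deserialize custom classes. A class can define __reduce__() to tell pickle how its instances should be rebuilt, and when the data is deserialized, pickle calls whatever callable that method specified.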

Serialization and deserialization are not a problem in themselves. However, when the data being deserialized can be controlled by the user, an attacker can construct malicious input that produces unexpected objects and executes arbitrary code during deserialization.

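A brief description of __reduce__(): it returns a tuple (callable, args). The pickled data stores that tuple instead of the object's state, and on unpickling pickle calls callable(*args) and uses the return value as the restored object.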
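To make the mechanism concrete, here is a harmless sketch of a __reduce__-based payload. The function record is a hypothetical stand-in for what a real exploit would call (for example os.system). The second half shows one mitigation described in the pickle documentation ("Restricting Globals"): subclassing pickle.Unpickler and overriding find_class. This strict version refuses every global, which blocks the payload while plain built-in data still loads. For untrusted input, json remains the simpler choice.

```python
import io
import pickle

executed = []


def record(message):
    """Harmless stand-in for an attacker's callable; it only logs that it ran."""
    executed.append(message)
    return message


class Payload:
    # pickle calls __reduce__ while serializing; the (callable, args) pair
    # it returns is what gets written into the byte stream.
    def __reduce__(self):
        return (record, ("ran during pickle.loads",))


malicious_bytes = pickle.dumps(Payload())
print(executed)  # [] (nothing has run yet)

# Whoever loads the bytes triggers record("ran during pickle.loads").
result = pickle.loads(malicious_bytes)
print(executed)  # ['ran during pickle.loads']
print(result)    # ran during pickle.loads


class RestrictedUnpickler(pickle.Unpickler):
    """Refuses to resolve any global, so no callable can be smuggled in."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


blocked = None
try:
    restricted_loads(malicious_bytes)
except pickle.UnpicklingError as exc:
    blocked = exc
    print("blocked:", exc)

# Plain dicts, lists and strings need no globals and still load normally.
safe_data = restricted_loads(pickle.dumps({"pets": ["dog", "cat"]}))
print(safe_data)  # {'pets': ['dog', 'cat']}
```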

Supplement:

import pickle

a = 123
# With protocol 3 (the default before Python 3.8), pickle.dumps(a) == b'\x80\x03K{.'
print(id(a), pickle.dumps(a, protocol=3))
b = b'\x80\x03K{.'
c = pickle.loads(b)
print(id(c), c)

The output is as follows:

	140708742856768 b'\x80\x03K{.'
	140708742856768 123

Explanation:
a is first assigned 123, and that value is then serialized and deserialized. The deserialized value is still 123, i.e. pickle.loads(b) returns 123, so a statement such as d = pickle.load(f) on the same data is equivalent to d = 123. Although deserialization produces the value anew, CPython caches small integers (from -5 to 256), so the result refers to the existing 123 object in memory. That is why both id() calls print the same address. This is a CPython implementation detail, not a language guarantee.
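A quick way to see that the shared address comes from CPython's small-integer cache rather than from pickle itself is to compare a small and a large integer. A minimal sketch, assuming CPython; other implementations may behave differently.

```python
import pickle

small = 123      # inside CPython's small-int cache (-5..256)
big = 10 ** 6    # outside the cache

restored_small = pickle.loads(pickle.dumps(small))
restored_big = pickle.loads(pickle.dumps(big))

# Values always survive the round trip.
print(restored_small == small, restored_big == big)  # True True

# Identity depends on the cache.
print(restored_small is small)  # True in CPython: 123 is a shared cached object
print(restored_big is big)      # typically False: loads creates a fresh int object
```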


Posted on Sun, 15 Mar 2020 05:02:48 -0700 by ethridgt