Python source code analysis - object exploration

01 Preface

Objects are the core concept in Python. In the Python world, everything is an object: integers and strings are objects, and so are types themselves, such as the integer type and the string type.
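
A quick check from the interpreter illustrates the claim. This is only a minimal sketch, assuming CPython 3; the names `values` and `all_objects` are illustrative and not part of the original article.

```python
# Values, types, and even the type of types are all instances of `object`.
values = [10, "text", int, str, type]
all_objects = all(isinstance(v, object) for v in values)
print(all_objects)

# A type is itself an object, and its own type is `type`.
type_of_int_is_type = type(int) is type
print(type_of_int_is_type)
```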

02 What is PyObject

In Python, everything is an object, and PyObject is the foundation of all objects and the core of Python's object mechanism. It plays the role of a base class: every other object structure begins with the same fields, so any object can be handled through a PyObject pointer.

Open Include/object.h; the declaration is as follows:

#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;

typedef struct _object {
    PyObject_HEAD
} PyObject;

PyObject has two important members:

  • ob_refcnt is the reference count. Each time a new reference to the object is created, the count is incremented by 1; each time a reference is dropped, the count is decremented by 1. When the count reaches 0, the object is reclaimed and its memory can be freed.
  • ob_type records the type information of the object. It points to a type object, a structure that holds a lot of information, as the code analysis below shows.
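
The effect of both members can be observed from Python code. The sketch below assumes CPython 3, where `sys.getrefcount` reports the reference count; the absolute number depends on the build, so only the changes are compared. The function name `refcount_deltas` is hypothetical.

```python
import sys


def refcount_deltas():
    """Return how the reference count changes as references come and go."""
    obj = object()                      # a fresh, unshared object
    base = sys.getrefcount(obj)         # absolute value is build-dependent
    alias = obj                         # a second reference: count goes up
    after_alias = sys.getrefcount(obj) - base
    del alias                           # reference dropped: count goes down
    after_del = sys.getrefcount(obj) - base
    return after_alias, after_del


deltas = refcount_deltas()
print(deltas)

# ob_type is what type() reports: the object's type object.
print(type(10) is int, type("a") is str)
```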

03 Type objects

Python defines some type objects in advance, such as the int, str and dict types. These are called built-in type objects, and they implement the object-oriented concept of a "class".

Instantiating these built-in type objects creates the corresponding instance objects, such as int objects, str objects and dict objects. These instance objects are Python's embodiment of the object-oriented concept of an "object".
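
The relationship between type objects and instance objects can be sketched as follows, assuming CPython 3. The variable names are illustrative only.

```python
# Built-in type objects play the role of classes.
number = int("42")      # instantiating the type object `int`
mapping = dict(a=1)     # instantiating the type object `dict`

# Each instance records its type object (ob_type at the C level).
print(type(number) is int, type(mapping) is dict)

# The type objects are themselves instances of `type`.
print(isinstance(int, type), isinstance(dict, type))
```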

#define PyObject_VAR_HEAD               \
    PyObject_HEAD                       \
    Py_ssize_t ob_size; /* Number of items in variable part */

typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name; /* For printing, in format "<module>.<name>" */
    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* ... fields from tp_hash through tp_iternext omitted here ... */

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    Py_ssize_t tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
    destructor tp_del;

} PyTypeObject;

In this structure, we need to focus on several key groups of members:

  • tp_name is the type name, such as 'int', 'tuple' or 'list'; it is used when the type is printed
  • tp_basicsize and tp_itemsize give the memory size information needed to create an object of the type
  • The operations associated with the type, stored as function pointers such as tp_dealloc, tp_print and tp_repr
  • Additional information describing the type, such as tp_base, tp_dict and tp_mro
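
Several of these members are visible from Python as attributes of the type object. The sketch below assumes CPython 3 and uses `float`, `tuple` and `bool` purely as examples; `pointer_size` is an illustrative name.

```python
import struct

# tp_name, tp_basicsize and tp_itemsize surface as these attributes.
print(float.__name__, float.__basicsize__, float.__itemsize__)
print(tuple.__name__, tuple.__basicsize__, tuple.__itemsize__)

# A tuple stores one object pointer per item.
pointer_size = struct.calcsize("P")   # size of one C pointer on this build

# tp_base, tp_bases and tp_mro describe the type's place in the hierarchy.
print(bool.__base__, bool.__bases__, bool.__mro__)
```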

04 Fixed-length and variable-length objects

Fixed-length objects are easy to understand. An integer object, for example, occupies the same amount of storage no matter how large its value is. This length is specified by its type object (_typeobject) and does not change.

typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;

The size of a variable-length object in memory is not fixed, so ob_size is needed to record the number of items in the variable-length part. Note that this is the number of items, not the number of bytes.

#define PyObject_VAR_HEAD               \
    PyObject_HEAD                       \
    Py_ssize_t ob_size; /* Number of items in variable part */

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;
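
The difference between the two kinds of object can be measured with `sys.getsizeof`. This is a sketch under the assumption of CPython 3, where `int` is arbitrary-precision and no longer fixed-length, so `float` stands in for the fixed-length case and `tuple` and `bytes` for the variable-length case.

```python
import sys

# Fixed-length: every float occupies the same number of bytes.
fixed_sizes = {sys.getsizeof(x) for x in (0.0, 1.5, 1e300)}
print(fixed_sizes)

# Variable-length: each extra item adds tp_itemsize bytes.
tuple_step = sys.getsizeof((1, 2, 3)) - sys.getsizeof((1, 2))
bytes_step = sys.getsizeof(b"abcd") - sys.getsizeof(b"abc")
print(tuple_step == tuple.__itemsize__, bytes_step == bytes.__itemsize__)

# len() reports an item count, as ob_size does, not a byte count.
print(len((1, 2, 3)), len(b"abc"))
```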

05 Example of creating a fixed-length object

The code is as follows:

a = int(10)

Python mainly performs the following operations:

  • Step 1: determine the type to be created. In the example above it is PyInt_Type
  • Step 2: construct the object through the int_new function stored in PyInt_Type
  • Step 3: int_new inspects its arguments. When a string is passed with an explicit base, it calls PyInt_FromString() to construct the integer; with no explicit base, as in int(10), it calls PyNumber_Int() instead
  • Step 4: finally, whenever a new integer object is needed, PyInt_FromLong(long ival) allocates memory for it and stores the value

Let's take a look at the implementation of PyInt_Type:

  • tp_name is assigned "int", so this string is displayed when type() is called on an integer
  • The operations associated with the "int" class are specified, such as deallocation, printing and comparison
  • tp_basicsize is assigned sizeof(PyIntObject)
  • tp_itemsize is assigned 0

PyTypeObject PyInt_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "int",                                      /* tp_name */
    sizeof(PyIntObject),                        /* tp_basicsize */
    0,                                          /* tp_itemsize */
    (destructor)int_dealloc,                    /* tp_dealloc */
    (printfunc)int_print,                       /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    (cmpfunc)int_compare,                       /* tp_compare */
    (reprfunc)int_to_decimal_string,            /* tp_repr */
    &int_as_number,                             /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    (hashfunc)int_hash,                         /* tp_hash */
    0,                                          /* tp_call */
    (reprfunc)int_to_decimal_string,            /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_CHECKTYPES |
        Py_TPFLAGS_BASETYPE | Py_TPFLAGS_INT_SUBCLASS,          /* tp_flags */
    int_doc,                                    /* tp_doc */
    0,                                          /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    int_methods,                                /* tp_methods */
    0,                                          /* tp_members */
    int_getset,                                 /* tp_getset */
    0,                                          /* tp_base */
    0,                                          /* tp_dict */
    0,                                          /* tp_descr_get */
    0,                                          /* tp_descr_set */
    0,                                          /* tp_dictoffset */
    0,                                          /* tp_init */
    0,                                          /* tp_alloc */
    int_new,                                    /* tp_new */
    (freefunc)int_free,                         /* tp_free */
};

Next we expand the int_new function. It is the creation function of the type, similar to a constructor in C++, and it produces the PyIntObject. The code is as follows:

static PyObject *
int_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    PyObject *x = NULL;
    int base = -909;    /* sentinel: no base was passed */
    static char *kwlist[] = {"x", "base", 0};

    if (type != &PyInt_Type)
        return int_subtype_new(type, args, kwds); /* Wimp out */
    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|Oi:int", kwlist,
                                     &x, &base))
        return NULL;
    if (x == NULL) {
        if (base != -909) {
            PyErr_SetString(PyExc_TypeError,
                            "int() missing string argument");
            return NULL;
        }
        return PyInt_FromLong(0L);
    }
    if (base == -909)
        return PyNumber_Int(x);
    if (PyString_Check(x)) {
        /* Since PyInt_FromString doesn't have a length parameter,
         * check here for possible NULs in the string. */
        char *string = PyString_AS_STRING(x);
        if (strlen(string) != PyString_Size(x)) {
            /* create a repr() of the input string,
             * just like PyInt_FromString does */
            PyObject *srepr;
            srepr = PyObject_Repr(x);
            if (srepr == NULL)
                return NULL;
            PyErr_Format(PyExc_ValueError,
                 "invalid literal for int() with base %d: %s",
                 base, PyString_AS_STRING(srepr));
            Py_DECREF(srepr);
            return NULL;
        }
        return PyInt_FromString(string, NULL, base);
    }
    if (PyUnicode_Check(x))
        return PyInt_FromUnicode(PyUnicode_AS_UNICODE(x),
                                 PyUnicode_GET_SIZE(x),
                                 base);
    PyErr_SetString(PyExc_TypeError,
                    "int() can't convert non-string with explicit base");
    return NULL;
}
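
The branching in int_new can be modelled in a few lines of Python. This is only a sketch of the control flow under stated assumptions, not CPython's implementation: `int_new_sketch` and `_NO_BASE` are hypothetical names, `None` stands in for "no argument passed", the subtype branch is left out, and Python 3's `str` covers both string branches.

```python
_NO_BASE = object()   # hypothetical sentinel, standing in for base == -909


def int_new_sketch(x=None, base=_NO_BASE):
    """Model of int_new's dispatch for the exact int type (no subclasses)."""
    if x is None:
        if base is not _NO_BASE:
            raise TypeError("int() missing string argument")
        return 0                      # corresponds to PyInt_FromLong(0L)
    if base is _NO_BASE:
        return int(x)                 # corresponds to PyNumber_Int(x)
    if isinstance(x, str):
        if "\0" in x:                 # the embedded-NUL check
            raise ValueError(
                "invalid literal for int() with base %d: %r" % (base, x))
        return int(x, base)           # corresponds to PyInt_FromString()
    raise TypeError("int() can't convert non-string with explicit base")


print(int_new_sketch(), int_new_sketch(10), int_new_sketch("ff", 16))
```

The sentinel plays the same role as -909 in the C code: it lets the function tell "no base given" apart from any real base value.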

Finally, PyInt_FromLong sets the type information of the newly created object to PyInt_Type and stores the specific integer value. If the value is a small integer, the object is returned directly from the small_ints array.

#define BLOCK_SIZE      1000    /* 1K less typical malloc overhead */
#define BHEAD_SIZE      8       /* Enough for a 64-bit pointer */
#define N_INTOBJECTS    ((BLOCK_SIZE - BHEAD_SIZE) / sizeof(PyIntObject))

static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

struct _intblock {
    struct _intblock *next;
    PyIntObject objects[N_INTOBJECTS];
};

typedef struct _intblock PyIntBlock;

static PyIntBlock *block_list = NULL;
static PyIntObject *free_list = NULL;

PyObject *
PyInt_FromLong(long ival)
{
    register PyIntObject *v;
    if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) {
        /* Small integers are served from the preallocated cache */
        v = small_ints[ival + NSMALLNEGINTS];
        Py_INCREF(v);
        return (PyObject *) v;
    }
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
    }
    /* Inline PyObject_New */
    v = free_list;
    free_list = (PyIntObject *)Py_TYPE(v);
    (void)PyObject_INIT(v, &PyInt_Type);
    v->ob_ival = ival;
    return (PyObject *) v;
}

06 Going further

For performance reasons, Python keeps a dedicated cache pool of small integers, so it does not need to call malloc and free every time a small integer object is used.
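
The cache can be observed through object identity. The sketch below assumes CPython 3, which keeps a similar small-integer cache; this is an implementation detail, not a language guarantee, and the values 100 and 100000 are chosen only for illustration.

```python
# Build the values at run time so the compiler cannot merge constants.
small_a, small_b = int("100"), int("100")
large_a, large_b = int("100000"), int("100000")

print(small_a is small_b)   # both names refer to the one cached object
print(large_a is large_b)   # two separately created objects
```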

How can larger integers, which fall outside the small-integer range, avoid repeated memory allocation and release?

Python's solution is PyIntBlock. A PyIntBlock is a block of memory in which PyIntObject objects are stored; by default, one PyIntBlock holds N_INTOBJECTS objects.
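
The value of N_INTOBJECTS follows from the macro arithmetic. The sketch below estimates it with ctypes, assuming a release build in which PyObject_HEAD contains only ob_refcnt and ob_type; `PyIntObjectModel` is a hypothetical stand-in for the real structure.

```python
import ctypes


class PyIntObjectModel(ctypes.Structure):
    """Hypothetical stand-in for PyIntObject (release build, no debug fields)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_ival", ctypes.c_long),
    ]


BLOCK_SIZE = 1000
BHEAD_SIZE = 8
object_size = ctypes.sizeof(PyIntObjectModel)   # includes alignment padding
n_intobjects = (BLOCK_SIZE - BHEAD_SIZE) // object_size
print(object_size, n_intobjects)
```

On a typical 64-bit platform the structure is 24 bytes, which gives 41 objects per block.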

The linked list of PyIntBlock structures is maintained through block_list. Each block holds an array of PyIntObject named objects. Some slots in a block's objects array may be unused, so a free_list is needed to chain these free slots together for reuse. The PyIntObject slots in the objects array are linked from back to front through the ob_type field.
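
The block list and free list can be modelled in pure Python. This is a simplified sketch, not the C implementation: `Slot`, `IntAllocator` and `N_SLOTS` are hypothetical names, the block size is shrunk to 4 for readability, and the `link` attribute plays the role that the reused ob_type field plays in C.

```python
N_SLOTS = 4   # hypothetical block size; the real value is N_INTOBJECTS


class Slot:
    """Stand-in for a PyIntObject slot; `link` plays the role of ob_type."""
    def __init__(self):
        self.link = None
        self.value = None


class IntAllocator:
    def __init__(self):
        self.blocks = []        # models block_list
        self.free_list = None   # models free_list

    def _fill_free_list(self):
        block = [Slot() for _ in range(N_SLOTS)]
        self.blocks.append(block)
        # Link each slot to the one before it; the first slot ends the chain.
        for i in range(N_SLOTS - 1, 0, -1):
            block[i].link = block[i - 1]
        block[0].link = None
        return block[-1]

    def from_long(self, value):
        if self.free_list is None:
            self.free_list = self._fill_free_list()
        slot = self.free_list
        self.free_list = slot.link   # pop the head of the free list
        slot.link = None
        slot.value = value
        return slot

    def release(self, slot):
        slot.link = self.free_list   # push the slot back for reuse
        self.free_list = slot


alloc = IntAllocator()
first = alloc.from_long(1000)
alloc.release(first)
second = alloc.from_long(2000)
print(second is first, len(alloc.blocks))
```

A released slot goes back to the head of the free list, so the next allocation reuses it without requesting new memory; a new block is only requested once every slot is in use.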

The small-integer cache pool ultimately also lives in the memory maintained by block_list. During Python initialization, the _PyInt_Init function is called to request memory and create the small integer objects.

Originally published on Mr. Rabbit's website as "Python source code analysis - object exploration".

Posted on Mon, 09 Mar 2020 21:54:52 -0700 by $phpNut