Java Foundation 18: Java Serialization and Deserialization

Copyright Statement: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/a724888/article/details/80210095




This article introduces the basic concepts of Java serialization, how to serialize and deserialize objects, and how serialization is implemented. It summarizes the relevant knowledge points comprehensively and illustrates them with concrete examples.

The code for this article is available on my GitHub:

https://github.com/h2pl/MyTech

If you find it useful, please give it a star. Thank you.

The article was first published on my personal blog:

https://h2pl.github.io/2018/05/05/javase18

For more Java back-end content, see my CSDN blog:

https://blog.csdn.net/a724888

This article draws on http://www.importnew.com/17964.html and
https://www.ibm.com/developerworks/cn/java/j-lo-serial/

The concepts of serialization and deserialization

Serialization is the process of converting an object's state into a form that can be stored or transmitted. Typically the object is written to a storage medium such as a file or a memory buffer; for network transmission it can be encoded as bytes or as XML. From that byte or XML encoding, an identical object can be restored. This reverse process is called deserialization.

Serialization and deserialization of Java objects

In Java, we can create objects in many ways and reuse them as long as they have not been garbage-collected. However, all the objects we create live in the JVM's heap memory.

These objects exist only while the JVM is running. Once the JVM stops, their state is lost.

In real applications, however, we often need to persist objects and read them back when needed. Java object serialization provides exactly this.

Object serialization is the object-persistence mechanism built into the Java language. Serialization saves an object's state as a byte array, and deserialization converts that byte array back into an object when needed.

In other words, object serialization makes it easy to convert between live objects in the JVM and byte arrays (streams).
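As a minimal sketch of the object/byte-array round trip described above, the helper below writes any Serializable object into a ByteArrayOutputStream and reads it back. The class name SerialBytes and its methods are hypothetical, and wrapping the checked exceptions in unchecked ones is only a convenience choice for the sketch.

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical helper: converts a Serializable object to a byte[] and back.
final class SerialBytes {
    private SerialBytes() {}

    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes all buffered data into buffer
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] original = {1, 2, 3};          // arrays are Serializable
        byte[] bytes = toBytes(original);
        int[] copy = (int[]) fromBytes(bytes);
        System.out.println(Arrays.toString(copy)); // [1, 2, 3]
        System.out.println(copy != original);      // true: a new object with the same state
    }
}
```

The deserialized result is a new object that carries the same state, not the original instance.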

In Java, object serialization and deserialization are widely used in RMI (Remote Method Invocation) and network transmission.

Relevant interfaces and classes

Java provides a convenient set of APIs for serializing and deserializing Java objects, including the following interfaces and classes:

java.io.Serializable

java.io.Externalizable

ObjectOutput

ObjectInput

ObjectOutputStream

ObjectInputStream

Serializable Interface

A class enables serialization by implementing the java.io.Serializable interface.

A class that does not implement this interface cannot have any of its state serialized or deserialized. All subtypes of a serializable class are themselves serializable. The Serializable interface has no methods or fields; it serves only to mark a class as serializable. (If the interface has no methods or fields, why can only objects of classes that implement it be serialized? This question is answered later in this article.)

If serialization encounters an object whose class does not implement Serializable, a NotSerializableException is thrown.

If the serialized class has a parent class and you also want to persist the fields defined in the parent, the parent class must implement java.io.Serializable as well. Otherwise the parent's fields are not serialized; during deserialization they are initialized by the parent's no-arg constructor, which must therefore be accessible to the subclass.
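The sketch below illustrates the parent-class rule with hypothetical classes PlainParent (not Serializable) and SerializableChild. The child's own field survives the round trip, while the field declared in the non-serializable parent is reinitialized by the parent's no-arg constructor.

```java
import java.io.*;

// Hypothetical superclass that does NOT implement Serializable.
class PlainParent {
    int parentValue;

    // Deserialization calls this no-arg constructor to initialize the
    // non-serializable part of the object, so it must be accessible.
    PlainParent() {
        parentValue = -1;
    }
}

// Hypothetical subclass that does implement Serializable.
class SerializableChild extends PlainParent implements Serializable {
    private static final long serialVersionUID = 1L;
    int childValue;
}

class ParentFieldDemo {
    static SerializableChild roundTrip(SerializableChild c) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (SerializableChild) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializableChild c = new SerializableChild();
        c.parentValue = 42;
        c.childValue = 7;
        SerializableChild copy = roundTrip(c);
        System.out.println(copy.childValue);  // 7: serialized with the child
        System.out.println(copy.parentValue); // -1: set by PlainParent(), not 42
    }
}
```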

Here is a class that implements the java.io.Serializable interface

import java.io.*;
import org.junit.Test;

public class SerializationTest {

    // A is a top-level class rather than a non-static inner class: an inner-class
    // instance holds a hidden reference to its enclosing instance, which would
    // then also have to be serializable.
    @Test
    public void test() throws IOException, ClassNotFoundException {
        A a = new A();
        a.i = 1;
        a.s = "a";
        // Write the object to a file
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("temp"))) {
            out.writeObject(a);
        }
        // Read the object back from the file
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("temp"))) {
            A a2 = (A) in.readObject();
            System.out.println(a2.i); // 1
            System.out.println(a2.s); // a
            // The printed values are the same as before serialization
        }
    }
}

class A implements Serializable {

    int i;
    String s;
}

Externalizable interface

Besides Serializable, Java also provides another serialization interface: Externalizable.

To understand the difference between Externalizable and Serializable, let's first look at some code: here is the example above rewritten to use Externalizable.

class B implements Externalizable {
    int i;
    String s;

    // A public no-arg constructor is required; without it, deserialization
    // fails with an InvalidClassException ("no valid constructor").
    public B() {
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Intentionally empty: no field is written
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        // Intentionally empty: no field is read
    }
}

@Test
public void test2() throws IOException, ClassNotFoundException {
    B b = new B();
    b.i = 1;
    b.s = "a";
    // Write the object to a file
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("temp"))) {
        out.writeObject(b);
    }
    // Read the object back from the file
    try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("temp"))) {
        B b2 = (B) in.readObject();
        System.out.println(b2.i); // 0
        System.out.println(b2.s); // null
        // Both fields keep their default values because nothing was written or read
    }
}

The example above shows that after an object of class B is serialized and deserialized, all of its fields have default values: the previous state was not persisted. This is the difference between the Externalizable and Serializable interfaces:

Externalizable extends Serializable and declares two methods: writeExternal() and readExternal().

When using Externalizable, the developer must implement writeExternal() and readExternal() to define what is written and read. Because class B's implementations are empty, nothing is saved and the output contains only default values.

Note also that when an Externalizable object is read, the class's public no-arg constructor is called to create a new instance, and then readExternal() fills in the saved field values. Therefore a class that implements Externalizable must provide a public no-arg constructor.

class C implements Externalizable {
    int i;
    int j;
    String s;
    public C() {

    }
    // Implementing these two methods lets you choose which fields are serialized.
    // Fields must be read in the same order they were written; otherwise the
    // values are mismatched or an exception is thrown.
    // This applies even to several fields of the same type (i and j here).
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(i);
        out.writeInt(j);
        out.writeObject(s);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        i = in.readInt();
        j = in.readInt();
        s = (String) in.readObject();
    }
}

@Test
public void test3() throws IOException, ClassNotFoundException {
    C c = new C();
    c.i = 1;
    c.j = 2;
    c.s = "a";
    // Write the object to a file
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("temp"))) {
        out.writeObject(c);
    }
    // Read the object back from the file
    try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("temp"))) {
        C c2 = (C) in.readObject();
        System.out.println(c2.i); // 1
        System.out.println(c2.j); // 2
        System.out.println(c2.s); // a
        // The fields written in writeExternal are restored by readExternal
    }
}
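As noted earlier, an Externalizable class must have a public no-arg constructor. A minimal sketch, assuming a hypothetical class NoDefaultCtor whose only constructor takes an argument: writing the object succeeds, but reading it back fails with an InvalidClassException because there is no constructor to create the new instance.

```java
import java.io.*;

// Hypothetical Externalizable class whose only constructor takes an argument.
class NoDefaultCtor implements Externalizable {
    int value;

    public NoDefaultCtor(int value) {
        this.value = value;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(value);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        value = in.readInt();
    }
}

class MissingCtorDemo {
    // Returns the exception thrown during the round trip, or null if it succeeded.
    static Exception tryRoundTrip() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(new NoDefaultCtor(5)); // writing succeeds
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                in.readObject(); // fails: no public no-arg constructor to call
            }
            return null;
        } catch (IOException | ClassNotFoundException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Exception e = tryRoundTrip();
        System.out.println(e instanceof InvalidClassException); // true
    }
}
```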

Serialization ID

The serialization ID problem

Scenario: Two clients, A and B, exchange object data over the network. Client A serializes an object C into binary data and sends it to B, which deserializes it.

Problem: Suppose the fully qualified class name of C is com.inout.Test. Both A and B have this class file, the code is identical, and both implement Serializable, yet deserialization keeps failing.

Solution: Whether the JVM allows deserialization depends not only on whether the class paths and code match, but also on whether the two classes have the same serialization ID, i.e. private static final long serialVersionUID = 1L. In Listing 1, the two classes have identical code but different serialization IDs, so neither side can deserialize the other's data; the attempt fails with an InvalidClassException.

// Version of the class on client A
package com.inout; 

import java.io.Serializable; 

public class A implements Serializable { 

    private static final long serialVersionUID = 1L; 

    private String name; 

    public String getName() 
    { 
        return name; 
    } 

    public void setName(String name) 
    { 
        this.name = name; 
    } 
} 

// Version of the class on client B: same code, different serialVersionUID
package com.inout; 

import java.io.Serializable; 

public class A implements Serializable { 

    private static final long serialVersionUID = 2L; 

    private String name; 

    public String getName() 
    { 
        return name; 
    } 

    public void setName(String name) 
    { 
        this.name = name; 
    } 
}
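To see which serialVersionUID a class actually has, you can ask ObjectStreamClass. In this sketch, Versioned, Unversioned and UidDemo are hypothetical names. For a class without an explicit field, the JVM computes a value from the class structure, so even small changes to the class can change it; when the ID in the stream differs from the local class's ID, deserialization fails.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class with an explicitly declared serialVersionUID.
class Versioned implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
}

// Hypothetical class without one: the JVM computes a default UID from the
// class structure (name, interfaces, fields, methods, ...).
class Unversioned implements Serializable {
    String name;
}

class UidDemo {
    // ObjectStreamClass.lookup returns null for classes that are not
    // serializable, so this sketch only accepts serializable classes.
    static long uidOf(Class<? extends Serializable> cl) {
        return ObjectStreamClass.lookup(cl).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Versioned.class));           // 1
        System.out.println(uidOf(java.util.ArrayList.class)); // 8683452581122892189
        System.out.println(uidOf(Unversioned.class));         // computed; changes if the class changes
    }
}
```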

Static variables do not participate in serialization

The main method in Listing 2 serializes an object, changes the value of the static variable, reads the serialized object back, and then prints the static variable through the deserialized object. Does System.out.println(t.staticVar) print 10 or 5?

import java.io.*;

public class Test implements Serializable {

    private static final long serialVersionUID = 1L;

    public static int staticVar = 5;

    public static void main(String[] args) {
        try {
            //Initially staticVar is 5
            ObjectOutputStream out = new ObjectOutputStream(
                    new FileOutputStream("result.obj"));
            out.writeObject(new Test());
            out.close();

            //Modified to 10 after serialization
            Test.staticVar = 10;

            ObjectInputStream oin = new ObjectInputStream(new FileInputStream(
                    "result.obj"));
            Test t = (Test) oin.readObject();
            oin.close();

            // Print staticVar through the deserialized instance t
            System.out.println(t.staticVar);

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }
}

The output is 10. This may surprise some readers: staticVar is printed through the deserialized object, so one might expect it to reflect the saved state. But serialization saves the state of an object, while a static variable belongs to the class. Static variables are therefore not serialized, and t.staticVar simply reads the current class-level value, which is 10.

Exploring the serialization of ArrayList

Serialization of ArrayList

Before looking at ArrayList serialization, consider a question:

How can a class customize its serialization and deserialization strategy?

With this question in mind, let's look at the source code of java.util.ArrayList:

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    private static final long serialVersionUID = 8683452581122892189L;
    transient Object[] elementData; // non-private to simplify nested class access
    private int size;
}

Other member variables are omitted. From the code above, ArrayList implements the java.io.Serializable interface, so we can serialize and deserialize it.

Because elementData is declared transient (JDK 8 dropped its private modifier, but the field is still transient), we would expect this field not to be serialized. Let's write a demo to test this:

import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;

public class ArrayListSerialization {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        ArrayList<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        ObjectOutputStream objectOutputStream = new ObjectOutputStream(new FileOutputStream("arr"));
        objectOutputStream.writeObject(list);
        objectOutputStream.close();
        ObjectInputStream objectInputStream = new ObjectInputStream(new FileInputStream("arr"));
        @SuppressWarnings("unchecked")
        ArrayList<String> list1 = (ArrayList<String>) objectInputStream.readObject();
        objectInputStream.close();
        // Print the deserialized list, not the original one
        System.out.println(Arrays.toString(list1.toArray())); // [a, b]
        // Serialization succeeded: the elements are preserved.
    }
}

Anyone familiar with ArrayList knows that it is backed by an array: elementData holds the list's elements. Judging from its declaration, this field should not be persisted by serialization. So why does the demo above keep the list's elements through serialization and deserialization?

writeObject and readObject methods

ArrayList defines two methods: writeObject and readObject.

Here is the conclusion first:

During serialization, if the class being serialized defines writeObject and readObject methods, ObjectOutputStream and ObjectInputStream call them to perform user-defined serialization and deserialization.

If the class does not define them, the default behavior is used: ObjectOutputStream's defaultWriteObject method and ObjectInputStream's defaultReadObject method.

User-defined writeObject and readObject methods let you control the serialization process, for example by changing the serialized values on the fly.
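A minimal sketch of such a custom strategy, assuming a hypothetical Temperature class: the derived fahrenheit field is transient, writeObject appends a small version marker (also hypothetical) after the default fields, and readObject reads it back in the same order and then rebuilds the transient field. Without readObject, fahrenheit would come back as 0.0.

```java
import java.io.*;

// Hypothetical class with one stored field and one derived, transient field.
class Temperature implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final int FORMAT_VERSION = 1; // hypothetical extra marker

    double celsius;
    transient double fahrenheit; // derived, so not written by default serialization

    Temperature(double celsius) {
        this.celsius = celsius;
        this.fahrenheit = toFahrenheit(celsius);
    }

    private static double toFahrenheit(double c) {
        return c * 9 / 5 + 32;
    }

    // Invoked reflectively by ObjectOutputStream; must be private with this exact signature.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();     // writes the non-transient field celsius
        out.writeInt(FORMAT_VERSION); // extra data written after the default fields
    }

    // Invoked reflectively by ObjectInputStream; reads in the same order writeObject wrote.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();       // restores celsius
        int version = in.readInt();
        if (version != FORMAT_VERSION) {
            throw new InvalidObjectException("unsupported format version " + version);
        }
        fahrenheit = toFahrenheit(celsius); // rebuild the transient field
    }
}

class TemperatureDemo {
    static Temperature roundTrip(Temperature t) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(t);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Temperature) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Temperature copy = roundTrip(new Temperature(100));
        System.out.println(copy.celsius);    // 100.0
        System.out.println(copy.fahrenheit); // 212.0
    }
}
```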

Here is ArrayList's implementation of these two methods (from the JDK source; details vary slightly between JDK versions):

private void readObject(java.io.ObjectInputStream s)
        throws java.io.IOException, ClassNotFoundException {
        elementData = EMPTY_ELEMENTDATA;

        // Read in size, and any hidden stuff
        s.defaultReadObject();

        // Read in capacity
        s.readInt(); // ignored

        if (size > 0) {
            // be like clone(), allocate array based upon size not capacity
            ensureCapacityInternal(size);

            Object[] a = elementData;
            // Read in all elements in the proper order.
            for (int i=0; i<size; i++) {
                a[i] = s.readObject();
            }
        }
    }


private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException{
        // Write out element count, and any hidden stuff
        int expectedModCount = modCount;
        s.defaultWriteObject();

        // Write out size as capacity for behavioural compatibility with clone()
        s.writeInt(size);

        // Write out all elements in the proper order.
        for (int i=0; i<size; i++) {
            s.writeObject(elementData[i]);
        }

        if (modCount != expectedModCount) {
            throw new ConcurrentModificationException();
        }
    }

So why does ArrayList serialize in this way?

Why transient?

ArrayList is a dynamic array whose capacity grows automatically when it fills up, so the backing array is usually larger than the number of stored elements. If the capacity is 100 but only one element has been added, serializing the array directly would write 99 null slots. To avoid serializing all those nulls, ArrayList declares the array transient.

Why writeObject and readObject?

As mentioned above, ArrayList declares elementData transient so that an array full of empty slots is not serialized, which saves space. But as a collection, it must still preserve its elements through serialization, so it implements writeObject and readObject to write and read them explicitly.

The writeObject method iterates over the first size slots of elementData and writes each element to the ObjectOutputStream.

The readObject method reads the elements back from the input stream and stores them in the elementData array.
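The same idea can be sketched with a hypothetical, much simplified SimpleList (not the real ArrayList code): the backing array is transient and has spare capacity, writeObject writes only the occupied slots, and readObject allocates an array of exactly size elements.

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical, heavily simplified list that mimics ArrayList's strategy.
class SimpleList implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: the array usually has unused (null) slots we do not want to write
    private transient Object[] elements = new Object[10];
    private int size; // written by defaultWriteObject

    void add(Object o) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, Math.max(10, size * 2));
        }
        elements[size++] = o;
    }

    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return elements[index];
    }

    int size() { return size; }

    int capacity() { return elements.length; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();        // writes size
        for (int i = 0; i < size; i++) { // writes only the occupied slots
            out.writeObject(elements[i]);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();          // restores size
        elements = new Object[size];     // exactly size slots, no trailing nulls
        for (int i = 0; i < size; i++) {
            elements[i] = in.readObject();
        }
    }
}

class SimpleListDemo {
    static SimpleList roundTrip(SimpleList list) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(list);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (SimpleList) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SimpleList list = new SimpleList();
        list.add("a");
        list.add("b");
        SimpleList copy = roundTrip(list);
        System.out.println(copy.size() + " " + copy.get(0) + " " + copy.get(1)); // 2 a b
        System.out.println(list.capacity() + " -> " + copy.capacity());          // 10 -> 2
    }
}
```

Growing by at least 10 in add() keeps the sketch working even after deserializing an empty list, whose array has length 0.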

How to customize serialization and deserialization strategies

Continuing from the previous section, we now understand how ArrayList serializes its array elements.

Now let's answer the question raised earlier: how can a class customize its serialization and deserialization strategy?

Answer: add writeObject and readObject methods to the class being serialized. But this raises another question:

Although ArrayList defines writeObject and readObject, nothing in its code calls them explicitly.

So if a class contains writeObject and readObject methods, how are they called?

ObjectOutputStream
From the ArrayList demo above, we can see that serialization is carried out by ObjectOutputStream and ObjectInputStream. With this question in mind, let's analyze how ArrayList's writeObject and readObject methods are actually called.

To save space, only the call chain of ObjectOutputStream's writeObject is given here:

writeObject -> writeObject0 -> writeOrdinaryObject -> writeSerialData -> invokeWriteObject

Here is invokeWriteObject, which is defined in ObjectStreamClass:

void invokeWriteObject(Object obj, ObjectOutputStream out)
        throws IOException, UnsupportedOperationException
    {
        if (writeObjectMethod != null) {
            try {
                writeObjectMethod.invoke(obj, new Object[]{ out });
            } catch (InvocationTargetException ex) {
                Throwable th = ex.getTargetException();
                if (th instanceof IOException) {
                    throw (IOException) th;
                } else {
                    throwMiscException(th);
                }
            } catch (IllegalAccessException ex) {
                // should not occur, as access checks have been suppressed
                throw new InternalError(ex);
            }
        } else {
            throw new UnsupportedOperationException();
        }
    }

The key line is writeObjectMethod.invoke(obj, new Object[]{ out }), which calls writeObjectMethod via reflection. The source comment for the writeObjectMethod field reads:

class-defined writeObject method, or null if none

In our example, this is the writeObject method defined in ArrayList, which is called via reflection.

Now we can answer the second question:

If a class contains writeObject and readObject methods, how are they called?

Answer: when you call ObjectOutputStream's writeObject method or ObjectInputStream's readObject method, the class's own methods are invoked via reflection.
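The lookup itself can be approximated with plain reflection. This is a simplified sketch, not the JDK's code: ObjectStreamClass performs a similar check for a private, non-static, void writeObject(ObjectOutputStream) method and caches the result in writeObjectMethod. HookLookup and PlainData are hypothetical names.

```java
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Simplified sketch of locating a class-defined writeObject hook by reflection.
class HookLookup {
    static Method findWriteObject(Class<?> cl) {
        try {
            Method m = cl.getDeclaredMethod("writeObject", ObjectOutputStream.class);
            int mods = m.getModifiers();
            // The hook only counts if it is private, non-static and returns void.
            if (m.getReturnType() == void.class
                    && Modifier.isPrivate(mods) && !Modifier.isStatic(mods)) {
                return m;
            }
            return null;
        } catch (NoSuchMethodException e) {
            return null; // no hook: default field serialization would be used
        }
    }

    public static void main(String[] args) {
        System.out.println(findWriteObject(java.util.ArrayList.class) != null); // true
        System.out.println(findWriteObject(PlainData.class) != null);           // false
    }
}

// Hypothetical serializable class with no custom hook.
class PlainData implements Serializable {
    private static final long serialVersionUID = 1L;
    int x;
}
```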

Why Serializable must be implemented

That covers the serialization of ArrayList. Some readers may have wondered:

Serializable is an empty interface. How does it ensure that only objects of classes implementing it can be serialized and deserialized?

The definition of Serializable interface:

public interface Serializable {
}

Readers can try removing implements Serializable from class A in the first example and running the test again: a java.io.NotSerializableException is thrown.

This question is actually easy to answer. Let's go back to the call chain of ObjectOutputStream's writeObject:

writeObject -> writeObject0 -> writeOrdinaryObject -> writeSerialData -> invokeWriteObject

The writeObject0 method contains the following code:

if (obj instanceof String) {
    writeString((String) obj, unshared);
} else if (cl.isArray()) {
    writeArray(obj, desc, unshared);
} else if (obj instanceof Enum) {
    writeEnum((Enum<?>) obj, desc, unshared);
} else if (obj instanceof Serializable) {
    writeOrdinaryObject(obj, desc, unshared);
} else {
    if (extendedDebugInfo) {
        throw new NotSerializableException(
            cl.getName() + "\n" + debugInfoStack.toString());
    } else {
        throw new NotSerializableException(cl.getName());
    }
}

When an object is serialized, writeObject0 checks whether it is a String, an array, an Enum, or a Serializable; if it is none of these, a NotSerializableException is thrown immediately.
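A quick check of this behavior, using hypothetical classes: a class that does not implement Serializable fails, and so does a Serializable class that holds a reference to one, because every object in the graph goes through the same type check.

```java
import java.io.*;

// Hypothetical class that does not implement Serializable.
class NotMarked {
    int value = 1;
}

// Hypothetical Serializable class holding a reference to a non-serializable object.
class Holder implements Serializable {
    private static final long serialVersionUID = 1L;
    NotMarked inner = new NotMarked();
}

class MarkerDemo {
    // Returns true if writing obj fails with NotSerializableException.
    static boolean failsToSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(failsToSerialize(new NotMarked()));  // true
        System.out.println(failsToSerialize(new Holder()));     // true: its field fails the check
        System.out.println(failsToSerialize("text"));           // false
        System.out.println(failsToSerialize(new int[] {1, 2})); // false: arrays are allowed
    }
}
```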

Summary of Serialization Knowledge Points

1. A class must implement the Serializable interface to be serialized. Otherwise a NotSerializableException is thrown, because the type is checked during serialization: the object must be a String, an array, an Enum, or a Serializable.

2. Objects are serialized and deserialized through ObjectOutputStream and ObjectInputStream.

3. Whether the JVM allows deserialization depends not only on the class path and code being consistent, but also on the two classes having the same serialization ID (i.e. private static final long serialVersionUID).

Eclipse offers two ways to generate a serialization ID: a fixed 1L, or a generated long value computed from the class structure (in effect what the JDK's serialver tool produces). Unless there is a special requirement, the default 1L is recommended, because it ensures deserialization succeeds whenever the code is consistent. So what is a generated ID good for? Sometimes changing the serialization ID can be used to restrict certain users from using the class.

4. Serialization does not save static variables.

5. To serialize the fields inherited from a parent class, the parent class must also implement the Serializable interface.

6. The transient keyword controls whether a field is serialized: adding it to a field declaration prevents the field from being written to the serialized stream. After deserialization, a transient field has its type's default value, such as 0 for int and null for object references.

7. Suppose a server sends serialized objects to a client and some fields are sensitive, such as a password string. The password field can be encrypted during serialization, so that only a client holding the decryption key can read it when deserializing. This protects the serialized data to a certain extent.

8. Adding writeObject and readObject methods to a class lets you implement a custom serialization strategy.
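Points 6 to 8 can be combined in one sketch. The hypothetical Account class marks password as transient and uses writeObject/readObject to store a transformed copy; PlainAccount has the same fields without the hooks, so its password is simply lost. Base64 here is only a placeholder for real encryption: it is an encoding, not a cipher, and provides no secrecy.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical class: password is transient, so default serialization skips it;
// writeObject/readObject store a transformed copy instead.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    String user;
    transient String password;

    Account(String user, String password) {
        this.user = user;
        this.password = password;
    }

    // PLACEHOLDER ONLY: Base64 is an encoding, not encryption. A real system
    // would use a proper cipher with a key held by the client.
    private static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String decode(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // writes user only
        out.writeObject(password == null ? null : encode(password));
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();   // restores user
        String stored = (String) in.readObject();
        password = stored == null ? null : decode(stored);
    }
}

// Same fields but no hooks: the transient password is not serialized at all.
class PlainAccount implements Serializable {
    private static final long serialVersionUID = 1L;
    String user;
    transient String password;
}

class AccountDemo {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account a = roundTrip(new Account("alice", "secret"));
        System.out.println(a.user + " " + a.password);   // alice secret
        PlainAccount p = new PlainAccount();
        p.user = "bob";
        p.password = "pw";
        PlainAccount pc = roundTrip(p);
        System.out.println(pc.user + " " + pc.password); // bob null
    }
}
```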


Posted on Thu, 08 Aug 2019 19:45:51 -0700 by spheonix