HTTP messages in detail: how web containers parse HTTP messages

Abstract

The previous article, "Details of HTTP messages", introduced the text structure of an HTTP message in detail. So how does a web container, acting as a server, parse HTTP messages? This article takes the Jetty and Undertow containers as examples and analyzes how web containers handle HTTP messages.

As the overview in the previous article showed, an HTTP message is just a string that follows certain rules, so parsing it means walking through the string and checking that it conforms to the rules of the HTTP protocol.

start-line: the start line; basic information describing the request or response

*( header-field CRLF ): the header fields

CRLF

[ message-body ]: the message body; the data actually transmitted
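To make the four parts concrete, here is a minimal sketch that splits a complete message held in a single String. The class name RawHttpMessage is made up for illustration, and the code assumes the whole message is already in memory and uses CRLF line endings; real containers parse incrementally, as described in the rest of this article.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only (hypothetical class): splits a complete HTTP/1.1 message. */
final class RawHttpMessage {
    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    RawHttpMessage(String raw) {
        // The empty line (CRLF CRLF) separates the head from the message body.
        int headEnd = raw.indexOf("\r\n\r\n");
        if (headEnd < 0) {
            throw new IllegalArgumentException("no CRLF CRLF terminator found");
        }
        String[] lines = raw.substring(0, headEnd).split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("malformed header field: " + lines[i]);
            }
            // Field names are case-insensitive, so store them in lower case.
            String name = lines[i].substring(0, colon).toLowerCase(Locale.ROOT);
            headers.put(name, lines[i].substring(colon + 1).trim());
        }
        // Everything after the empty line is the (optional) message body.
        body = raw.substring(headEnd + 4);
    }
}
```

For example, new RawHttpMessage("GET / HTTP/1.1\r\nHost: example.com\r\n\r\n") yields the start line "GET / HTTP/1.1", one header field and an empty body.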

Jetty

The following code is from Jetty 9.4.12.

How do you parse such a long string? Jetty does it with a state machine; see the org.eclipse.jetty.http.HttpParser class.

    public enum State
    {
        START,
        METHOD,
        RESPONSE_VERSION,
        SPACE1,
        STATUS,
        URI,
        SPACE2,
        REQUEST_VERSION,
        REASON,
        PROXY,
        HEADER,
        CONTENT,
        EOF_CONTENT,
        CHUNKED_CONTENT,
        CHUNK_SIZE,
        CHUNK_PARAMS,
        CHUNK,
        TRAILER,
        END,
        CLOSE,  // The associated stream/endpoint should be closed
        CLOSED  // The associated stream/endpoint is at EOF
    }

There are 21 states in total, and parsing moves from one state to the next. The parseNext method parses the start line, then the headers, then the body content, in that order.

public boolean parseNext(ByteBuffer buffer)
    {
        try
        {
            // Start a request/response
            if (_state==State.START)
            {
                // Quick check of the first bytes of the message
                if (quickStart(buffer))
                    return true;
            }

            // Parse the request/response line
            if (_state.ordinal()>= State.START.ordinal() && _state.ordinal()<State.HEADER.ordinal())
            {
                if (parseLine(buffer))
                    return true;
            }

            // Parse the headers
            if (_state== State.HEADER)
            {
                if (parseFields(buffer))
                    return true;
            }

            // Parse the content
            if (_state.ordinal()>= State.CONTENT.ordinal() && _state.ordinal()<State.TRAILER.ordinal())
            {
                // Handle HEAD response
                if (_responseStatus>0 && _headResponse)
                {
                    setState(State.END);
                    return handleContentMessage();
                }
                else
                {
                    if (parseContent(buffer))
                        return true;
                }
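            }

            // The handling of the remaining states is omitted from this excerpt.
        }
        catch (Throwable x)
        {
            // The original error handling is omitted from this excerpt.
        }

        return false;
    }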
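The range checks in parseNext rely on the declaration order of the enum: every state declared before HEADER belongs to the start line, and every state from CONTENT up to (but not including) TRAILER belongs to the body. The following sketch isolates that idiom. The class StatePhases and the method phaseOf are hypothetical names for illustration, not part of Jetty.

```java
/** Illustrative only: groups parser states by their position in the enum. */
final class StatePhases {
    // Same constants, in the same order, as the State enum listed above.
    enum State {
        START, METHOD, RESPONSE_VERSION, SPACE1, STATUS, URI, SPACE2,
        REQUEST_VERSION, REASON, PROXY, HEADER, CONTENT, EOF_CONTENT,
        CHUNKED_CONTENT, CHUNK_SIZE, CHUNK_PARAMS, CHUNK, TRAILER,
        END, CLOSE, CLOSED
    }

    /** Maps a state to the part of the message it belongs to. */
    static String phaseOf(State s) {
        if (s.ordinal() < State.HEADER.ordinal()) {
            return "start-line";
        }
        if (s == State.HEADER) {
            return "headers";
        }
        if (s.ordinal() >= State.CONTENT.ordinal() && s.ordinal() < State.TRAILER.ordinal()) {
            return "content";
        }
        return "other";
    }
}
```

The cost of this idiom is that the order of the enum constants becomes part of the logic: inserting a new state in the wrong position silently changes which branch handles it.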

Overall process

Overall, a message follows one of three paths:

  1. start -> start-line -> header -> end
  2. start -> start-line -> header -> content -> end
  3. start -> start-line -> header -> chunk-content -> end
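Which of the three paths a request takes is decided by its header fields. The sketch below shows one simplified way to make that decision for a request. BodyFraming and its Path enum are hypothetical names, the header map is assumed to use lower-case field names, and responses (which may also be delimited by closing the connection) are deliberately left out.

```java
import java.util.Locale;
import java.util.Map;

/** Illustrative only: picks the parsing path for a request from its headers. */
final class BodyFraming {
    enum Path { NO_CONTENT, CONTENT, CHUNKED_CONTENT }

    /** @param headers header fields keyed by lower-case field name (an assumption of this sketch) */
    static Path select(Map<String, String> headers) {
        // Transfer-Encoding takes precedence over Content-Length.
        String transferEncoding = headers.get("transfer-encoding");
        if (transferEncoding != null
                && transferEncoding.toLowerCase(Locale.ROOT).trim().endsWith("chunked")) {
            return Path.CHUNKED_CONTENT;
        }
        String contentLength = headers.get("content-length");
        if (contentLength != null && Long.parseLong(contentLength.trim()) > 0) {
            return Path.CONTENT;
        }
        // A request with neither header (or with a length of 0) has no body.
        return Path.NO_CONTENT;
    }
}
```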

Start line

start-line = request-line (the request start line) / status-line (the response start line)

  1. State transitions when parsing a request message
    Request line: START -> METHOD -> SPACE1 -> URI -> SPACE2 -> REQUEST_VERSION

  2. State transitions when parsing a response message
    Response line: START -> RESPONSE_VERSION -> SPACE1 -> STATUS -> SPACE2 -> REASON
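The request-line transitions can be sketched as a small byte-at-a-time state machine. This is a simplified illustration and not Jetty's code: RequestLineParser is a hypothetical class, it borrows only the state names from the request path above, and it skips the validation a real parser performs. Because the current state and the partial token are kept in fields, parsing can resume when the request line arrives split across several buffers.

```java
import java.nio.ByteBuffer;

/** Illustrative only: a resumable parser for "METHOD SP URI SP VERSION CRLF". */
final class RequestLineParser {
    enum State { START, METHOD, SPACE1, URI, SPACE2, REQUEST_VERSION, HEADER }

    private State state = State.START;
    private final StringBuilder token = new StringBuilder();
    String method;
    String uri;
    String version;

    /** Consumes bytes from the buffer; returns true once the request line is complete. */
    boolean parseLine(ByteBuffer buffer) {
        while (state != State.HEADER && buffer.hasRemaining()) {
            char ch = (char) (buffer.get() & 0xFF);
            switch (state) {
                case START:
                    if (ch == '\r' || ch == '\n') break; // tolerate leading blank lines
                    token.append(ch);
                    state = State.METHOD;
                    break;
                case METHOD:
                    if (ch == ' ') { method = take(); state = State.SPACE1; }
                    else token.append(ch);
                    break;
                case SPACE1:
                    if (ch != ' ') { token.append(ch); state = State.URI; }
                    break;
                case URI:
                    if (ch == ' ') { uri = take(); state = State.SPACE2; }
                    else token.append(ch);
                    break;
                case SPACE2:
                    if (ch != ' ') { token.append(ch); state = State.REQUEST_VERSION; }
                    break;
                case REQUEST_VERSION:
                    if (ch == '\n') { version = take(); state = State.HEADER; }
                    else if (ch != '\r') token.append(ch);
                    break;
                default:
                    throw new IllegalStateException(state.toString());
            }
        }
        return state == State.HEADER;
    }

    /** Returns the token collected so far and resets the accumulator. */
    private String take() {
        String s = token.toString();
        token.setLength(0);
        return s;
    }
}
```

Feeding "GET /ind" and then "ex.html HTTP/1.1\r\n" in two separate calls gives the same result as a single buffer: the first call returns false with the state left at URI, and the second call finishes the line.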

Headers

There is only one HEADER state. Older versions of Jetty also distinguished HEADER_IN_NAME, HEADER_VALUE, HEADER_IN_VALUE and so on; these were removed in 9.4. To make matching more efficient, Jetty uses a trie to match common header fields quickly.

static
    {
        CACHE.put(new HttpField(HttpHeader.CONNECTION,HttpHeaderValue.CLOSE));
        CACHE.put(new HttpField(HttpHeader.CONNECTION,HttpHeaderValue.KEEP_ALIVE));
        // ... many other common header fields are omitted here
    }
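To illustrate why a trie helps here, the following is a minimal, map-based sketch of a case-insensitive trie for header names. It is not Jetty's trie implementation, and HeaderTrie is a hypothetical class. The point is that a lookup walks the input one character at a time and never needs to allocate a lower-cased copy of the name.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a case-insensitive trie that maps a name to its canonical form. */
final class HeaderTrie {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        String value; // canonical name; set only on the node where a key ends
    }

    private final Node root = new Node();

    /** Registers a header name in its canonical spelling. */
    void put(String key) {
        Node n = root;
        for (int i = 0; i < key.length(); i++) {
            char c = Character.toLowerCase(key.charAt(i));
            n = n.next.computeIfAbsent(c, k -> new Node());
        }
        n.value = key;
    }

    /** Returns the canonical spelling of a known name, or null if it is not cached. */
    String get(CharSequence name) {
        Node n = root;
        for (int i = 0; i < name.length() && n != null; i++) {
            n = n.next.get(Character.toLowerCase(name.charAt(i)));
        }
        return n == null ? null : n.value;
    }
}
```

After put("Connection"), both get("connection") and get("CONNECTION") return the cached string "Connection", while get("Conn") and get("X-Unknown") return null and would fall back to ordinary parsing.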

Content

Request body:

  1. CONTENT -> END. This is an ordinary message with a Content-Length header. HttpParser stays in the CONTENT state until the number of bytes read reaches the declared length, and then moves to END.
  2. Chunked transfer
    CHUNKED_CONTENT -> CHUNK_SIZE -> CHUNK -> TRAILER -> END
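The chunked path can be sketched as another small state machine. This is a simplified illustration rather than Jetty's implementation: ChunkedDecoder is a hypothetical class, the state names are borrowed from the State enum listed earlier, chunk extensions are skipped and trailer fields are discarded.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Illustrative only: a resumable decoder for a chunked message body. */
final class ChunkedDecoder {
    enum State { CHUNKED_CONTENT, CHUNK_SIZE, CHUNK_PARAMS, CHUNK, TRAILER, END }

    private State state = State.CHUNKED_CONTENT;
    private int chunkLength;        // bytes still expected in the current chunk
    private int trailerLineLength;  // characters seen on the current trailer line
    private final ByteArrayOutputStream content = new ByteArrayOutputStream();

    /** Consumes bytes from the buffer; returns true once the last chunk and trailer are read. */
    boolean parse(ByteBuffer buffer) {
        while (state != State.END && buffer.hasRemaining()) {
            if (state == State.CHUNK) {
                // Copy as much chunk data as is available in this buffer.
                byte[] data = new byte[Math.min(chunkLength, buffer.remaining())];
                buffer.get(data);
                content.write(data, 0, data.length);
                chunkLength -= data.length;
                if (chunkLength == 0) state = State.CHUNKED_CONTENT;
                continue;
            }
            char ch = (char) (buffer.get() & 0xFF);
            switch (state) {
                case CHUNKED_CONTENT:
                    if (ch == '\r' || ch == '\n') break; // CRLF that ends the previous chunk
                    chunkLength = hex(ch);
                    state = State.CHUNK_SIZE;
                    break;
                case CHUNK_SIZE:
                    if (ch == ';') state = State.CHUNK_PARAMS;
                    else if (ch == '\n') state = chunkLength == 0 ? State.TRAILER : State.CHUNK;
                    else if (ch != '\r') chunkLength = chunkLength * 16 + hex(ch);
                    break;
                case CHUNK_PARAMS:
                    if (ch == '\n') state = chunkLength == 0 ? State.TRAILER : State.CHUNK;
                    break;
                case TRAILER:
                    if (ch == '\n') {
                        if (trailerLineLength == 0) state = State.END; // empty line ends the message
                        trailerLineLength = 0;
                    } else if (ch != '\r') {
                        trailerLineLength++;
                    }
                    break;
                default:
                    throw new IllegalStateException(state.toString());
            }
        }
        return state == State.END;
    }

    private static int hex(char ch) {
        int digit = Character.digit(ch, 16);
        if (digit < 0) throw new IllegalArgumentException("bad chunk size character: " + ch);
        return digit;
    }

    /** Returns the decoded body bytes collected so far. */
    byte[] content() {
        return content.toByteArray();
    }
}
```

Given the body "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n", the decoder collects the nine bytes of "Wikipedia" and ends in the END state, whether the body arrives in one buffer or in several.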

Undertow

Undertow is another web container. How does its handling differ from Jetty's?
The state machine is different; see io.undertow.server.protocol.http.ParseState:

    public static final int VERB = 0;
    public static final int PATH = 1;
    public static final int PATH_PARAMETERS = 2;
    public static final int QUERY_PARAMETERS = 3;
    public static final int VERSION = 4;
    public static final int AFTER_VERSION = 5;
    public static final int HEADER = 6;
    public static final int HEADER_VALUE = 7;
    public static final int PARSE_COMPLETE = 8;

The actual processing flow is in the abstract class HttpRequestParser:

public void handle(ByteBuffer buffer, final ParseState currentState, final HttpServerExchange builder) throws BadRequestException {
        if (currentState.state == ParseState.VERB) {
            //fast path, we assume that it will parse fully so we avoid all the if statements

            // Fast path for GET
            final int position = buffer.position();
            if (buffer.remaining() > 3
                    && buffer.get(position) == 'G'
                    && buffer.get(position + 1) == 'E'
                    && buffer.get(position + 2) == 'T'
                    && buffer.get(position + 3) == ' ') {
                buffer.position(position + 4);
                builder.setRequestMethod(Methods.GET);
                currentState.state = ParseState.PATH;
            } else {
                try {
                    handleHttpVerb(buffer, currentState, builder);
                } catch (IllegalArgumentException e) {
                    throw new BadRequestException(e);
                }
            }
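            // Parse the path
            handlePath(buffer, currentState, builder);
            // Parse the version. Note: `failed` is set by a fast-path version check
            // that is not shown in this excerpt; the calls below are the fallback.
            if (failed) {
                handleHttpVersion(buffer, currentState, builder);
                handleAfterVersion(buffer, currentState);
            }
            // Parse the headers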
            while (currentState.state != ParseState.PARSE_COMPLETE && buffer.hasRemaining()) {
                handleHeader(buffer, currentState, builder);
                if (currentState.state == ParseState.HEADER_VALUE) {
                    handleHeaderValue(buffer, currentState, builder);
                }
            }
            return;
        }
        handleStateful(buffer, currentState, builder);
    }

Unlike Jetty, Undertow does not parse the content in this state machine. After the headers have been processed, the data is placed in io.undertow.server.HttpServerExchange, and the body is then read in a way that depends on its type, for example through FixedLengthStreamSourceConduit.
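The idea behind a fixed-length read can be sketched with plain java.io streams. This is not Undertow's conduit API. FixedLengthBody is a hypothetical helper that only shows the principle: once Content-Length is known, read exactly that many bytes and treat an early end of stream as an error.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only: reads a body whose size was declared by Content-Length. */
final class FixedLengthBody {
    static byte[] read(InputStream in, int contentLength) throws IOException {
        byte[] body = new byte[contentLength];
        int offset = 0;
        // A single read() may return fewer bytes than requested, so loop until done.
        while (offset < contentLength) {
            int n = in.read(body, offset, contentLength - offset);
            if (n < 0) {
                throw new EOFException("stream ended after " + offset
                        + " of " + contentLength + " bytes");
            }
            offset += n;
        }
        // Bytes beyond contentLength are left in the stream for the next message.
        return body;
    }
}
```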




Posted on Wed, 09 Oct 2019 10:28:52 -0700 by dico