A URL-encoding pitfall encountered when using the Retrofit network framework

Retrofit is a type-safe network request framework for Android and Java. The network module of our project is built on top of Retrofit.

During development we used a third-party SDK whose output had to be uploaded to our server. However, a + in the SDK output reached the server unescaped in the URL-encoded body, and the server's URL decoding turned it into a space...

Running the same data through an online URL encoder and decoder showed no problem: the round-trip result matched the original data. So we started looking at how URL encoding is implemented underneath Retrofit.

Background

The network request is declared through Retrofit as follows:

@FormUrlEncoded
@POST("user/upload")
Observable<Response<Boolean>> upload(@Field(value = "data", encoded = true) String data);

I had assumed that encoded=true meant "URL-encode this value". By process of elimination I set encoded to false, expecting that no URL encoding would be performed at all. After the change the server returned success. What? Printing the network request showed that the form body was still URL-encoded. Why?

I then printed the request body with encoded=true and with encoded=false. Only the encoding of + differed; everything else was identical. Searching online turned up no explanation, so the only option left was to step through the Retrofit source code.

//Raw data
/gAAAAAAAAD+AAAAAAAAAP4AAAAAAAAA

//encoded = true
%2FgAAAAAAAAD+AAAAAAAAAP4AAAAAAAAA

//encoded = false
%2FgAAAAAAAAD%2BAAAAAAAAAP4AAAAAAAAA
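
To see what a server would make of each body, a small check with the JDK's own java.net.URLDecoder can help. This is only a sketch: it assumes the server decodes form values the way URLDecoder does (treating + as a space), and the class name DecodeCheck is made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeCheck {
    // Decode a form-encoded value the way java.net.URLDecoder does ('+' becomes ' ').
    static String decode(String body) {
        try {
            return URLDecoder.decode(body, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Body sent with encoded = true: the '+' comes back as a space.
        System.out.println(decode("%2FgAAAAAAAAD+AAAAAAAAAP4AAAAAAAAA"));
        // Body sent with encoded = false: the original data comes back intact.
        System.out.println(decode("%2FgAAAAAAAAD%2BAAAAAAAAAP4AAAAAAAAA"));
    }
}
```

Under that assumption, only the encoded = false body decodes back to the raw data.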

Tracing the cause through the source code

Retrofit's @FormUrlEncoded annotation sets the request's content type to application/x-www-form-urlencoded and URL-encodes the data of parameters annotated with @Field.

/**
 * Denotes that the request body will use form URL encoding. Fields should be declared as
 * parameters and annotated with {@link Field @Field}.
 * <p>
 * Requests made with this annotation will have {@code application/x-www-form-urlencoded} MIME
 * type. Field names and values will be UTF-8 encoded before being URI-encoded in accordance to
 * <a href="http://tools.ietf.org/html/rfc3986">RFC-3986</a>.
 */
@Documented
@Target(METHOD)
@Retention(RUNTIME)
public @interface FormUrlEncoded {
}

So there is no doubt that with this annotation the form data of @Field parameters is URL-encoded automatically. Then why is there an encoded parameter?

@Documented
@Target(PARAMETER)
@Retention(RUNTIME)
public @interface Field {
  String value();

  /** Specifies whether the {@linkplain #value() name} and value are already URL encoded. */
  boolean encoded() default false;
}

According to the Javadoc of @Field, the encoded parameter states whether the name and value are already URL-encoded. That is easy to understand: encoded=true declares that the value has already been URL-encoded. But then why, in the printed data, was everything except the plus sign still URL-encoded?

The only way forward was to keep reading.

How RequestFactory parses the annotations

//Parsing of the @FormUrlEncoded annotation; it applies to HTTP methods that carry a body, such as POST, PUT and PATCH (hasBody=true)
if (annotation instanceof FormUrlEncoded) {
    if (isMultipart) {
       throw methodError(method, "Only one encoding annotation is allowed.");
    }
    isFormEncoded = true;
}

//Parsing of the @Field annotation
if (annotation instanceof Field) {
   validateResolvableType(p, type);
   if (!isFormEncoded) {
      throw parameterError(method, p, "@Field parameters can only be used with form encoding.");
   }
   Field field = (Field) annotation;
   String name = field.value();
   boolean encoded = field.encoded();

   gotField = true;

   Class<?> rawParameterType = Utils.getRawType(type);
   if (Iterable.class.isAssignableFrom(rawParameterType)) {
      if (!(type instanceof ParameterizedType)) {
         throw parameterError(method, p, rawParameterType.getSimpleName()
                 + " must include generic type (e.g., "
                 + rawParameterType.getSimpleName()
                 + "<String>)");
      }
      ParameterizedType parameterizedType = (ParameterizedType) type;
      Type iterableType = Utils.getParameterUpperBound(0, parameterizedType);
      Converter<?, String> converter =
              retrofit.stringConverter(iterableType, annotations);
      return new ParameterHandler.Field<>(name, converter, encoded).iterable();
   }
   //Remaining branches are not shown in this excerpt
}

//If @FormUrlEncoded is present but no parameter carries @Field or @FieldMap, an error is thrown
if (isFormEncoded && !gotField) {
   throw methodError(method, "Form-encoded method must contain at least one @Field.");
}
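
The key point in the parsing step is that the encoded flag is simply read off the parameter annotation and passed along. The mechanism can be sketched with plain reflection. Everything here is hypothetical: DemoField stands in for Retrofit's @Field, DemoApi for a service interface, and EncodedFlagSketch is not part of Retrofit.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class EncodedFlagSketch {
    // Hypothetical stand-in for Retrofit's @Field; not the real annotation.
    @Target(ElementType.PARAMETER)
    @Retention(RetentionPolicy.RUNTIME)
    @interface DemoField {
        String value();
        boolean encoded() default false;
    }

    // Hypothetical service interface used only for this demonstration.
    interface DemoApi {
        void uploadPreEncoded(@DemoField(value = "data", encoded = true) String data);
        void uploadRaw(@DemoField("data") String data);
    }

    // Read the encoded flag from the first parameter of the named method.
    static boolean encodedFlag(String methodName) {
        try {
            Method method = DemoApi.class.getMethod(methodName, String.class);
            for (Annotation annotation : method.getParameterAnnotations()[0]) {
                if (annotation instanceof DemoField) {
                    return ((DemoField) annotation).encoded();
                }
            }
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("unknown method: " + methodName, e);
        }
        throw new IllegalStateException("first parameter has no @DemoField");
    }

    public static void main(String[] args) {
        System.out.println(encodedFlag("uploadPreEncoded"));
        System.out.println(encodedFlag("uploadRaw"));
    }
}
```

Leaving the attribute out gives the default, false, which is the case where the library is expected to do the encoding.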

RequestFactory then creates a RequestBuilder. Next, let's look at how RequestBuilder handles URL encoding.

//In the constructor, if isFormEncoded is true, an okhttp3.FormBody.Builder instance is created
if (isFormEncoded) {
   // Will be set to 'body' in 'build'.
   formBuilder = new FormBody.Builder();
} else if (isMultipart) {
   // Will be set to 'body' in 'build'.
   multipartBuilder = new MultipartBody.Builder();
   multipartBuilder.setType(MultipartBody.FORM);
}

//If encoded=true, the addEncoded method of the formBuilder instance is called; otherwise add is called
void addFormField(String name, String value, boolean encoded) {
    if (encoded) {
      formBuilder.addEncoded(name, value);
    } else {
      formBuilder.add(name, value);
    }
}

The next step is the OkHttp source code, which is where the URL encoding is actually implemented.

The corresponding methods of FormBody.Builder

//encoded=false: URL encoding goes through this method
fun add(name: String, value: String) = apply {
  names += name.canonicalize(
      encodeSet = FORM_ENCODE_SET,
      plusIsSpace = true,
      charset = charset
  )
  values += value.canonicalize(
      encodeSet = FORM_ENCODE_SET,
      plusIsSpace = true,
      charset = charset
  )
}

//encoded=true: URL encoding goes through this method
fun addEncoded(name: String, value: String) = apply {
   names += name.canonicalize(
      encodeSet = FORM_ENCODE_SET,
      alreadyEncoded = true,
      plusIsSpace = true,
      charset = charset
   )
   values += value.canonicalize(
      encodeSet = FORM_ENCODE_SET,
      alreadyEncoded = true,
      plusIsSpace = true,
      charset = charset
   )
}

//In the HttpUrl class
/**
     * Returns a substring of `input` on the range `[pos..limit)` with the following
     * transformations:
     *
     *  * Tabs, newlines, form feeds and carriage returns are skipped.
     *
     *  * In queries, ' ' is encoded to '+' and '+' is encoded to "%2B".
     *
     *  * Characters in `encodeSet` are percent-encoded.
     *
     *  * Control characters and non-ASCII characters are percent-encoded.
     *
     *  * All other characters are copied without transformation.
     *
     * @param alreadyEncoded true to leave '%' as-is; false to convert it to '%25'.
     * @param strict true to encode '%' if it is not the prefix of a valid percent encoding.
     * @param plusIsSpace true to encode '+' as "%2B" if it is not already encoded.
     * @param unicodeAllowed true to leave non-ASCII codepoint unencoded.
     * @param charset which charset to use, null equals UTF-8.
     */
    internal fun String.canonicalize(
      pos: Int = 0,
      limit: Int = length,
      encodeSet: String,
      alreadyEncoded: Boolean = false,
      strict: Boolean = false,
      plusIsSpace: Boolean = false,
      unicodeAllowed: Boolean = false,
      charset: Charset? = null
    ): String {
      var codePoint: Int
      var i = pos
      while (i < limit) {
        codePoint = codePointAt(i)
        if (codePoint < 0x20 ||
            codePoint == 0x7f ||
            codePoint >= 0x80 && !unicodeAllowed ||
            codePoint.toChar() in encodeSet ||
            codePoint == '%'.toInt() &&
            (!alreadyEncoded || strict && !isPercentEncoded(i, limit)) ||
            codePoint == '+'.toInt() && plusIsSpace) {
          // Slow path: the character at i requires encoding!
          val out = Buffer()
          out.writeUtf8(this, pos, i)
          out.writeCanonicalized(
              input = this,
              pos = i,
              limit = limit,
              encodeSet = encodeSet,
              alreadyEncoded = alreadyEncoded,
              strict = strict,
              plusIsSpace = plusIsSpace,
              unicodeAllowed = unicodeAllowed,
              charset = charset
          )
          return out.readUtf8()
        }
        i += Character.charCount(codePoint)
      }

      // Fast path: no characters in [pos..limit) required encoding.
      return substring(pos, limit)
    }

    private fun Buffer.writeCanonicalized(
      input: String,
      pos: Int,
      limit: Int,
      encodeSet: String,
      alreadyEncoded: Boolean,
      strict: Boolean,
      plusIsSpace: Boolean,
      unicodeAllowed: Boolean,
      charset: Charset?
    ) {
      var encodedCharBuffer: Buffer? = null // Lazily allocated.
      var codePoint: Int
      var i = pos
      while (i < limit) {
        codePoint = input.codePointAt(i)
        if (alreadyEncoded && (codePoint == '\t'.toInt() || codePoint == '\n'.toInt() ||
                codePoint == '\u000c'.toInt() || codePoint == '\r'.toInt())) {
          // Skip this character.
        } else if (codePoint == '+'.toInt() && plusIsSpace) {
          // Encode '+' as '%2B' since we permit ' ' to be encoded as either '+' or '%20'.
          writeUtf8(if (alreadyEncoded) "+" else "%2B")
        } else if (codePoint < 0x20 ||
            codePoint == 0x7f ||
            codePoint >= 0x80 && !unicodeAllowed ||
            codePoint.toChar() in encodeSet ||
            codePoint == '%'.toInt() &&
            (!alreadyEncoded || strict && !input.isPercentEncoded(i, limit))) {
          // Percent encode this character.
          if (encodedCharBuffer == null) {
            encodedCharBuffer = Buffer()
          }

          if (charset == null || charset == UTF_8) {
            encodedCharBuffer.writeUtf8CodePoint(codePoint)
          } else {
            encodedCharBuffer.writeString(input, i, i + Character.charCount(codePoint), charset)
          }

          while (!encodedCharBuffer.exhausted()) {
            val b = encodedCharBuffer.readByte().toInt() and 0xff
            writeByte('%'.toInt())
            writeByte(HEX_DIGITS[b shr 4 and 0xf].toInt())
            writeByte(HEX_DIGITS[b and 0xf].toInt())
          }
        } else {
          // This character doesn't need encoding. Just copy it over.
          writeUtf8CodePoint(codePoint)
        }
        i += Character.charCount(codePoint)
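      }
    }

Conclusion

From the code above, when alreadyEncoded=true a + is written out unchanged: the value is assumed to be encoded already, and in encoded form data a + stands for a space. That is the bug. Because the client declared encoded=true, the raw + from the SDK was not escaped, and after URL decoding on the server it became a space, which is why the server failed to parse the data. Characters in the encode set are still percent-encoded even when alreadyEncoded=true, which is why / became %2F and the rest of the body looked encoded in both cases.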

if (codePoint == '+'.toInt() && plusIsSpace) {
   // Encode '+' as '%2B' since we permit ' ' to be encoded as either '+' or '%20'.
   writeUtf8(if (alreadyEncoded) "+" else "%2B")
}
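
The effect of that branch can be reproduced without OkHttp. The following is a minimal sketch, not OkHttp's code: it handles ASCII input only, ENCODE_SET is a deliberately shortened, hypothetical stand-in for FORM_ENCODE_SET, and FormFieldSketch and its method name are assumptions made for illustration.

```java
class FormFieldSketch {
    // Hypothetical, shortened encode set; OkHttp's FORM_ENCODE_SET contains more characters.
    static final String ENCODE_SET = " /=&";

    // Simplified, ASCII-only imitation of the '+' and '%' handling shown above.
    static String canonicalize(String input, boolean alreadyEncoded) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '+') {
                // A '+' is only escaped when the value is declared as not yet encoded.
                out.append(alreadyEncoded ? "+" : "%2B");
            } else if (ENCODE_SET.indexOf(c) >= 0 || (c == '%' && !alreadyEncoded)) {
                // Encode-set characters are always escaped; '%' only for raw input.
                out.append(String.format("%%%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "/gAAAAAAAAD+AAAAAAAAAP4AAAAAAAAA";
        System.out.println(canonicalize(raw, true));
        System.out.println(canonicalize(raw, false));
    }
}
```

With the raw data from this article, the sketch reproduces the two printed bodies: the / is escaped either way, while the + is escaped only when alreadyEncoded is false.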

The default value of encoded is false, in which case OkHttp performs the URL encoding. Set encoded to true only if you URL-encode the data yourself; otherwise leave the parameter out, as follows:

@FormUrlEncoded
@POST("user/upload")
Observable<Response<Boolean>> upload(@Field(value = "data") String data);
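
If the data really has to be encoded on the client (the encoded = true case), that encoding must escape + as well. A hedged sketch using the JDK's java.net.URLEncoder, which does turn + into %2B; PreEncode is a hypothetical helper name, and the decode step assumes the server behaves like java.net.URLDecoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class PreEncode {
    // Encode a raw value before handing it to a field declared with encoded = true.
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Stand-in for the server side, assumed to decode like java.net.URLDecoder.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "/gAAAAAAAAD+AAAAAAAAAP4AAAAAAAAA";
        String encoded = encode(raw);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(raw));
    }
}
```

Passing the raw SDK output with encoded = true skips this step, which is exactly the situation described above.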

Source repositories of Retrofit and OkHttp

https://github.com/square/okhttp

https://github.com/square/retrofit


Tags: encoding Retrofit network OkHttp

Posted on Mon, 06 Apr 2020 01:39:32 -0700 by markstrange