Setting the TLS protocol in Apache HttpClient 4.5

While crawling website data with a crawler built on webmagic, we found that some images could not be fetched. Comparing the failures showed that every image that could not be fetched reports the same thing in Fiddler:

    Encrypted HTTPS traffic flows through this CONNECT tunnel. HTTPS Decryption is enabled in Fiddler, so decrypted sessions running in this tunnel will be shown in the Web Sessions list.

    Secure Protocol: Tls12
    Cipher: Aes256 256bits
    Hash Algorithm: Sha384 ?bits
    Key Exchange: RsaKeyX 2048bits

    == Server Certificate ==========

This suggests that the SSL/TLS settings on the client side are not correct, so we start with the source code. webmagic 0.7.3 uses httpclient 4.5.2, and it constructs its HttpClient as follows:

    private CloseableHttpClient generateClient(Site site) {
        HttpClientBuilder httpClientBuilder = HttpClients.custom();

        httpClientBuilder.setConnectionManager(connectionManager);
        if (site.getUserAgent() != null) {
            httpClientBuilder.setUserAgent(site.getUserAgent());
        } else {
            httpClientBuilder.setUserAgent("");
        }
        if (site.isUseGzip()) {
            httpClientBuilder.addInterceptorFirst(new HttpRequestInterceptor() {

                public void process(
                        final HttpRequest request,
                        final HttpContext context) throws HttpException, IOException {
                    if (!request.containsHeader("Accept-Encoding")) {
                        request.addHeader("Accept-Encoding", "gzip");
                    }
                }
            });
        }
        // Work around the post/redirect/post 302 redirect problem
        httpClientBuilder.setRedirectStrategy(new CustomRedirectStrategy());

        SocketConfig.Builder socketConfigBuilder = SocketConfig.custom();
        socketConfigBuilder.setSoKeepAlive(true).setTcpNoDelay(true);
        socketConfigBuilder.setSoTimeout(site.getTimeOut());
        SocketConfig socketConfig = socketConfigBuilder.build();
        httpClientBuilder.setDefaultSocketConfig(socketConfig);
        connectionManager.setDefaultSocketConfig(socketConfig);
        httpClientBuilder.setRetryHandler(new DefaultHttpRequestRetryHandler(site.getRetryTimes(), true));
        generateCookie(httpClientBuilder, site);
        return httpClientBuilder.build();
    }

The client is built with the custom() builder, and nothing in this method explicitly sets the security layer protocol.
Debugging shows where the security layer is configured. The socket factory is built as follows:

    private SSLConnectionSocketFactory buildSSLConnectionSocketFactory() {
        try {
            return new SSLConnectionSocketFactory(createIgnoreVerifySSL()); // Prefer the context that skips certificate verification
        } catch (KeyManagementException e) {
            logger.error("ssl connection fail", e);
        } catch (NoSuchAlgorithmException e) {
            logger.error("ssl connection fail", e);
        }
        return SSLConnectionSocketFactory.getSocketFactory();
    }

The key part is the createIgnoreVerifySSL method:

    private SSLContext createIgnoreVerifySSL() throws NoSuchAlgorithmException, KeyManagementException {
        // Implement X509TrustManager with empty check methods so that certificate verification is bypassed
        X509TrustManager trustManager = new X509TrustManager() {

            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) throws CertificateException {
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) throws CertificateException {
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return null;
            }

        };

        SSLContext sc = SSLContext.getInstance("SSLv3");
        sc.init(null, new TrustManager[] { trustManager }, null);
        return sc;
    }

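We will not discuss the certificate-bypass logic here. The important point is that the SSLContext is requested under the name SSLv3, so it cannot be relied on to enable the newer TLS versions a server may require (the Fiddler capture above shows Tls12).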
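To check which protocol versions a context actually enables on a given JDK, a small probe like the one below can help. This is only a sketch that uses the JDK alone (no HttpClient); the class name ProtocolProbe and the method enabledProtocols are hypothetical names introduced for illustration, and the result for the SSLv3 name depends on the JDK version and its security settings.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper, not part of webmagic or HttpClient.
class ProtocolProbe {

    // Returns the protocol versions enabled by default for a context requested
    // under the given name, or an empty list if the context cannot be created
    // or initialised on this runtime.
    static List<String> enabledProtocols(String contextName) {
        try {
            SSLContext context = SSLContext.getInstance(contextName);
            // null arguments select the default key managers, trust managers and randomness
            context.init(null, null, null);
            return Arrays.asList(context.getDefaultSSLParameters().getProtocols());
        } catch (GeneralSecurityException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // Compare the name used by webmagic with the generic and the explicit TLS names.
        for (String name : new String[] { "SSLv3", "TLS", "TLSv1.2" }) {
            System.out.println(name + " -> " + enabledProtocols(name));
        }
    }
}
```

Running it on the same JDK as the crawler shows whether the protocol reported by Fiddler appears in the list for the context name in use.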

TLS and SSL form a security layer on top of TCP/IP connections and provide data confidentiality and integrity. TLS is the successor to SSL.

How can HttpClient be set up to support TLSv1? Start with the official documentation. Adapting its SSL configuration example gives:

        // Default SSLContext; loading a custom trust store (own CA and self-signed certs) is left commented out
        SSLContext sslcontext = SSLContexts.custom()
//                .loadTrustMaterial(new File("my.keystore"), "nopassword".toCharArray(),
//                        new TrustSelfSignedStrategy())
                .build();
        // Allow TLSv1 protocol only
        SSLConnectionSocketFactory sslsf = new SSLConnectionSocketFactory(
                sslcontext,
                new String[] { "TLSv1" },
                null,
                SSLConnectionSocketFactory.getDefaultHostnameVerifier());
        CloseableHttpClient httpclient = HttpClients.custom()
                .setSSLSocketFactory(sslsf)
                .build();
        try {

            HttpGet httpget = new HttpGet("");

            System.out.println("Executing request " + httpget.getRequestLine());

            CloseableHttpResponse response = httpclient.execute(httpget);
            try {
                HttpEntity entity = response.getEntity();

                System.out.println("----------------------------------------");
                System.out.println(response.getStatusLine());
                byte[] bytes = EntityUtils.toByteArray(entity);
                FileUtils.writeByteArrayToFile(new File("E:/test"), bytes);
            } finally {
                response.close();
            }
        } finally {
            httpclient.close();
        }

Here the TLSv1 protocol is set explicitly on the socket factory, and testing confirmed that it works.
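As far as we understand HttpClient 4.5, the protocol array passed to SSLConnectionSocketFactory is applied to each underlying SSLSocket through setEnabledProtocols before the handshake. The sketch below reproduces only that step with the JDK alone, so the effect of a protocol list can be inspected without HttpClient. The class name EnabledProtocolsSketch and the method restrictTo are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;

// Hypothetical helper, not part of webmagic or HttpClient.
class EnabledProtocolsSketch {

    // Creates an unconnected SSLSocket from the default context, restricts it to
    // the given protocol names, and returns what the socket then reports as enabled.
    // Protocol names unknown to the runtime cause an IllegalArgumentException.
    static List<String> restrictTo(String... protocols) {
        try (SSLSocket socket =
                (SSLSocket) SSLContext.getDefault().getSocketFactory().createSocket()) {
            socket.setEnabledProtocols(protocols);
            return Arrays.asList(socket.getEnabledProtocols());
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException("could not create an SSL socket", e);
        }
    }

    public static void main(String[] args) {
        // Restricting to a single version mirrors the "allow one protocol only" demo.
        System.out.println(restrictTo("TLSv1.2"));
        // Several versions can be listed to leave room for negotiation.
        System.out.println(restrictTo("TLSv1.1", "TLSv1.2"));
    }
}
```

Since the Fiddler capture reports Tls12 for the failing sessions, listing "TLSv1.2" alongside "TLSv1" in the array passed to SSLConnectionSocketFactory may be worth trying if a server rejects TLSv1. Note that a protocol can be accepted by setEnabledProtocols and still be refused at handshake time if the JDK's security configuration disables it.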

The demo downloads an image and saves it locally. In our test, the downloaded image opened normally.

Tags: SSL encoding

Posted on Thu, 28 May 2020 08:25:15 -0700 by Tekime