java-curl: a Java crawler tool I wrote for my own use, now open source

Project address: https://github.com/rockswang/...
Maven Central repository: https://mvnrepository.com/art...

Brief introduction

The CUrl class is an HTTP utility class modeled on the curl command-line tool and implemented on top of the standard Java HttpURLConnection.

Features

  • HTTP client built only on the standard Java runtime, with source compatibility level 1.6, so it can be used on servers, Android, and other Java environments
  • The code is compact: only about 1,000 lines of Java source with no external dependencies, so it can be reused directly at source level without Maven
  • Fully compatible with the common switches of the curl command-line tool, so it can be used as a direct replacement for it on the command line
  • Supports all HTTP methods and multi-file uploads
  • Uses ThreadLocal to work around the limitation that standard Java stores cookies only globally, so each thread maintains its own cookies
  • Cookies held by a thread can be serialized and saved, which makes it easy to build a cookie pool
  • Supports HTTP authentication and HTTPS, with certificate checking either enabled or ignored
  • Supports per-connection proxies, including HTTPS proxies that require authentication
  • Redirect behavior can be controlled, and the response headers of every redirect hop can be retrieved
  • Supports programmatic custom response resolvers
  • Supports retry on failure, with programmatically defined retryable exceptions
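
To make the retry feature more concrete, here is a minimal sketch in plain Java of how a retry count, a delay between attempts, and a total time budget could interact. It is not taken from the java-curl source: the class `RetrySketch`, the method `callWithRetry`, and the decision to retry only on `IOException` are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Illustrative retry loop; not taken from the java-curl source. */
final class RetrySketch {

    /**
     * Runs the task and retries it when it throws IOException.
     *
     * @param retries      extra attempts allowed after the first one
     * @param delaySeconds pause between two attempts
     * @param maxSeconds   total time budget for retrying; 0 means no limit
     */
    static <T> T callWithRetry(Callable<T> task, int retries,
                               double delaySeconds, double maxSeconds) throws Exception {
        long start = System.nanoTime();
        for (int attempt = 0; ; attempt++) {
            try {
                return task.call();
            } catch (IOException e) { // assumption: only I/O failures are retryable
                double elapsed = (System.nanoTime() - start) / 1e9;
                boolean budgetSpent = maxSeconds > 0 && elapsed + delaySeconds > maxSeconds;
                if (attempt >= retries || budgetSpent) {
                    throw e; // give up and surface the last failure
                }
                Thread.sleep((long) (delaySeconds * 1000));
            }
        }
    }

    /** Simulates a request that fails twice and then succeeds. */
    static String demo() {
        final int[] calls = {0};
        try {
            String result = callWithRetry(new Callable<String>() {
                public String call() throws IOException {
                    if (++calls[0] < 3) {
                        throw new IOException("simulated failure");
                    }
                    return "ok";
                }
            }, 5, 0, 0);
            return result + " after " + calls[0] + " attempts";
        } catch (Exception e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "ok after 3 attempts"
    }
}
```

The three numeric arguments play the roles of the `--retry`, `--retry-delay`, and `--retry-max-time` switches; how the library itself decides which exceptions are retryable may differ from this sketch.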

Supported parameters

| Parameter name | Shortcut method | Description |
| --- | --- | --- |
| --compressed | none | Ask the server to compress the response with gzip (requires server-side support) |
| --connect-timeout | timeout | Connection timeout in seconds; default 0, i.e. never time out |
| -b, --cookie | cookie | Read cookies from a file, an IO object, or a parameter string |
| -c, --cookie-jar | cookieJar | Write cookies to a file or IO object |
| -d, --data, --data-ascii | data | Add POST data. When used several times, the pieces are joined with '&', and a form key-value pair added later overrides an earlier pair with the same key. If the data starts with '@', the rest is treated as a file name: the data is read from that file and line breaks are removed |
| --data-raw | none | Same as "--data", but '@' is not treated specially |
| --data-binary | none | Same as "--data", but line breaks are not removed when reading the file |
| --data-urlencode | data(data,charset) | Same as "--data", but the data is URL-encoded. A character set can be appended to the option, e.g. "--data-urlencode-GBK". (1) If the first character of the value is '=', the whole string after '=' is URL-encoded. (2) If the value contains '=', the string is split into key-value pairs on '&', each pair is split on '=', and all the values are URL-encoded. (3) If the value does not contain '=': if it contains no '@', the whole string is URL-encoded; if it contains '@', the string is split on '@', the part after '@' is the input file name whose text is read and URL-encoded, and the part before '@' is the key; if '@' is the first character, the whole text read from the file is URL-encoded |
| -D, --dump-header | dumpHeader | Write the response headers of the last redirect hop to the given file or IO object |
| -F, --form | form | Start a file upload and add a file or form item. If the value starts with '@' or '<', the data to upload is read from the named file: with '@' the file content is uploaded as a file attachment, with '<' the file content is used as the value of an ordinary form item. Otherwise the value is used as the value of an ordinary form item |
| --form-string | form(formString) | Start a file upload and add a non-file form item. Note that this method does not treat '@' specially |
| -G, --get | none | Force the GET method |
| -H, --header | header | Add a request header line. Syntax: "Host: baidu.com" adds or sets an ordinary header key-value pair; "Accept:" deletes the given request header; "X-Custom-Header;" adds or sets a custom request header with an empty value |
| -I, --head | none | Send the request with the HEAD method |
| -k, --insecure | insecure | Skip the HTTPS certificate security check |
| -L, --location | location | Follow redirects automatically (off by default) |
| -m, --max-time | timeout | Transfer timeout in seconds; default 0, i.e. never time out |
| -o, --output | output | Output file or IO object; default is stdout, i.e. "-" |
| -x, --proxy | proxy | Set the proxy server |
| -U, --proxy-user | none | Set the proxy server login credentials |
| -e, --referer | none | Set the Referer request header |
| --retry | retry | Number of retries; default 0 |
| --retry-delay | retry | Delay between two retries in seconds; default 0 |
| --retry-max-time | retry | Maximum total time spent retrying in seconds; default 0, i.e. no limit |
| -s, --silent | none | Silent mode, i.e. suppress all output |
| --stderr | stderr | Output file or IO object for stderr; default is stdout |
| -u, --user | none | Set the server login credentials |
| --url | CUrl, url | Set the request URL; this CUrl library does not support multiple URLs in one request |
| -A, --user-agent | none | Set the "User-Agent" request header |
| -X, --request | none | Specify the HTTP request method |
| --x-max-download | none | Abort the download once the transferred size reaches the given number of bytes (imprecise) |
| --x-tags | none | Attach extra key-value pairs to the current CUrl instance, for passing additional parameters around in code |
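
The `--data-urlencode` rules are the most involved entry in the table, so a small sketch may help. The code below is a hedged illustration in plain Java, not the library's implementation: the class `DataUrlEncodeSketch` and its `encode` method are hypothetical names, the cases that read from a file ('@') are left out, and it relies on `java.net.URLEncoder`, which encodes a space as '+' (the library's exact output may differ).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustrative version of the --data-urlencode rules; file ('@') cases are left out. */
final class DataUrlEncodeSketch {

    static String encode(String value, String charset) {
        if (value.startsWith("=")) {
            // "=text": encode everything after the leading '='
            return enc(value.substring(1), charset);
        }
        if (value.indexOf('=') < 0) {
            // no '=' at all: encode the whole string
            return enc(value, charset);
        }
        // "k1=v1&k2=v2": keep the keys, encode only the values
        StringBuilder out = new StringBuilder();
        String[] pairs = value.split("&");
        for (int i = 0; i < pairs.length; i++) {
            if (i > 0) {
                out.append('&');
            }
            int eq = pairs[i].indexOf('=');
            if (eq < 0) {
                out.append(pairs[i]); // assumption: a piece without '=' passes through unchanged
            } else {
                out.append(pairs[i], 0, eq + 1).append(enc(pairs[i].substring(eq + 1), charset));
            }
        }
        return out.toString();
    }

    private static String enc(String text, String charset) {
        try {
            return URLEncoder.encode(text, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("=a b&c", "UTF-8"));         // prints "a+b%26c"
        System.out.println(encode("q=a b&lang=c++", "UTF-8")); // prints "q=a+b&lang=c%2B%2B"
        System.out.println(encode("hello world", "UTF-8"));    // prints "hello+world"
    }
}
```

The `charset` argument corresponds to the character-set suffix of the option, so passing "GBK" would mimic "--data-urlencode-GBK" on a JVM that ships that charset.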

Examples

Example 1: POST form submission
    public void httpPost() {
        CUrl curl = new CUrl("http://httpbin.org/post")
                .data("hello=world&foo=bar")
                .data("foo=overwrite");
        curl.exec();
        assertEquals(200, curl.getHttpCode());
    }
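
Example 1 calls data() twice, and the second call overrides the earlier "foo" value. The sketch below shows one way such a merge could work in plain Java. It is an illustration only: `FormDataMergeSketch` and `merge` are hypothetical names, and keeping the overridden key at its original position is a choice of this sketch, not a documented behavior of java-curl.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative merge of repeated -d / data() values; not java-curl's own code. */
final class FormDataMergeSketch {

    static String merge(String... chunks) {
        // key -> full "key=value" pair; LinkedHashMap keeps first-insertion order
        Map<String, String> pairsByKey = new LinkedHashMap<String, String>();
        for (String chunk : chunks) {
            for (String pair : chunk.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                pairsByKey.put(key, pair); // a later pair replaces an earlier one with the same key
            }
        }
        StringBuilder body = new StringBuilder();
        for (String pair : pairsByKey.values()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(pair);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // prints "hello=world&foo=overwrite"
        System.out.println(merge("hello=world&foo=bar", "foo=overwrite"));
    }
}
```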
Example 2: Access an HTTPS site through a Fiddler proxy
    public void insecureHttpsViaFiddler() {
        CUrl curl = new CUrl("https://httpbin.org/get")
                .proxy("127.0.0.1", 8888) // Use Fiddler to capture & parse HTTPS traffic
                .insecure();  // Ignore certificate check since it's issued by Fiddler
        curl.exec();
        assertEquals(200, curl.getHttpCode());
    }
Example 3: Upload multiple files: one in-memory file and one file on disk
    public void uploadMultipleFiles() {
        CUrl.MemIO inMemFile = new CUrl.MemIO();
        try { inMemFile.getOutputStream().write("text file content blabla...".getBytes()); } catch (Exception ignored) {}
        CUrl curl = new CUrl("http://httpbin.org/post")
                .form("formItem", "value") // a plain form item
                .form("file", inMemFile)           // in-memory "file"
                .form("image", new CUrl.FileIO("D:\\tmp\\a2.png")); // A file in storage
        curl.exec();
        assertEquals(200, curl.getHttpCode());
    }
Example 4: Simulate AJAX requests on mobile browsers and add custom request headers
    public void customUserAgentAndHeaders() {
        String mobileUserAgent = "Mozilla/5.0 (Linux; U; Android 8.0.0; zh-cn; KNT-AL10 Build/HUAWEIKNT-AL10) " 
                + "AppleWebKit/537.36 (KHTML, like Gecko) MQQBrowser/7.3 Chrome/37.0.0.0 Mobile Safari/537.36";
        Map<String, String> fakeAjaxHeaders = new HashMap<String, String>();
        fakeAjaxHeaders.put("X-Requested-With", "XMLHttpRequest");
        fakeAjaxHeaders.put("Referer", "http://somesite.com/fake_referer");
        CUrl curl = new CUrl("http://httpbin.org/get")
                .opt("-A", mobileUserAgent) // simulate a mobile browser
                .headers(fakeAjaxHeaders)   // simulate an AJAX request
                .header("X-Auth-Token: xxxxxxx"); // other custom header, this might be calculated elsewhere
        curl.exec();
        assertEquals(200, curl.getHttpCode());
    }
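
The header() call in Example 4 takes a whole header line, and the parameter table lists three forms of such a line: set, delete, and set with an empty value. A minimal sketch of how those forms could be told apart follows. It assumes curl's convention that a trailing semicolon (as in "X-Custom-Header;") marks a header with an empty value; `HeaderLineSketch` and `apply` are hypothetical names, not part of java-curl.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative parser for curl-style header lines; not java-curl's own code. */
final class HeaderLineSketch {

    static void apply(Map<String, String> headers, String line) {
        int colon = line.indexOf(':');
        if (colon > 0) {
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (value.isEmpty()) {
                headers.remove(name);     // "Accept:" deletes the header
            } else {
                headers.put(name, value); // "Host: baidu.com" adds or replaces it
            }
        } else if (line.endsWith(";") && line.length() > 1) {
            // "X-Custom-Header;" sets a header with an empty value
            headers.put(line.substring(0, line.length() - 1).trim(), "");
        } else {
            throw new IllegalArgumentException("Unrecognized header line: " + line);
        }
    }

    static String demo() {
        // header names are case-insensitive, so use a case-insensitive map
        Map<String, String> headers = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
        headers.put("Accept", "*/*");
        apply(headers, "Host: baidu.com");
        apply(headers, "accept:");
        apply(headers, "X-Custom-Header;");
        return headers.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "{Host=baidu.com, X-Custom-Header=}"
    }
}
```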
Example 5: Concurrent requests from multiple threads; each thread keeps its own cookies
    public void threadSafeCookies() {
        final CountDownLatch count = new CountDownLatch(3);
        final CUrl[] curls = new CUrl[3];
        for (int i = 3; --i >= 0;) {
            final int idx = i;
            new Thread() {
                public void run() {
                    CUrl curl = curls[idx] = new CUrl("http://httpbin.org/get")
                            .cookie("thread" + idx + "=#" + idx);
                    curl.exec();
                    count.countDown();
                }
            }.start();
        }
        try { count.await(); } catch (Exception ignored) {} // make sure all requests are done
        assertEquals(200, curls[0].getHttpCode());
        // jsonResolver and deepGet are helpers that are not shown in this article
        assertEquals("thread0=#0", deepGet(curls[0].getStdout(jsonResolver, null), "headers.Cookie"));
        assertEquals("thread1=#1", deepGet(curls[1].getStdout(jsonResolver, null), "headers.Cookie"));
        assertEquals("thread2=#2", deepGet(curls[2].getStdout(jsonResolver, null), "headers.Cookie"));
    }
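
The per-thread cookie isolation shown in Example 5 rests on ThreadLocal. The following is a minimal sketch of that idea in plain Java, with no network access; it is not the library's cookie store. `ThreadLocalCookieSketch`, `setCookie`, `cookieHeader`, and `runThreeThreads` are hypothetical names chosen for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

/** Illustrative per-thread cookie store; not java-curl's own code. */
final class ThreadLocalCookieSketch {

    // every thread lazily gets its own, initially empty, cookie map
    private static final ThreadLocal<Map<String, String>> COOKIES =
            new ThreadLocal<Map<String, String>>() {
                @Override
                protected Map<String, String> initialValue() {
                    return new LinkedHashMap<String, String>();
                }
            };

    static void setCookie(String name, String value) {
        COOKIES.get().put(name, value);
    }

    /** Builds the Cookie header value from the calling thread's cookies only. */
    static String cookieHeader() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : COOKIES.get().entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Each thread sets one cookie and records the header it would send. */
    static String[] runThreeThreads() {
        final String[] seen = new String[3];
        final CountDownLatch done = new CountDownLatch(3);
        for (int i = 0; i < 3; i++) {
            final int idx = i;
            new Thread() {
                public void run() {
                    setCookie("thread" + idx, "#" + idx);
                    seen[idx] = cookieHeader();
                    done.countDown();
                }
            }.start();
        }
        try {
            done.await(); // wait until all three threads have finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String header : runThreeThreads()) {
            System.out.println(header); // thread0=#0, thread1=#1, thread2=#2 (one per line)
        }
    }
}
```

Because each map is a plain serializable LinkedHashMap, a thread's cookies could also be written out and restored later, which is the idea behind a cookie pool.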
Example 6: Write a custom response resolver that parses HTML with JSoup
    private CUrl.Resolver<Document> htmlResolver = new CUrl.Resolver<Document>() {
        @SuppressWarnings("unchecked")
        @Override
        public Document resolve(int httpCode, byte[] responseBody) throws Throwable {
            String html = new String(responseBody, "UTF-8");
            return Jsoup.parse(html);
        }
    };

    public void customResolver() {
        CUrl curl = new CUrl("http://httpbin.org/html");
        Document html = curl.exec(htmlResolver, null);
        assertEquals(200, curl.getHttpCode());
        assertEquals("Herman Melville - Moby-Dick", html.select("h1:first-child").text());
    }
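
A resolver is essentially a function from the status code and the raw response bytes to a typed result. The sketch below shows that shape without any JSoup dependency. The nested `Resolver` interface is a local stand-in that only mirrors the resolve(...) signature seen in Example 6, so the block compiles without the library; `TextResolverSketch`, `UTF8_TEXT`, and the rule of rejecting non-2xx statuses are assumptions of this illustration.

```java
import java.nio.charset.Charset;

/** Illustrative text resolver; the Resolver interface here is a local stand-in. */
final class TextResolverSketch {

    /** Mirrors the resolve(...) signature used in Example 6. */
    interface Resolver<T> {
        T resolve(int httpCode, byte[] responseBody) throws Throwable;
    }

    /** Decodes the body as UTF-8 text and rejects non-2xx responses. */
    static final Resolver<String> UTF8_TEXT = new Resolver<String>() {
        public String resolve(int httpCode, byte[] responseBody) {
            if (httpCode < 200 || httpCode >= 300) {
                throw new IllegalStateException("Unexpected HTTP status " + httpCode);
            }
            return new String(responseBody, Charset.forName("UTF-8"));
        }
    };

    static String demo() {
        try {
            return UTF8_TEXT.resolve(200, "hello".getBytes(Charset.forName("UTF-8")));
        } catch (Throwable t) {
            return "failed: " + t.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello"
    }
}
```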
Example 7: Use as a command-line tool (Windows cmd syntax, where ^ continues a line); the request headers mirror Example 4
java -jar java-curl-1.2.0.jar https://httpbin.org/get ^
    -x 127.0.0.1:8888 -k ^
    -A "Mozilla/5.0 (Linux; U; Android 8.0.0; zh-cn; KNT-AL10 Build/HUAWEIKNT-AL10) AppleWebKit/537.36 (KHTML, like Gecko) MQQBrowser/7.3 Chrome/37.0.0.0 Mobile Safari/537.36" ^
    -H "Referer: http://somesite.com/fake_referer" ^
    -H "X-Requested-With: XMLHttpRequest" ^
    -H "X-Auth-Token: xxxxxxx"

Keywords: Java curl Mobile Android

Added by t_miller_3 on Fri, 17 May 2019 18:11:18 +0300