un-nest curl

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# bufq
This is an internal module for managing I/O buffers. A `bufq` can be written
to and read from. It manages read and write positions and has a maximum size.
## read/write
Its basic read/write functions have a similar signature and return code
handling as many internal curl read and write ones.
```
ssize_t Curl_bufq_write(struct bufq *q, const unsigned char *buf, size_t len, CURLcode *err);
- returns the length written into `q` or -1 on error.
- writing to a full `q` returns -1 and sets *err to CURLE_AGAIN
ssize_t Curl_bufq_read(struct bufq *q, unsigned char *buf, size_t len, CURLcode *err);
- returns the length read from `q` or -1 on error.
- reading from an empty `q` returns -1 and sets *err to CURLE_AGAIN
```
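As an illustration, a minimal sketch (not taken from the curl sources) of writing into and reading out of a `bufq`, treating `CURLE_AGAIN` as the non-fatal "full"/"empty" case. It assumes `q` is an already initialized `struct bufq` (see the "lifetime" section below).
```
/* `q` is assumed to be an already initialized struct bufq */
CURLcode err = CURLE_OK;
const unsigned char msg[] = "hello";
unsigned char out[64];
ssize_t n;

n = Curl_bufq_write(&q, msg, sizeof(msg), &err);
if(n < 0 && err != CURLE_AGAIN)
  return err;   /* a real error; CURLE_AGAIN only means the bufq is full */

n = Curl_bufq_read(&q, out, sizeof(out), &err);
if(n < 0 && err != CURLE_AGAIN)
  return err;   /* a real error; CURLE_AGAIN only means the bufq is empty */
```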
To pass data into a `bufq` without an extra copy, read callbacks can be used.
```
typedef ssize_t Curl_bufq_reader(void *reader_ctx, unsigned char *buf, size_t len,
                                 CURLcode *err);
ssize_t Curl_bufq_slurp(struct bufq *q, Curl_bufq_reader *reader, void *reader_ctx,
                        CURLcode *err);
```
`Curl_bufq_slurp()` invokes the given `reader` callback, passing it its own
internal buffer memory to write to. It may invoke the `reader` several times,
as long as it has space and while the `reader` always returns the length that
was requested. There are variations of `slurp` that call the `reader` at most
once or that read in at most a given amount of bytes.
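For illustration, a hedged sketch of a `reader` callback that pulls data from a plain file descriptor and is then handed to `Curl_bufq_slurp()`. The callback name, the `fd` context and the chosen error code are made up for this example.
```
/* Hypothetical reader callback (not part of curl): pulls bytes from a file
   descriptor passed as the context pointer. Needs <unistd.h> for read(). */
static ssize_t fd_reader(void *reader_ctx, unsigned char *buf, size_t len,
                         CURLcode *err)
{
  int fd = *(int *)reader_ctx;
  ssize_t n = read(fd, buf, len);
  if(n < 0) {
    *err = CURLE_RECV_ERROR;  /* error code chosen for illustration */
    return -1;
  }
  *err = CURLE_OK;
  return n;
}

/* fill `q` with as much data as it has room for (or the fd provides) */
CURLcode err = CURLE_OK;
ssize_t nread = Curl_bufq_slurp(&q, fd_reader, &fd, &err);
```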
The analogous mechanism for writing out buffer data is:
```
typedef ssize_t Curl_bufq_writer(void *writer_ctx, const unsigned char *buf, size_t len,
                                 CURLcode *err);
ssize_t Curl_bufq_pass(struct bufq *q, Curl_bufq_writer *writer, void *writer_ctx,
                       CURLcode *err);
```
`Curl_bufq_pass()` invokes the `writer`, passing its internal memory, and
removes the amount that the `writer` reports to have written.
## peek and skip
It is possible to get access to the memory of data stored in a `bufq` with:
```
bool Curl_bufq_peek(const struct bufq *q, const unsigned char **pbuf, size_t *plen);
```
On returning TRUE, `pbuf` points to internal memory with `plen` bytes that one
may read. This is only valid until another operation on `bufq` is performed.
Instead of reading `bufq` data, one may simply skip it:
```
void Curl_bufq_skip(struct bufq *q, size_t amount);
```
This removes `amount` number of bytes from the `bufq`.
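A sketch of draining a `bufq` without copying, using peek and skip together. `consume()` is a hypothetical function standing in for whatever actually uses the bytes.
```
const unsigned char *chunk;
size_t chunk_len;

while(Curl_bufq_peek(&q, &chunk, &chunk_len)) {
  size_t consumed = consume(chunk, chunk_len);  /* hypothetical consumer */
  Curl_bufq_skip(&q, consumed);
  if(consumed < chunk_len)
    break;  /* the consumer cannot take more right now */
}
```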
## lifetime
`bufq` is initialized and freed similarly to the `dynbuf` module. Code using
`bufq` holds a `struct bufq` somewhere. Before it uses it, it invokes:
```
void Curl_bufq_init(struct bufq *q, size_t chunk_size, size_t max_chunks);
```
The `bufq` is told how many "chunks" of data it shall hold at maximum and how
large those "chunks" should be. There are some variants of this, allowing for
more options. How "chunks" are handled in a `bufq` is presented in the section
about memory management.
The user of the `bufq` has the responsibility to call:
```
void Curl_bufq_free(struct bufq *q);
```
to free all resources held by `q`. It is possible to reset a `bufq` to empty via:
```
void Curl_bufq_reset(struct bufq *q);
```
## memory management
Internally, a `bufq` uses allocations of a fixed size, the "chunk_size", up
to a maximum number, the "max_chunks". These chunks are allocated on demand,
therefore writing to a `bufq` may return `CURLE_OUT_OF_MEMORY`. Once the max
number of chunks is used, the `bufq` reports that it is "full".
Each chunk has a `read` and `write` index. A `bufq` keeps its chunks in a
list. Reading happens always at the head chunk, writing always goes to the
tail chunk. When the head chunk becomes empty, it is removed. When the tail
chunk becomes full, another chunk is added to the end of the list, becoming
the new tail.
Chunks that are no longer used are returned to a `spare` list by default. If
the `bufq` is created with option `BUFQ_OPT_NO_SPARES` those chunks are freed
right away.
If a `bufq` is created with a `bufc_pool`, the no longer used chunks are
returned to the pool. Also `bufq` asks the pool for a chunk when it needs one.
More in section "pools".
## empty, full and overflow
One can ask about the state of a `bufq` with methods such as
`Curl_bufq_is_empty(q)`, `Curl_bufq_is_full(q)`, etc. The amount of data held
by a `bufq` is the sum of the data in all its chunks. This is what is reported
by `Curl_bufq_len(q)`.
Note that a `bufq` length and it being "full" are only loosely related. A
simple example:
* create a `bufq` with chunk_size=1000 and max_chunks=4.
* write 4000 bytes to it, it reports "full"
* read 1 byte from it, it still reports "full"
* read 999 more bytes from it, and it is no longer "full"
The reason for this is that full really means: *bufq uses max_chunks and the
last one cannot be written to*.
When you read 1 byte from the head chunk in the example above, the head still
holds 999 unread bytes. Only when those are also read can the head chunk be
removed and a new tail be added.
There is another variation to this. If you initialized a `bufq` with option
`BUFQ_OPT_SOFT_LIMIT`, it allows writes **beyond** the `max_chunks`. It
reports **full**, but one can **still** write. This option is necessary if
partial writes need to be avoided. It means that you need other checks to keep
the `bufq` from growing ever larger and larger.
## pools
A `struct bufc_pool` may be used to create chunks for a `bufq` and keep spare
ones around. It is initialized and used via:
```
void Curl_bufcp_init(struct bufc_pool *pool, size_t chunk_size, size_t spare_max);
void Curl_bufq_initp(struct bufq *q, struct bufc_pool *pool, size_t max_chunks, int opts);
```
The pool gets the chunk size and the amount of spares to keep. The `bufq` gets the
pool and the `max_chunks`. It no longer needs to know the chunk sizes, as
those are managed by the pool.
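A brief sketch (values picked for illustration) of one pool used by two `bufq`s. `Curl_bufcp_free()` is assumed here as the pool cleanup call; it is not part of the prototypes shown above.
```
struct bufc_pool pool;
struct bufq q1, q2;

Curl_bufcp_init(&pool, 4096, 8);      /* 4kB chunks, at most 8 spares */
Curl_bufq_initp(&q1, &pool, 16, 0);   /* up to 16 chunks, 0 = no options */
Curl_bufq_initp(&q2, &pool, 16, 0);

/* ... use q1 and q2 ... */

Curl_bufq_free(&q1);
Curl_bufq_free(&q2);
Curl_bufcp_free(&pool);               /* assumed pool cleanup call */
```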
A pool can be shared between many `bufq`s, as long as all of them operate in
the same thread. In curl that would be true for all transfers using the same
multi handle. The advantages of a pool are:
* when all `bufq`s are empty, only memory for `max_spare` chunks in the pool
is used. Empty `bufq`s hold no memory.
* the latest spare chunk is the first to be handed out again, no matter which
`bufq` needs it. This keeps the footprint of "recently used" memory smaller.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# bufref
This is an internal module for handling buffer references. A referenced
buffer is associated with its destructor function that is implicitly called
when the reference is invalidated. Once referenced, a buffer cannot be
reallocated.
A data length is stored within the reference for binary data handling
purposes; it is not used by the bufref API.
The `struct bufref` is used to hold data referencing a buffer. The members of
that structure **MUST NOT** be accessed or modified without using the dedicated
bufref API.
## `init`
```c
void Curl_bufref_init(struct bufref *br);
```
Initializes a `bufref` structure. This function **MUST** be called before any
other operation is performed on the structure.
Upon completion, the referenced buffer is `NULL` and length is zero.
This function may also be called to bypass referenced buffer destruction while
invalidating the current reference.
## `free`
```c
void Curl_bufref_free(struct bufref *br);
```
Destroys the previously referenced buffer using its destructor and
reinitializes the structure for a possible subsequent reuse.
## `set`
```c
void Curl_bufref_set(struct bufref *br, const void *buffer, size_t length,
                     void (*destructor)(void *));
```
Releases the previously referenced buffer, then assigns the new `buffer` to
the structure, associated with its `destructor` function. The latter can be
specified as `NULL`: this is the case when the referenced buffer is static.
If `buffer` is `NULL`, `length` must be zero.
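As an illustration, a small sketch of referencing a static buffer (hence a `NULL` destructor) and then replacing it with `malloc()`ed memory released via `curl_free()`, as the `memdup` section below also does. It needs `<stdlib.h>` and `<string.h>`.
```c
struct bufref br;
static const char greeting[] = "hello";
char *dyn;

Curl_bufref_init(&br);

/* reference static data: no destructor needed */
Curl_bufref_set(&br, greeting, sizeof(greeting) - 1, NULL);

/* replace it with malloc()ed data, freed by curl_free() on release */
dyn = malloc(3);
if(dyn) {
  memcpy(dyn, "abc", 3);
  Curl_bufref_set(&br, dyn, 3, curl_free);
}

Curl_bufref_free(&br);  /* invokes curl_free(dyn) */
```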
## `memdup`
```c
CURLcode Curl_bufref_memdup(struct bufref *br, const void *data, size_t length);
```
Releases the previously referenced buffer, then duplicates the `length`-byte
`data` into a buffer allocated via `malloc()` and references the latter
associated with destructor `curl_free()`.
An additional trailing byte is allocated and set to zero as a possible string
null-terminator; it is not counted in the stored length.
Returns `CURLE_OK` if successful, else `CURLE_OUT_OF_MEMORY`.
## `ptr`
```c
const unsigned char *Curl_bufref_ptr(const struct bufref *br);
```
Returns a `const unsigned char *` to the referenced buffer.
## `len`
```c
size_t Curl_bufref_len(const struct bufref *br);
```
Returns the stored length of the referenced buffer.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# checksrc
This is the tool we use within the curl project to scan C source code and
check that it adheres to our [Source Code Style guide](CODE_STYLE.md).
## Usage
    checksrc.pl [options] [file1] [file2] ...
## Command line options
`-W[file]` skip that file and exclude it from being checked. Helpful
when, for example, one of the files is generated.
`-D[dir]` directory name to prepend to filenames when accessing them.
`-h` shows the help output, which also lists all recognized warnings
## What does `checksrc` warn for?
`checksrc` does not check and verify the code against the entire style guide.
The script is an effort to detect the most common mistakes and syntax mistakes
that contributors make before they get accustomed to our code style. Heck,
many of us regulars make these mistakes too and this script helps us keep the code
in shape.
    checksrc.pl -h
Lists how to use the script and it lists all existing warnings it has and
problems it detects. At the time of this writing, the existing `checksrc`
warnings are:
- `ASSIGNWITHINCONDITION`: Assignment within a conditional expression. The
code style mandates the assignment to be done outside of it.
- `ASTERISKNOSPACE`: A pointer was declared like `char* name` instead of the
more appropriate `char *name` style. The asterisk should sit next to the
name.
- `ASTERISKSPACE`: A pointer was declared like `char * name` instead of the
more appropriate `char *name` style. The asterisk should sit right next to
the name without a space in between.
- `BADCOMMAND`: There is a bad `checksrc` instruction in the code. See the
**Ignore certain warnings** section below for details.
- `BANNEDFUNC`: A banned function was used. The functions sprintf, vsprintf,
strcat, strncat, gets are **never** allowed in curl source code.
- `BRACEELSE`: '} else' on the same line. The else is supposed to be on the
following line.
- `BRACEPOS`: wrong position for an open brace (`{`).
- `BRACEWHILE`: more than one space between end brace and while keyword
- `COMMANOSPACE`: a comma without following space
- `COPYRIGHT`: the file is missing a copyright statement
- `CPPCOMMENTS`: `//` comment detected, that is not C89 compliant
- `DOBRACE`: only use one space after do before open brace
- `EMPTYLINEBRACE`: found empty line before open brace
- `EQUALSNOSPACE`: no space after `=` sign
- `EQUALSNULL`: comparison with `== NULL` used in if/while. We use `!var`.
- `EXCLAMATIONSPACE`: space found after an exclamation mark
- `FOPENMODE`: `fopen()` needs a macro for the mode string, use it
- `INDENTATION`: detected a wrong start column for code. Note that this
warning only checks some specific places and can certainly miss many bad
indentations.
- `LONGLINE`: A line is longer than 79 columns.
- `MULTISPACE`: Multiple spaces were found where only one should be used.
- `NOSPACEEQUALS`: An equals sign was found without preceding space. We prefer
`a = 2` and *not* `a=2`.
- `NOTEQUALSZERO`: check found using `!= 0`. We use plain `if(var)`.
- `ONELINECONDITION`: do not put the conditional block on the same line as `if()`
- `OPENCOMMENT`: File ended with a comment (`/*`) still "open".
- `PARENBRACE`: `){` was used without sufficient space in between.
- `RETURNNOSPACE`: `return` was used without space between the keyword and the
following value.
- `SEMINOSPACE`: There was no space (or newline) following a semicolon.
- `SIZEOFNOPAREN`: Found use of sizeof without parentheses. We prefer
`sizeof(int)` style.
- `SNPRINTF`: Found use of `snprintf()`. Since we use an internal replacement
with a different return code etc, we prefer `msnprintf()`.
- `SPACEAFTERPAREN`: there was a space after open parenthesis, `( text`.
- `SPACEBEFORECLOSE`: there was a space before a close parenthesis, `text )`.
- `SPACEBEFORECOMMA`: there was a space before a comma, `one , two`.
- `SPACEBEFOREPAREN`: there was a space before an open parenthesis, `if (`,
where one was not expected
- `SPACESEMICOLON`: there was a space before semicolon, ` ;`.
- `TABS`: TAB characters are not allowed
- `TRAILINGSPACE`: Trailing whitespace on the line
- `TYPEDEFSTRUCT`: we frown upon (most) typedefed structs
- `UNUSEDIGNORE`: a `checksrc` inlined warning ignore was asked for but not
used, that is an ignore that should be removed or changed to get used.
### Extended warnings
Some warnings are quite computationally expensive to perform, so they are
turned off by default. To enable these warnings, place a `.checksrc` file in
the directory where they should be activated with commands to enable the
warnings you are interested in. The format of the file is to enable one
warning per line like so: `enable <EXTENDEDWARNING>`
Currently these are the extended warnings which can be enabled:
- `COPYRIGHTYEAR`: the current changeset has not updated the copyright year in
the source file
- `STRERROR`: use of banned function strerror()
- `STDERR`: use of banned variable `stderr`
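For example, a `.checksrc` file that enables all three of the extended warnings listed above would contain:

    enable COPYRIGHTYEAR
    enable STRERROR
    enable STDERR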
## Ignore certain warnings
Due to the nature of the source code and the flaws of the `checksrc` tool,
there is sometimes a need to ignore specific warnings. `checksrc` allows a few
different ways to do this.
### Inline ignore
You can control what to ignore within a specific source file by providing
instructions to `checksrc` in the source code itself. See examples below. The
instruction can ask to ignore a specific warning a specific number of times, or
to ignore all of them until the end of the ignored section is marked.
Inline ignores are only done for that single specific source code file.
Example
    /* !checksrc! disable LONGLINE all */
This ignores the warning for overly long lines until it is re-enabled with:
    /* !checksrc! enable LONGLINE */
If the enabling is not performed before the end of the file, it is enabled
again automatically for the next file.
You can also opt to ignore just N violations so that if you have a single long
line you just cannot shorten and that is agreed to be fine anyway:
    /* !checksrc! disable LONGLINE 1 */
... and the warning for long lines is enabled again automatically after it has
ignored that single warning. The number `1` can of course be changed to any
other integer number. It can be used to make sure only the exact intended
instances are ignored and nothing extra.
### Directory wide ignore patterns
This is a method we have transitioned away from. Use inline ignores as far as
possible.
Make a `checksrc.skip` file in the directory of the source code with the
false positive, and include the full offending line into this file.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# curl client readers
Client readers is a design in the internals of libcurl, not visible in its public API. They were started
in curl v8.7.0. This document describes the concepts, its high level implementation and the motivations.
## Naming
`libcurl` operates between clients and servers. A *client* is the application using libcurl, like the command line tool `curl` itself. Data to be uploaded to a server is **read** from the client and **sent** to the server; the server's response is **received** by `libcurl` and then **written** to the client.
With this naming established, client readers are concerned with providing data from the application to the server. Applications register callbacks via `CURLOPT_READFUNCTION`, data via `CURLOPT_POSTFIELDS` and other options to be used by `libcurl` when the request is sent.
## Invoking
The transfer loop that sends and receives uses `Curl_client_read()` to get more data to send for a transfer. If no specific reader has been installed yet, the default one that uses `CURLOPT_READFUNCTION` is added. The prototype is
```
CURLcode Curl_client_read(struct Curl_easy *data, char *buf, size_t blen,
                          size_t *nread, bool *eos);
```
The arguments are the transfer to read for, a buffer to hold the read data, its length, the actual number of bytes placed into the buffer and the `eos` (*end of stream*) flag indicating that no more data is available. The `eos` flag may be set for a read amount if that amount was the last. That way curl can avoid an additional read.
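A sketch (not actual libcurl code) of how a send loop might drive `Curl_client_read()`, based on the prototype above:
```
char buf[16 * 1024];
size_t nread;
bool eos = FALSE;
CURLcode result = CURLE_OK;

while(!result && !eos) {
  result = Curl_client_read(data, buf, sizeof(buf), &nread, &eos);
  if(!result && nread) {
    /* hand `nread` bytes from `buf` to the connection for sending */
  }
}
```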
The implementation of `Curl_client_read()` uses a chain of *client reader* instances to get the data. This is similar to the design of *client writers*. The chain of readers allows processing of the data to send.
The definition of a reader is:
```
struct Curl_crtype {
  const char *name;        /* reader name */
  CURLcode (*do_init)(struct Curl_easy *data, struct Curl_creader *reader);
  CURLcode (*do_read)(struct Curl_easy *data, struct Curl_creader *reader,
                      char *buf, size_t blen, size_t *nread, bool *eos);
  void (*do_close)(struct Curl_easy *data, struct Curl_creader *reader);
  bool (*needs_rewind)(struct Curl_easy *data, struct Curl_creader *reader);
  curl_off_t (*total_length)(struct Curl_easy *data,
                             struct Curl_creader *reader);
  CURLcode (*resume_from)(struct Curl_easy *data,
                          struct Curl_creader *reader, curl_off_t offset);
  CURLcode (*rewind)(struct Curl_easy *data, struct Curl_creader *reader);
};

struct Curl_creader {
  const struct Curl_crtype *crt;  /* type implementation */
  struct Curl_creader *next;      /* Downstream reader. */
  Curl_creader_phase phase;       /* phase at which it operates */
};
```
`Curl_creader` is a reader instance with a `next` pointer to form the chain. It has a type `crt` which provides the implementation. The main callback is `do_read()` which provides the data to the caller. The others are for setup and tear down. `needs_rewind()` is explained further below.
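To make the callback shape concrete, here is a sketch of a `do_read()` in the spirit of the `null` reader described further below. The function name is made up for this example.
```
static CURLcode cr_null_read(struct Curl_easy *data,
                             struct Curl_creader *reader,
                             char *buf, size_t blen,
                             size_t *nread, bool *eos)
{
  (void)data;
  (void)reader;
  (void)buf;
  (void)blen;
  *nread = 0;    /* no body bytes to provide */
  *eos = TRUE;   /* the stream ends right away */
  return CURLE_OK;
}
```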
## Phases and Ordering
Since client readers may transform the data being read through the chain, the order in which they are called is relevant for the outcome. When a reader is created, it gets the `phase` property in which it operates. Reader phases are defined like:
```
typedef enum {
  CURL_CR_NET,              /* data sent to the network (connection filters) */
  CURL_CR_TRANSFER_ENCODE,  /* add transfer-encodings */
  CURL_CR_PROTOCOL,         /* before transfer, but after content decoding */
  CURL_CR_CONTENT_ENCODE,   /* add content-encodings */
  CURL_CR_CLIENT            /* data read from client */
} Curl_creader_phase;
```
If a reader for phase `PROTOCOL` is added to the chain, it is always added *after* any `NET` or `TRANSFER_ENCODE` readers and *before* any `CONTENT_ENCODE` and `CLIENT` readers. If there is already a reader for the same phase, the new reader is added before the existing one(s).
### Example: `chunked` reader
In `http_chunks.c` a client reader for chunked uploads is implemented. This one operates at phase `CURL_CR_TRANSFER_ENCODE`. Any data coming from the reader "below" has the HTTP/1.1 chunk handling applied and returned to the caller.
When this reader sees an `eos` from below, it generates the terminal chunk, adding trailers if provided by the application. When that last chunk is fully returned, it also sets `eos` to the caller.
### Example: `lineconv` reader
In `sendf.c` a client reader that does line-end conversions is implemented. It operates at `CURL_CR_CONTENT_ENCODE` and converts any "\n" to "\r\n". This is used for FTP ASCII uploads or when the general `crlf` option has been set.
### Example: `null` reader
Implemented in `sendf.c` for phase `CURL_CR_CLIENT`, this reader has the simple job of providing transfer bytes of length 0 to the caller, immediately indicating an `eos`. This reader is installed by HTTP for all GET/HEAD requests and when authentication is being negotiated.
### Example: `buf` reader
Implemented in `sendf.c` for phase `CURL_CR_CLIENT`, this reader gets a buffer pointer and a length and provides exactly these bytes. This one is used in HTTP for sending `postfields` provided by the application.
## Request retries
Sometimes it is necessary to send a request with client data again. Transfer handling can inquire via `Curl_client_read_needs_rewind()` if a rewind (e.g. a reset of the client data) is necessary. This asks all installed readers if they need it and gives `FALSE` if none does.
## Upload Size
Many protocols need to know the amount of bytes delivered by the client readers in advance. They may invoke `Curl_creader_total_length(data)` to retrieve that. However, not all reader chains know the exact value beforehand. In that case, the call returns `-1` for "unknown".
Even if the length of the "raw" data is known, the length that is sent may not be. Example: with option `--crlf` the uploaded content undergoes line-end conversion. The line converting reader does not know in advance how many newlines it may encounter. Therefore it must return `-1` for any positive raw content length.
In HTTP, once the correct client readers are installed, the protocol asks the readers for the total length. If that is known, it can set `Content-Length:` accordingly. If not, it may choose to add an HTTP "chunked" reader.
In addition, there is `Curl_creader_client_length(data)` which gives the total length as reported by the reader in phase `CURL_CR_CLIENT` without asking other readers that may transform the raw data. This is useful in estimating the size of an upload. The HTTP protocol uses this to determine if `Expect: 100-continue` shall be done.
## Resuming
Uploads can start at a specific offset, if so requested. They "resume from" that offset. This applies to the reader in phase `CURL_CR_CLIENT` that delivers the "raw" content. Resumption can fail if the installed reader does not support it or if the offset is too large.
The total length reported by the reader changes when resuming. Example: resuming an upload of 100 bytes by 25 reports a total length of 75 afterwards.
If `resume_from()` is invoked twice, it is additive. There is currently no way to undo a resume.
## Rewinding
When a request is retried, installed client readers are discarded and replaced by new ones. This works only if the new readers upload the same data. For many readers, this is not an issue. The "null" reader always does the same. Also the `buf` reader, initialized with the same buffer, does this.
Readers operating on callbacks to the application need to "rewind" the underlying content. For example, when reading from a `FILE*`, the reader needs to `fseek()` to the beginning. The following methods are used:
1. `Curl_creader_needs_rewind(data)`: tells if a rewind is necessary, given the current state of the reader chain. If nothing really has been read so far, this returns `FALSE`.
2. `Curl_creader_will_rewind(data)`: tells if the reader chain rewinds at the start of the next request.
3. `Curl_creader_set_rewind(data, TRUE)`: marks the reader chain for rewinding at the start of the next request.
4. `Curl_client_start(data)`: tells the readers that a new request starts and they need to rewind if requested.
## Summary and Outlook
By adding the client reader interface, any protocol can control how/if it wants the curl transfer to send bytes for a request. The transfer loop then becomes blissfully ignorant of the specifics.
The protocols on the other hand no longer have to worry about packaging data most efficiently. At any time, should more data be needed, it can be read from the client. This is used when sending HTTP request headers to add as much request body data to the initial sending as there is room for.
Future enhancements based on the client readers:
* `expect-100` handling: place that into an HTTP specific reader at
`CURL_CR_PROTOCOL` and eliminate the checks in the generic transfer parts.
* `eos forwarding`: transfer should forward an `eos` flag to the connection
filters. Filters like HTTP/2 and HTTP/3 can make use of that, terminating
streams early. This would also eliminate length checks in stream handling.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# curl client writers
Client writers is a design in the internals of libcurl, not visible in its public API. They were started
in curl v8.5.0. This document describes the concepts, its high level implementation and the motivations.
## Naming
`libcurl` operates between clients and servers. A *client* is the application using libcurl, like the command line tool `curl` itself. Data to be uploaded to a server is **read** from the client and **sent** to the server; the server's response is **received** by `libcurl` and then **written** to the client.
With this naming established, client writers are concerned with writing responses from the server to the application. Applications register callbacks via `CURLOPT_WRITEFUNCTION` and `CURLOPT_HEADERFUNCTION` to be invoked by `libcurl` when the response is received.
## Invoking
All code in `libcurl` that handles response data is ultimately expected to forward this data via `Curl_client_write()` to the application. The exact prototype of this function is:
```
CURLcode Curl_client_write(struct Curl_easy *data, int type, const char *buf, size_t blen);
```
The `type` argument specifies what the bytes in `buf` actually are. The following bits are defined:
```
#define CLIENTWRITE_BODY (1<<0) /* non-meta information, BODY */
#define CLIENTWRITE_INFO (1<<1) /* meta information, not a HEADER */
#define CLIENTWRITE_HEADER (1<<2) /* meta information, HEADER */
#define CLIENTWRITE_STATUS (1<<3) /* a special status HEADER */
#define CLIENTWRITE_CONNECT (1<<4) /* a CONNECT related HEADER */
#define CLIENTWRITE_1XX (1<<5) /* a 1xx response related HEADER */
#define CLIENTWRITE_TRAILER (1<<6) /* a trailer HEADER */
```
The main types here are `CLIENTWRITE_BODY` and `CLIENTWRITE_HEADER`. They are
mutually exclusive. The other bits are enhancements to `CLIENTWRITE_HEADER` to
specify what the header is about. They are only used in HTTP and related
protocols (RTSP and WebSocket).
The implementation of `Curl_client_write()` uses a chain of *client writer* instances to process the call and make sure that the bytes reach the proper application callbacks. This is similar to the design of connection filters: client writers can be chained to process the bytes written through them. The definition is:
```
struct Curl_cwtype {
  const char *name;
  CURLcode (*do_init)(struct Curl_easy *data,
                      struct Curl_cwriter *writer);
  CURLcode (*do_write)(struct Curl_easy *data,
                       struct Curl_cwriter *writer, int type,
                       const char *buf, size_t nbytes);
  void (*do_close)(struct Curl_easy *data,
                   struct Curl_cwriter *writer);
};

struct Curl_cwriter {
  const struct Curl_cwtype *cwt;  /* type implementation */
  struct Curl_cwriter *next;      /* Downstream writer. */
  Curl_cwriter_phase phase;       /* phase at which it operates */
};
```
`Curl_cwriter` is a writer instance with a `next` pointer to form the chain. It has a type `cwt` which provides the implementation. The main callback is `do_write()` that processes the data and then calls the `next` writer. The others are for setup and tear down.
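A sketch of a `do_write()` that does not transform the data and simply passes it on to the next writer in the chain. The function name is made up; a real writer would inspect or modify the bytes first.
```
static CURLcode my_cw_write(struct Curl_easy *data,
                            struct Curl_cwriter *writer, int type,
                            const char *buf, size_t nbytes)
{
  /* ... look at or transform the `nbytes` in `buf` here ... */
  return writer->next->cwt->do_write(data, writer->next, type, buf, nbytes);
}
```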
## Phases and Ordering
Since client writers may transform the bytes written through them, the order in which they are called is relevant for the outcome. When a writer is created, one property it gets is the `phase` in which it operates. Writer phases are defined like:
```
typedef enum {
  CURL_CW_RAW,              /* raw data written, before any decoding */
  CURL_CW_TRANSFER_DECODE,  /* remove transfer-encodings */
  CURL_CW_PROTOCOL,         /* after transfer, but before content decoding */
  CURL_CW_CONTENT_DECODE,   /* remove content-encodings */
  CURL_CW_CLIENT            /* data written to client */
} Curl_cwriter_phase;
```
If a writer for phase `PROTOCOL` is added to the chain, it is always added *after* any `RAW` or `TRANSFER_DECODE` and *before* any `CONTENT_DECODE` and `CLIENT` phase writer. If there is already a writer for the same phase present, the new writer is inserted just before that one.
All transfers have a chain of 3 writers by default. A specific protocol handler may alter that by adding additional writers. The 3 standard writers are (name, phase):
1. `"raw", CURL_CW_RAW`: if the transfer is verbose, it forwards the body data to the debug function.
1. `"download", CURL_CW_PROTOCOL`: checks that protocol limits are kept and updates progress counters. When a download has a known length, it checks that it is not exceeded and errors otherwise.
1. `"client", CURL_CW_CLIENT`: the main workhorse. It invokes the application callbacks or writes to the configured file handles. It chops large writes into smaller parts, as documented for `CURLOPT_WRITEFUNCTION`. It also handles *pausing* of transfers when the application callback returns `CURL_WRITEFUNC_PAUSE`.
With these writers always in place, libcurl's protocol handlers automatically have these implemented.
## Enhanced Use
HTTP is the protocol in curl that makes use of the client writer chain by
adding writers to it. When the `libcurl` application sets
`CURLOPT_ACCEPT_ENCODING` (as `curl` does with `--compressed`), the server is
offered an `Accept-Encoding` header with the algorithms supported. The server
then may choose to send the response body compressed. For example using `gzip`
or `brotli` or even both.
The server's response may then carry a `Content-Encoding` header listing the
encodings applied. If supported by `libcurl`, it then decompresses the content
before writing it out to the client. How does it do that?
The HTTP protocol adds client writers in phase `CURL_CW_CONTENT_DECODE` on
seeing such a header. For each encoding listed, it adds the corresponding
writer. The response from the server is then passed through
`Curl_client_write()` to the writers that decode it. If several encodings had
been applied the writer chain decodes them in the proper order.
When the server provides a `Content-Length` header, that value applies to the
*compressed* content. Length checks on the response bytes must happen *before*
it gets decoded. That is why this check happens in phase `CURL_CW_PROTOCOL`
which always is ordered before writers in phase `CURL_CW_CONTENT_DECODE`.
What else?
Well, HTTP servers may also apply a `Transfer-Encoding` to the body of a response. The most well-known one is `chunked`, but algorithms like `gzip` and friends could also be applied. The difference to content encodings is that decoding needs to happen *before* protocol checks, for example on length, are done.
That is why transfer decoding writers are added for phase `CURL_CW_TRANSFER_DECODE`. Which makes their operation happen *before* phase `CURL_CW_PROTOCOL` where length may be checked.
## Summary
By adding the common behavior of all protocols into `Curl_client_write()` we make sure that it applies everywhere. Protocol handlers have less to worry about. Changes to default behavior can be done without affecting handler implementations.
Having a writer chain as implementation allows protocol handlers with extra needs, like HTTP, to add to this for special behavior. The common way of writing the actual response data stays the same.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# curl C code style
Source code that has a common style is easier to read than code that uses
different styles in different places. It helps making the code feel like one
single code base. Easy-to-read is an important property of code and helps
making it easier to review when new things are added and it helps debugging
code when developers are trying to figure out why things go wrong. A unified
style is more important than individual contributors having their own personal
tastes satisfied.
Our C code has a few style rules. Most of them are verified and upheld by the
`scripts/checksrc.pl` script. It is invoked with `make checksrc` or even by default
by the build system when built after `./configure --enable-debug` has been
used.
It is normally not a problem for anyone to follow the guidelines, as you just
need to copy the style already used in the source code and there are no
particularly unusual rules in our set of rules.
We also work hard on writing code that is warning-free on all the major
platforms and in general on as many platforms as possible. Code that obviously
causes warnings is not accepted as-is.
## Readability
A primary characteristic for code is readability. The intent and meaning of
the code should be visible to the reader. Being clear and unambiguous beats
being clever and saving two lines of code. Write simple code. You and others
who come back to this code over the coming decades want to be able to quickly
understand it when debugging.
## Naming
Try using a non-confusing naming scheme for your new functions and variable
names. It does not necessarily have to mean that you should use the same as in
other places of the code, just that the names should be logical,
understandable and be named according to what they are used for. File-local
functions should be made static. We like lower case names.
See the [INTERNALS](https://curl.se/dev/internals.html#symbols) document on
how we name non-exported library-global symbols.
## Indenting
We use only spaces for indentation, never TABs. We use two spaces for each new
open brace.
```c
if(something_is_true) {
  while(second_statement == fine) {
    moo();
  }
}
```
## Comments
Since we write C89 code, **//** comments are not allowed. They were not
introduced in the C standard until C99. We use only __/* comments */__.
```c
/* this is a comment */
```
## Long lines
Source code in curl may never be wider than 79 columns and there are two
reasons for maintaining this even in the modern era of large and high
resolution screens:
1. Narrower columns are easier to read than wide ones. There is a reason
newspapers have used columns for decades or centuries.
2. Narrower columns allow developers to more easily show multiple pieces of code
next to each other in different windows. It allows two or three source
code windows next to each other on the same screen - as well as multiple
terminal and debugging windows.
## Braces
In if/while/do/for expressions, we write the open brace on the same line as
the keyword and we then set the closing brace on the same indentation level as
the initial keyword. Like this:
```c
if(age < 40) {
  /* clearly a youngster */
}
```
You may omit the braces if they would contain only a one-line statement:
```c
if(!x)
  continue;
```
For functions the opening brace should be on a separate line:
```c
int main(int argc, char **argv)
{
  return 1;
}
```
## 'else' on the following line
When adding an **else** clause to a conditional expression using braces, we
add it on a new line after the closing brace. Like this:
```c
if(age < 40) {
  /* clearly a youngster */
}
else {
  /* probably grumpy */
}
```
## No space before parentheses
When writing expressions using if/while/do/for, there shall be no space
between the keyword and the open parenthesis. Like this:
```c
while(1) {
  /* loop forever */
}
```
## Use boolean conditions
Rather than test a conditional value such as a bool against TRUE or FALSE, a
pointer against NULL or != NULL and an int against zero or not zero in
if/while conditions we prefer:
```c
result = do_something();
if(!result) {
  /* something went wrong */
  return result;
}
```
## No assignments in conditions
To increase readability and reduce complexity of conditionals, we avoid
assigning variables within if/while conditions. We frown upon this style:
```c
if((ptr = malloc(100)) == NULL)
  return NULL;
```
and instead we encourage the above version to be spelled out more clearly:
```c
ptr = malloc(100);
if(!ptr)
  return NULL;
```
## New block on a new line
We never write multiple statements on the same source line, even for short
if() conditions.
```c
if(a)
  return TRUE;
else if(b)
  return FALSE;
```
and NEVER:
```c
if(a) return TRUE;
else if(b) return FALSE;
```
## Space around operators
Please use spaces on both sides of operators in C expressions. Postfix **(),
[], ->, ., ++, --** and Unary **+, -, !, ~, &** operators are excluded; they
should have no space.
Examples:
```c
bla = func();
who = name[0];
age += 1;
true = !false;
size += -2 + 3 * (a + b);
ptr->member = a++;
struct.field = b--;
ptr = &address;
contents = *pointer;
complement = ~bits;
empty = (!*string) ? TRUE : FALSE;
```
## No parentheses for return values
We use the 'return' statement without extra parentheses around the value:
```c
int works(void)
{
  return TRUE;
}
```
## Parentheses for sizeof arguments
When using the sizeof operator in code, we prefer it to be written with
parentheses around its argument:
```c
int size = sizeof(int);
```
## Column alignment
Some statements cannot be completed on a single line because the line would be
too long, the statement too hard to read, or due to other style guidelines
above. In such a case the statement spans multiple lines.
If a continuation line is part of an expression or sub-expression then you
should align on the appropriate column so that it is easy to tell what part of
the statement it is. Operators should not start continuation lines. In other
cases follow the 2-space indent guideline. Here are some examples from
libcurl:
```c
if(Curl_pipeline_wanted(handle->multi, CURLPIPE_HTTP1) &&
   (handle->set.httpversion != CURL_HTTP_VERSION_1_0) &&
   (handle->set.httpreq == HTTPREQ_GET ||
    handle->set.httpreq == HTTPREQ_HEAD))
  /* did not ask for HTTP/1.0 and a GET or HEAD */
  return TRUE;
```
If there is no parenthesis, use the default indent:
```c
data->set.http_disable_hostname_check_before_authentication =
  (0 != va_arg(param, long)) ? TRUE : FALSE;
```
A function invocation with an open parenthesis:
```c
if(option) {
  result = parse_login_details(option, strlen(option),
                               (userp ? &user : NULL),
                               (passwdp ? &passwd : NULL),
                               NULL);
}
```
Align with the "current open" parenthesis:
```c
DEBUGF(infof(data, "Curl_pp_readresp_ %d bytes of trailing "
             "server response left\n",
             (int)clipamount));
```
## Platform dependent code
Use **#ifdef HAVE_FEATURE** to do conditional code. We avoid checking for
particular operating systems or hardware in the #ifdef lines. The HAVE_FEATURE
shall be generated by the configure script for Unix-like systems and they are
hard-coded in the `config-[system].h` files for the others.
We also encourage use of macros/functions that possibly are empty or defined
to constants when libcurl is built without that feature, to make the code
seamless. Like this example where the **magic()** function works differently
depending on a build-time conditional:
```c
#ifdef HAVE_MAGIC
int magic(int a)
{
  return a + 2;
}
#else
#define magic(x) 1
#endif

int content = magic(3);
```
## No typedefed structs
Use structs by all means, but do not typedef them. Use the `struct name` way
of identifying them:
```c
struct something {
  void *valid;
  size_t way_to_write;
};

struct something instance;
```
**Not okay**:
```c
typedef struct {
  void *wrong;
  size_t way_to_write;
} something;

something instance;
```
## Banned functions
To avoid footguns and unintended consequences we forbid the use of a number of
C functions. The `checksrc` script finds and yells about them if used. This
makes us write better code.
This is the full list of functions generally banned.
```
_access
_mbscat
_mbsncat
_tcscat
_tcsncat
_waccess
_wcscat
_wcsncat
access
gets
gmtime
LoadLibrary
LoadLibraryA
LoadLibraryEx
LoadLibraryExA
LoadLibraryExW
LoadLibraryW
localtime
snprintf
sprintf
sscanf
strcat
strerror
strncat
strncpy
strtok
strtol
strtoul
vsnprintf
vsprintf
```

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# curl connection filters
Connection filters is a design in the internals of curl, not visible in its
public API. They were added in curl v7.87.0. This document describes the
concepts, its high level implementation and the motivations.
## Filters
A "connection filter" is a piece of code that is responsible for handling a
range of operations of curl's connections: reading, writing, waiting on
external events, connecting and closing down - to name the most important
ones.
The most important feat of connection filters is that they can be stacked on
top of each other (or "chained" if you prefer that metaphor). In the common
scenario that you want to retrieve a `https:` URL with curl, you need 2 basic
things to send the request and get the response: a TCP connection, represented
by a `socket`, and an SSL instance to en- and decrypt data over that socket. You write
your request to the SSL instance, which encrypts and writes that data to the
socket, which then sends the bytes over the network.
With connection filters, curl's internal setup looks something like this (cf
for connection filter):
```
Curl_easy *data         connectdata *conn        cf-ssl        cf-socket
+----------------+      +-----------------+      +-------+     +--------+
|https://curl.se/|----> | properties      |----> | keys  |---> | socket |--> OS --> network
+----------------+      +-----------------+      +-------+     +--------+

Curl_write(data, buffer)
 --> Curl_cfilter_write(data, data->conn, buffer)
      ---> conn->filter->write(conn->filter, data, buffer)
```
While connection filters all do different things, they look the same from the
"outside". The code in `data` and `conn` does not really know **which**
filters are installed. `conn` just writes into the first filter, whatever that
is.
Same is true for filters. Each filter has a pointer to the `next` filter. When
SSL has encrypted the data, it does not write to a socket, it writes to the
next filter. Whether that is indeed a socket, a file, or an HTTP/2 connection
is of no concern to the SSL filter.
This allows stacking, as in:
```
Direct:
  http://localhost/   conn -> cf-socket
  https://curl.se/    conn -> cf-ssl -> cf-socket

Via http proxy tunnel:
  http://localhost/   conn -> cf-http-proxy -> cf-socket
  https://curl.se/    conn -> cf-ssl -> cf-http-proxy -> cf-socket

Via https proxy tunnel:
  http://localhost/   conn -> cf-http-proxy -> cf-ssl -> cf-socket
  https://curl.se/    conn -> cf-ssl -> cf-http-proxy -> cf-ssl -> cf-socket

Via http proxy tunnel via SOCKS proxy:
  http://localhost/   conn -> cf-http-proxy -> cf-socks -> cf-socket
```
### Connecting/Closing
Before `Curl_easy` can send the request, the connection needs to be
established. This means that all connection filters have done whatever they
need to do: waiting for the socket to be connected, doing the TLS handshake,
performing the HTTP tunnel request, etc. This has to be done in reverse order:
the last filter has to do its connect first, then the one above can start,
etc.
Each filter does in principle the following:
```
static CURLcode
myfilter_cf_connect(struct Curl_cfilter *cf,
                    struct Curl_easy *data,
                    bool *done)
{
  CURLcode result;

  if(cf->connected) {            /* we and all below are done */
    *done = TRUE;
    return CURLE_OK;
  }
  /* Let the filters below connect */
  result = cf->next->cft->connect(cf->next, data, done);
  if(result || !*done)
    return result;               /* below errored/not finished yet */

  /* MYFILTER CONNECT THINGS */  /* below connected, do our thing */
  *done = cf->connected = TRUE;  /* done, remember, return */
  return CURLE_OK;
}
```
Closing a connection then works similarly. The `conn` tells the first filter to
close. Contrary to connecting, the filter does its own things first, before
telling the next filter to close.
### Efficiency
There are two things curl is concerned about: efficient memory use and fast
transfers.
The memory footprint of a filter is relatively small:
```
struct Curl_cfilter {
  const struct Curl_cftype *cft;  /* the type providing implementation */
  struct Curl_cfilter *next;      /* next filter in chain */
  void *ctx;                      /* filter type specific settings */
  struct connectdata *conn;       /* the connection this filter belongs to */
  int sockindex;                  /* TODO: like to get rid off this */
  BIT(connected);                 /* != 0 iff this filter is connected */
};
```
The filter type `cft` is a singleton, one static struct for each type of
filter. The `ctx` is where a filter holds its specific data. That varies by
filter type. An http-proxy filter keeps the ongoing state of the CONNECT here,
freeing it after the tunnel has been established. The SSL filter keeps the `SSL*` (if
OpenSSL is used) here until the connection is closed. So, this varies.
`conn` is a reference to the connection this filter belongs to, so nothing
extra besides the pointer itself.
Several things that were previously kept in `struct connectdata` now go into
the `filter->ctx` *when needed*. So, the memory footprint for connections that
do *not* use an http proxy, or socks, or https is lower.
As to transfer efficiency, writing and reading through a filter comes at near
zero cost *if the filter does not transform the data*. An http proxy or socks
filter, once it is connected, just passes the calls through. Those filter
implementations look like this:
```
ssize_t Curl_cf_def_send(struct Curl_cfilter *cf, struct Curl_easy *data,
                         const void *buf, size_t len, CURLcode *err)
{
  return cf->next->cft->do_send(cf->next, data, buf, len, err);
}
```
The `recv` implementation is equivalent.
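Assuming a matching `do_recv` member in the filter type (mirroring `do_send` above, not shown in the struct excerpt here), the pass-through `recv` would look like this sketch, with a name chosen to mirror `Curl_cf_def_send`:
```
ssize_t Curl_cf_def_recv(struct Curl_cfilter *cf, struct Curl_easy *data,
                         char *buf, size_t len, CURLcode *err)
{
  return cf->next->cft->do_recv(cf->next, data, buf, len, err);
}
```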
## Filter Types
The currently existing filter types (curl 8.5.0) are:
* `TCP`, `UDP`, `UNIX`: filters that operate on a socket, providing raw I/O.
* `SOCKET-ACCEPT`: special filter for a TCP socket that has been
  `accept()`ed on a `listen()`ing socket
* `SSL`: filter that applies TLS en-/decryption and handshake. Manages the
underlying TLS backend implementation.
* `HTTP-PROXY`, `H1-PROXY`, `H2-PROXY`: the first manages the connection to an
HTTP proxy server and uses the other depending on which ALPN protocol has
been negotiated.
* `SOCKS-PROXY`: filter for the various SOCKS proxy protocol variations
* `HAPROXY`: filter for the protocol of the same name, providing client IP
information to a server.
* `HTTP/2`: filter for handling multiplexed transfers over an HTTP/2
connection
* `HTTP/3`: filter for handling multiplexed transfers over an HTTP/3+QUIC
connection
* `HAPPY-EYEBALLS`: meta filter that implements IPv4/IPv6 "happy eyeballing".
It creates up to 2 sub-filters that race each other for a connection.
* `SETUP`: meta filter that manages the creation of sub-filter chains for a
specific transport (e.g. TCP or QUIC).
* `HTTPS-CONNECT`: meta filter that races a TCP+TLS and a QUIC connection
against each other to determine if HTTP/1.1, HTTP/2 or HTTP/3 shall be used
for a transfer.
Meta filters combine other filters for a specific purpose, mostly during
connection establishment. Other filters like `TCP`, `UDP` and `UNIX` are only
to be found at the end of filter chains. SSL filters provide encryption, of
course. Protocol filters change the bytes sent and received.
## Filter Flags
Filter types carry flags that inform what they do. These are (for now):
* `CF_TYPE_IP_CONNECT`: this filter type talks directly to a server. This does
not have to be the server the transfer wants to talk to. For example when a
proxy server is used.
* `CF_TYPE_SSL`: this filter type provides encryption.
* `CF_TYPE_MULTIPLEX`: this filter type can manage multiple transfers in parallel.
Filter types can combine these flags. For example, the HTTP/3 filter types
have `CF_TYPE_IP_CONNECT`, `CF_TYPE_SSL` and `CF_TYPE_MULTIPLEX` set.
Flags are useful to extrapolate properties of a connection. To check if a
connection is encrypted, libcurl inspects the filter chain in place, top down,
for `CF_TYPE_SSL`. If it finds `CF_TYPE_IP_CONNECT` before any `CF_TYPE_SSL`,
the connection is not encrypted.
For example, `conn1` is for a `http:` request using a tunnel through an HTTP/2
`https:` proxy. `conn2` is a `https:` HTTP/2 connection to the same proxy.
`conn3` uses HTTP/3 without proxy. The filter chains would look like this
(simplified):
```
conn1 --> `HTTP-PROXY` --> `H2-PROXY` --> `SSL` --> `TCP`
flags:                    `IP_CONNECT`   `SSL`     `IP_CONNECT`

conn2 --> `HTTP/2` --> `SSL` --> `HTTP-PROXY` --> `H2-PROXY` --> `SSL` --> `TCP`
flags:                `SSL`                      `IP_CONNECT`   `SSL`     `IP_CONNECT`

conn3 --> `HTTP/3`
flags:    `SSL|IP_CONNECT`
```
Inspecting the filter chains, `conn1` is seen as unencrypted, since it
contains an `IP_CONNECT` filter before any `SSL`. `conn2` is clearly encrypted
as an `SSL` flagged filter is seen first. `conn3` is also encrypted as the
`SSL` flag is checked before the presence of `IP_CONNECT`.
Similar checks can determine if a connection is multiplexed or not.
## Filter Tracing
Filters may make use of special trace macros like `CURL_TRC_CF(data, cf, msg,
...)`, with `data` being the transfer and `cf` being the filter instance.
These traces are normally not active and their execution is guarded so that
they are cheap to ignore.
Users of `curl` may activate them by adding the name of the filter type to the
`--trace-config` argument. For example, in order to get more detailed tracing
of an HTTP/2 request, invoke curl with:
```
> curl -v --trace-config ids,time,http/2 https://curl.se
```
This gives you trace output with time information, transfer+connection ids
and details from the `HTTP/2` filter. Filter type names in the trace config
are case insensitive. You may use `all` to enable tracing for all filter
types. When using `libcurl` you may call `curl_global_trace(config_string)` at
the start of your application to enable filter details.
## Meta Filters
"Meta filters" is a catch-all name for filter types that do not change the
transfer data in any way but provide other important services to curl. In
general, it is possible to do all sorts of silly things with them. One of the
commonly used, important things is "eyeballing".
The `HAPPY-EYEBALLS` filter is involved in the connect phase. Its job is to
try the various IPv4 and IPv6 addresses that are known for a server. If only
one address family is known (or configured), it tries the addresses one after
the other with timeouts calculated from the amount of addresses and the
overall connect timeout.
When more than one address family is to be tried, it splits the address list
into IPv4 and IPv6 and makes parallel attempts. The connection filter chain
looks like this:
```
* create connection for http://curl.se
  conn[curl.se] --> SETUP[TCP] --> HAPPY-EYEBALLS --> NULL
* start connect
  conn[curl.se] --> SETUP[TCP] --> HAPPY-EYEBALLS --> NULL
                                   - ballerv4 --> TCP[151.101.1.91]:443
                                   - ballerv6 --> TCP[2a04:4e42:c00::347]:443
* v6 answers, connected
  conn[curl.se] --> SETUP[TCP] --> HAPPY-EYEBALLS --> TCP[2a04:4e42:c00::347]:443
* transfer
```
The modular design of connection filters and that we can plug them into each other is used to control the parallel attempts. When a `TCP` filter does not connect (in time), it is torn down and another one is created for the next address. This keeps the `TCP` filter simple.
The `HAPPY-EYEBALLS` filter on the other hand stays focused on its side of the problem. We can also use it to make other types of connections by just giving it another filter type to try. This gives happy eyeballing for QUIC:
```
* create connection for --http3-only https://curl.se
  conn[curl.se] --> SETUP[QUIC] --> HAPPY-EYEBALLS --> NULL
* start connect
  conn[curl.se] --> SETUP[QUIC] --> HAPPY-EYEBALLS --> NULL
                                    - ballerv4 --> HTTP/3[151.101.1.91]:443
                                    - ballerv6 --> HTTP/3[2a04:4e42:c00::347]:443
* v6 answers, connected
  conn[curl.se] --> SETUP[QUIC] --> HAPPY-EYEBALLS --> HTTP/3[2a04:4e42:c00::347]:443
* transfer
```
When we plug these two variants together, we get the `HTTPS-CONNECT` filter
type that is used for `--http3` when **both** HTTP/3 and HTTP/2 or HTTP/1.1
shall be attempted:
```
* create connection for --http3 https://curl.se
  conn[curl.se] --> HTTPS-CONNECT --> NULL
* start connect
  conn[curl.se] --> HTTPS-CONNECT --> NULL
                    - SETUP[QUIC] --> HAPPY-EYEBALLS --> NULL
                                      - ballerv4 --> HTTP/3[151.101.1.91]:443
                                      - ballerv6 --> HTTP/3[2a04:4e42:c00::347]:443
                    - SETUP[TCP] --> HAPPY-EYEBALLS --> NULL
                                     - ballerv4 --> TCP[151.101.1.91]:443
                                     - ballerv6 --> TCP[2a04:4e42:c00::347]:443
* v4 QUIC answers, connected
  conn[curl.se] --> HTTPS-CONNECT --> SETUP[QUIC] --> HAPPY-EYEBALLS --> HTTP/3[151.101.1.91]:443
* transfer
```

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# `curlx`
Functions that are prefixed with `curlx_` are internal global functions that
are written in a way to allow them to be "borrowed" and used outside of the
library: in the curl tool and in the curl test suite.
The `curlx` functions are not part of the libcurl API, but are stand-alone
functions whose sources can be built and used outside of libcurl. There are
no API or ABI guarantees. The functions are not written or meant to be used
outside of the curl project.
Only functions actually used by the library are provided here.
## Ways to success
- Do not use `struct Curl_easy` in these files
- Do not use the printf defines in these files
- Make them as stand-alone as possible

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# dynbuf
This is the internal module for creating and handling "dynamic buffers". This
means buffers that can be appended to and that grow dynamically to adapt.
There is always a terminating zero put at the end of the dynamic buffer.
The `struct dynbuf` is used to hold data for each instance of a dynamic
buffer. The members of that struct **MUST NOT** be accessed or modified
without using the dedicated dynbuf API.
## `curlx_dyn_init`
```c
void curlx_dyn_init(struct dynbuf *s, size_t toobig);
```
This initializes a struct to use for dynbuf and it cannot fail. The `toobig`
value **must** be set to the maximum size we allow this buffer instance to
grow to. The functions below return `CURLE_OUT_OF_MEMORY` when hitting this
limit.
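A short usage sketch combining the calls described in this document (the 1024-byte limit is arbitrary; `printf` and `<stdio.h>` are only used for illustration):
```c
struct dynbuf buf;
CURLcode result;

curlx_dyn_init(&buf, 1024);   /* never allow growth beyond 1024 bytes */

result = curlx_dyn_add(&buf, "Hello");
if(!result)
  result = curlx_dyn_addf(&buf, ", %s!", "world");
if(result)
  return result;   /* a failed add already freed the buffer */

printf("%s\n", curlx_dyn_ptr(&buf));  /* prints "Hello, world!" */
curlx_dyn_free(&buf);
```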
## `curlx_dyn_free`
```c
void curlx_dyn_free(struct dynbuf *s);
```
Free the associated memory and clean up. After a free, the `dynbuf` struct can
be reused to start appending new data to.
## `curlx_dyn_addn`
```c
CURLcode curlx_dyn_addn(struct dynbuf *s, const void *mem, size_t len);
```
Append arbitrary data of a given length to the end of the buffer.
If this function fails it calls `curlx_dyn_free` on `dynbuf`.
## `curlx_dyn_add`
```c
CURLcode curlx_dyn_add(struct dynbuf *s, const char *str);
```
Append a C string to the end of the buffer.
If this function fails it calls `curlx_dyn_free` on `dynbuf`.
## `curlx_dyn_addf`
```c
CURLcode curlx_dyn_addf(struct dynbuf *s, const char *fmt, ...);
```
Append a `printf()`-style string to the end of the buffer.
If this function fails it calls `curlx_dyn_free` on `dynbuf`.
## `curlx_dyn_vaddf`
```c
CURLcode curlx_dyn_vaddf(struct dynbuf *s, const char *fmt, va_list ap);
```
Append a `vprintf()`-style string to the end of the buffer.
If this function fails it calls `curlx_dyn_free` on `dynbuf`.
## `curlx_dyn_reset`
```c
void curlx_dyn_reset(struct dynbuf *s);
```
Reset the buffer length, but leave the allocation.
## `curlx_dyn_tail`
```c
CURLcode curlx_dyn_tail(struct dynbuf *s, size_t length);
```
Keep `length` bytes of the buffer tail (the last `length` bytes of the
buffer). The rest of the buffer is dropped. The specified `length` must not be
larger than the buffer length. To instead keep the leading part, see
`curlx_dyn_setlen()`.
## `curlx_dyn_ptr`
```c
char *curlx_dyn_ptr(const struct dynbuf *s);
```
Returns a `char *` to the buffer if it has a length, otherwise may return
NULL. Since the buffer may be reallocated, this pointer should not be trusted
or used anymore after the next buffer manipulation call.
## `curlx_dyn_uptr`
```c
unsigned char *curlx_dyn_uptr(const struct dynbuf *s);
```
Returns an `unsigned char *` to the buffer if it has a length, otherwise may
return NULL. Since the buffer may be reallocated, this pointer should not be
trusted or used anymore after the next buffer manipulation call.
## `curlx_dyn_len`
```c
size_t curlx_dyn_len(const struct dynbuf *s);
```
Returns the length of the buffer in bytes. Does not include the terminating
zero byte.
## `curlx_dyn_setlen`
```c
CURLcode curlx_dyn_setlen(struct dynbuf *s, size_t len);
```
Sets the new shorter length of the buffer in number of bytes. Keeps the
leftmost set number of bytes, discards the rest. To instead keep the tail part
of the buffer, see `curlx_dyn_tail()`.
## `curlx_dyn_take`
```c
char *curlx_dyn_take(struct dynbuf *s, size_t *plen);
```
Transfers ownership of the internal buffer to the caller. The dynbuf
resets to its initial state. The returned pointer may be `NULL` if the
dynbuf never allocated memory. The returned length is the amount of
data written to the buffer. The actual allocated memory might be larger.
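A minimal usage sketch combining the calls above (a fragment with error
handling shortened):

```c
struct dynbuf buf;
CURLcode result;

curlx_dyn_init(&buf, 1024); /* never allow this buffer to exceed 1024 bytes */

result = curlx_dyn_add(&buf, "Hello");
if(!result)
  result = curlx_dyn_addf(&buf, ", %s!", "world");

if(!result)
  printf("%zu bytes: %s\n", curlx_dyn_len(&buf), curlx_dyn_ptr(&buf));

/* release the buffer; a failed add above has already freed it */
curlx_dyn_free(&buf);
```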

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# `hash`
#include "hash.h"
This is the internal module for doing hash tables. A hash table uses a hash
function to compute an index. On each index there is a separate linked list of
entries.
Create a hash table. Add items. Retrieve items. Remove items. Destroy table.
## `Curl_hash_init`
~~~c
void Curl_hash_init(struct Curl_hash *h,
size_t slots,
hash_function hfunc,
comp_function comparator,
Curl_hash_dtor dtor);
~~~
The call initializes a `struct Curl_hash`.
- `slots` is the number of entries to create in the hash table. Larger is
better (faster lookups) but also uses more memory.
- `hfunc` is a function pointer to a function that returns a `size_t` value as
a checksum for an entry in this hash table. Ideally, it returns a unique
value for every entry ever added to the hash table, but hash collisions are
handled.
- `comparator` is a function pointer to a function that compares two hash
table entries. It should return non-zero if the compared items are
identical.
- `dtor` is a function pointer to a destructor called when an entry is removed
from the table.
## `Curl_hash_add`
~~~c
void *
Curl_hash_add(struct Curl_hash *h, void *key, size_t key_len, void *p)
~~~
This call adds an entry to the hash. `key` points to the hash key and
`key_len` is the length of the hash key. `p` is a custom pointer.
If there already was a match in the hash, that data is replaced with this new
entry.
This function also lazily allocates the table if needed, as it is not done in
the `Curl_hash_init` function.
Returns NULL on error, otherwise it returns a pointer to `p`.
## `Curl_hash_add2`
~~~c
void *Curl_hash_add2(struct Curl_hash *h, void *key, size_t key_len, void *p,
Curl_hash_elem_dtor dtor)
~~~
This works like `Curl_hash_add` but has an extra argument: `dtor`, which is a
destructor call for this specific entry. When this entry is removed, this
function is called instead of the function stored for the whole hash table.
## `Curl_hash_delete`
~~~c
int Curl_hash_delete(struct Curl_hash *h, void *key, size_t key_len);
~~~
This function removes an entry from the hash table. If successful, it returns
zero. If the entry was not found, it returns 1.
## `Curl_hash_pick`
~~~c
void *Curl_hash_pick(struct Curl_hash *h, void *key, size_t key_len);
~~~
If there is an entry in the hash that matches the given `key` with size
`key_len`, its custom pointer is returned; that is the pointer that was called
`p` when the entry was added.
It returns NULL if there is no matching entry in the hash.
## `Curl_hash_destroy`
~~~c
void Curl_hash_destroy(struct Curl_hash *h);
~~~
This function destroys a hash and cleans up all its related data. Calling it
multiple times is fine.
## `Curl_hash_clean`
~~~c
void Curl_hash_clean(struct Curl_hash *h);
~~~
This function removes all the entries in the given hash.
## `Curl_hash_clean_with_criterium`
~~~c
void
Curl_hash_clean_with_criterium(struct Curl_hash *h, void *user,
int (*comp)(void *, void *))
~~~
This function removes all the entries in the given hash that match the
criterion. The provided `comp` function determines if the criterion is met by
returning non-zero.
## `Curl_hash_count`
~~~c
size_t Curl_hash_count(struct Curl_hash *h)
~~~
Returns the number of entries stored in the hash.
## `Curl_hash_start_iterate`
~~~c
void Curl_hash_start_iterate(struct Curl_hash *hash,
                             struct Curl_hash_iterator *iter);
~~~
This function initializes a `struct Curl_hash_iterator` that `iter` points to.
It can then be used to iterate over all the entries in the hash.
## `Curl_hash_next_element`
~~~c
struct Curl_hash_element *
Curl_hash_next_element(struct Curl_hash_iterator *iter);
~~~
Given the iterator `iter`, this function returns a pointer to the next hash
entry if there is one, or NULL if there are no more entries.
Called repeatedly, it iterates over all the entries in the hash table.
Note: it only guarantees functionality if the hash table remains untouched
during its iteration.
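A minimal usage sketch. The callback prototypes below are assumptions for
illustration only; check `hash.h` for the exact typedefs. Keys here are byte
strings:

~~~c
/* assumed callback shapes - verify against hash.h */
static size_t my_hash(void *key, size_t key_len, size_t slots)
{
  size_t h = 5381;
  unsigned char *p = key;
  while(key_len--)
    h = ((h << 5) + h) + *p++;   /* djb2-style checksum */
  return h % slots;
}

static size_t my_comp(void *k1, size_t k1_len, void *k2, size_t k2_len)
{
  /* non-zero means "identical" */
  return (k1_len == k2_len) && !memcmp(k1, k2, k1_len);
}

static void my_dtor(void *p)
{
  free(p); /* entries are heap allocated in this sketch */
}

static void example(void)
{
  struct Curl_hash table;
  char key[] = "alpha";
  char *value = strdup("payload");

  Curl_hash_init(&table, 97, my_hash, my_comp, my_dtor);
  if(Curl_hash_add(&table, key, strlen(key), value)) {
    char *hit = Curl_hash_pick(&table, key, strlen(key));
    /* 'hit' is the same pointer as 'value' here */
    (void)hit;
  }
  Curl_hash_destroy(&table); /* calls my_dtor for each remaining entry */
}
~~~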

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# `llist` - linked lists
#include "llist.h"
This is the internal module for linked lists. The API is designed to be
flexible but also to avoid dynamic memory allocation.
None of the involved structs should be accessed using struct fields (outside
of `llist.c`). Use the functions.
## Setup and shutdown
`struct Curl_llist` is the struct holding a single linked list. It needs to be
initialized with a call to `Curl_llist_init()` before it can be used.
To clean up a list, call `Curl_llist_destroy()`. Since the linked lists
themselves do not allocate memory, it can also be fine to just *not* clean up
the list.
## Add a node
There are two functions for adding a node to a linked list:
1. Add it last in the list with `Curl_llist_append`
2. Add it after a specific existing node with `Curl_llist_insert_next`
When a node is added to a list, it stores an associated custom pointer to
anything you like and you provide a pointer to a `struct Curl_llist_node`
struct in which it stores and updates pointers. If you intend to add the same
struct to multiple lists concurrently, you need to have one `struct
Curl_llist_node` for each list.
Add a node to a list with `Curl_llist_append(list, elem, node)`. Where
- `list`: points to a `struct Curl_llist`
- `elem`: points to what you want added to the list
- `node`: is a pointer to a `struct Curl_llist_node`. Data storage for this
node.
Example: to add a `struct foobar` to a linked list, add a node struct within
it:

~~~c
struct foobar {
  char *random;
  struct Curl_llist_node storage; /* can be anywhere in the struct */
  char *data;
};

struct Curl_llist barlist; /* the list for foobar entries */
struct foobar entries[10];

Curl_llist_init(&barlist, NULL);

/* add the first struct to the list */
Curl_llist_append(&barlist, &entries[0], &entries[0].storage);
~~~
See also `Curl_llist_insert_next`.
## Remove a node
Remove a node again from a list by calling `Curl_llist_remove()`. This
destroys the node's `elem` (e.g. calling a registered free function).
To remove a node without destroying its `elem`, use `Curl_node_take_elem()`
which returns the `elem` pointer and removes the node from the list. The
caller then owns this pointer and has to take care of it.
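For example, continuing the `barlist` sketch from the section above, taking
the first node out while keeping its element could look like this:

~~~c
/* take the first entry out of 'barlist' without destroying its element */
struct Curl_llist_node *head = Curl_llist_head(&barlist);
if(head) {
  struct foobar *kept = Curl_node_take_elem(head);
  /* the node is no longer in 'barlist'; we are responsible for 'kept' now */
}
~~~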
## Iterate
To iterate over a list: first get the head entry and then iterate over the
nodes as long as there is a next. Each node has an *element* associated with
it: the custom pointer you stored there, usually a struct pointer or similar.
~~~c
struct Curl_llist_node *iter;

/* get the first entry of the 'barlist' */
iter = Curl_llist_head(&barlist);

while(iter) {
  /* extract the element pointer from the node */
  struct foobar *elem = Curl_node_elem(iter);

  /* advance to the next node in the list */
  iter = Curl_node_next(iter);
}
~~~
# Function overview
## `Curl_llist_init`
~~~c
void Curl_llist_init(struct Curl_llist *list, Curl_llist_dtor dtor);
~~~
Initializes the `list`. The argument `dtor` is NULL or a function pointer that
gets called when list nodes are removed from this list.
The function is infallible.
~~~c
typedef void (*Curl_llist_dtor)(void *user, void *elem);
~~~
`dtor` is called with two arguments: `user` and `elem`. The first is the
`user` pointer passed in to `Curl_llist_remove()` or `Curl_llist_destroy()`
and the second is the `elem` pointer associated with the removed node; the
pointer that `Curl_node_elem()` would have returned for that node.
## `Curl_llist_destroy`
~~~c
void Curl_llist_destroy(struct Curl_llist *list, void *user);
~~~
This removes all nodes from the `list`. This leaves the list in a cleared
state.
The function is infallible.
## `Curl_llist_append`
~~~c
void Curl_llist_append(struct Curl_llist *list,
const void *elem, struct Curl_llist_node *node);
~~~
Adds `node` last in the `list` with a custom pointer to `elem`.
The function is infallible.
## `Curl_llist_insert_next`
~~~c
void Curl_llist_insert_next(struct Curl_llist *list,
                            struct Curl_llist_node *node,
                            const void *elem,
                            struct Curl_llist_node *new_node);
~~~
Adds `new_node` to the `list`, with a custom pointer to `elem`, immediately
after the existing list `node`.
The function is infallible.
## `Curl_llist_head`
~~~c
struct Curl_llist_node *Curl_llist_head(struct Curl_llist *list);
~~~
Returns a pointer to the first node of the `list`, or NULL if the list is empty.
## `Curl_node_uremove`
~~~c
void Curl_node_uremove(struct Curl_llist_node *node, void *user);
~~~
Removes the `node` from the list it was previously added to. Passes the `user`
pointer to the list's destructor function if one was set up.
The function is infallible.
## `Curl_node_remove`
~~~c
void Curl_node_remove(struct Curl_llist_node *node);
~~~
Removes the `node` from the list it was previously added to. Passes a NULL
pointer to the list's destructor function if one was set up.
The function is infallible.
## `Curl_node_elem`
~~~c
void *Curl_node_elem(struct Curl_llist_node *node);
~~~
Given a list node, this function returns the associated element.
## `Curl_node_next`
~~~c
struct Curl_llist_node *Curl_node_next(struct Curl_llist_node *node);
~~~
Given a list node, this function returns the next node in the list.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# Multi Identifiers (mid)
All transfers (easy handles) added to a multi handle are assigned
a unique identifier until they are removed again. The multi handle
keeps a table `multi->xfers` that allows O(1) access to the easy
handle by its `mid`.
References to other easy handles *should* keep their `mid`s instead
of a pointer (not all code has been converted as of now). This solves
problems in easy and multi handle life cycle management as well as
iterating over handles where operations may add/remove other handles.
### Values and Lifetime
An `mid` is an `unsigned int`. There are two reserved values:
* `0`: is the `mid` of an internal "admin" handle. Multi and share handles
each have their own admin handle for maintenance operations, like
shutting down connections.
* `UINT_MAX`: the "invalid" `mid`. Easy handles are initialized with
this value. They get it assigned again when removed from
a multi handle.
This makes the potential range of `mid`s go from `1` to `UINT_MAX - 1` *inside
the same multi handle at the same time*. However, the `multi->xfers` table
reuses `mid` values from previous transfers that have been removed.
`multi->xfers` is created with an initial capacity. At the time of this
writing that is `16` for "multi_easy" handles (used in `curl_easy_perform()`)
and `512` for multi handles created with `curl_multi_init()`.
The first added easy handle gets `mid == 1` assigned. The second one receives `2`,
even when the first one has already been removed. Every added handle gets an
`mid` one larger than the previously assigned one, until the capacity of
the table is reached, at which point it starts looking for a free id at `1`
again (`0` is always in the table).
When adding a new handle, the multi checks the number of free entries
in the `multi->xfers` table. If that drops below a threshold (currently 25%),
the table is resized. This serves two purposes: first, a previous `mid` is not
reused immediately and second, table resizes are not needed that often.
The table is implemented in `uint-table.[ch]`. More details in [`UINT_SETS`](UINT_SETS.md).
### Tracking `mid`s
There are several places where transfers need to be tracked:
* the multi tracks `process`, `pending` and `msgsent` transfers. A transfer
is in at most one of these at a time.
* connections track the transfers that are *attached* to them.
* multi event handling tracks transfers interested in a specific socket.
* DoH handles track the handle they perform lookups for (and vice versa).
There are two bitset implementations for storing `mid`s: `uint_bset` and `uint_spbset`.
The first is a bitset optimized for storing a large number of unsigned int values.
The second one is a "sparse" variant good for storing a small set of numbers.
More details about these in [`UINT_SETS`](UINT_SETS.md).
A multi uses `uint_bset`s for `process`, `pending` and `msgsent`. Connections
and sockets use the sparse variant as both often track only a single transfer
and at most 100 on an HTTP/2 or HTTP/3 connection/socket.
These sets allow safe iteration while being modified. This allows a multi
to iterate over its "process" set while existing transfers are removed
or new ones added.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# MQTT in curl
## Usage
A plain "GET" subscribes to the topic and prints all published messages.
Doing a "POST" publishes the post data to the topic and exits.
### Subscribing
Command usage:

```
curl mqtt://host/topic
```

Example subscribe:

```
curl mqtt://host.home/bedroom/temp
```
This sends an MQTT SUBSCRIBE packet for the topic `bedroom/temp` and listens
for incoming PUBLISH packets.

You can set the upkeep interval option (`CURLOPT_UPKEEP_INTERVAL_MS`) to make
curl send MQTT ping requests to the server at a regular interval, to prevent
the connection from getting closed due to idleness. You might then need to use
the progress callback to cancel the operation.
### Publishing
Command usage:

```
curl -d payload mqtt://host/topic
```

Example publish:

```
curl -d 75 mqtt://host.home/bedroom/dimmer
```
This sends an MQTT PUBLISH packet to the topic `bedroom/dimmer` with the
payload `75`.
## What does curl deliver as a response to a subscribe
Whenever a PUBLISH packet is received, curl outputs a two-byte topic length
(MSB, then LSB), followed by the topic and then the payload.
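As a minimal illustration (not part of curl), a consumer of that output could
decode the topic-length prefix like this:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  /* each received PUBLISH: two length bytes, the topic, then the payload */
  int msb = getchar();
  int lsb = getchar();
  if(msb == EOF || lsb == EOF)
    return 1;
  size_t tlen = ((size_t)msb << 8) | (size_t)lsb;
  char *topic = malloc(tlen + 1);
  if(!topic || fread(topic, 1, tlen, stdin) != tlen)
    return 1;
  topic[tlen] = 0;
  printf("topic: %s\n", topic);
  /* the payload bytes follow on stdin */
  free(topic);
  return 0;
}
```

Pipe curl's output into it, for example `curl mqtt://host.home/bedroom/temp | ./decode`.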
## Caveats
Remaining limitations:
- Only QoS level 0 is implemented for publish
- No way to set retain flag for publish
- No TLS (mqtts) support
- Naive EAGAIN handling does not handle split messages

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# Multi Event Based
A libcurl multi is operating "event based" when the application uses
an event library like `libuv` to monitor the sockets and file descriptors
libcurl uses to trigger transfer operations. How that works from the
application's point of view is described in libcurl-multi(3).

This document is about the internal handling.
## Source Locations
All code related to event based handling is found in `lib/multi_ev.c`
and `lib/multi_ev.h`. The header defines a set of internal functions
and `struct curl_multi_ev` that is embedded in each multi handle.
There is `Curl_multi_ev_init()` and `Curl_multi_ev_cleanup()` to manage
the overall life cycle, called on creation and destruction of the multi
handle.
## Tracking Events
First, the various functions in `lib/multi_ev.h` only ever really do
something when the libcurl application has registered its callback
in `multi->socket_cb`.
This is important as this callback gets informed about *changes* to sockets.
When a new socket is added, an existing one is removed, or the `POLLIN/OUT`
flags change, `multi->socket_cb` needs to be invoked. `multi_ev` has to
track what it already reported to detect changes.
Most applications are expected to go "event based" right from the start,
but the libcurl API does not prohibit an application from starting another
way and then going for events later on, even in the middle of a transfer.
### Transfer Events
Most events that happen are connected to a transfer. A transfer
opens a connection, which opens a socket, and waits for this socket
to become writable (`POLLOUT`) when using TCP, for example.
The multi then calls `Curl_multi_ev_assess_xfer(multi, data)` to
let the multi event code detect what sockets the transfer is interested in.
If indeed a `multi->socket_cb` is set, the *current* transfer pollset is
retrieved via `Curl_multi_getsock()`. This current pollset is then
compared to the *previous* pollset. If relevant changes are detected,
`multi->socket_cb` gets informed about those. These can be:
* a socket is in the current set, but not the previous one
* a socket was also in the previous one, but IN/OUT flags changed
* a socket in the previous one is no longer part of the current
`multi_ev.c` keeps a `struct mev_sh_entry` for each socket in a hash
with the socket as key. Each entry tracks which transfers are
interested in this particular socket, how many transfers want to read
and/or write, and what summarized `POLLIN`/`POLLOUT` action has been
reported to `multi->socket_cb`.
This is necessary as a socket may be in use by several transfers
at the same time (think HTTP/2 on the same connection). When a transfer
is done and gets removed from the socket entry, it decrements
the reader and/or writer count (depending on what it was last
interested in). This *may* result in the entry's summarized action changing,
or not.
### Connection Events
There are also events not connected to any transfer that need to be tracked.
The multi connection cache, concerned with clean shutdowns of connections,
is interested in socket events during the shutdown.
To allow use of the libcurl infrastructure, the connection cache operates
using an *internal* easy handle that is not a transfer as such. The
internal handle is used for all connection shutdown operations, being tied
to a particular connection only for a short time. This means tracking
the last pollset for an internal handle is useless.
Instead, the connection cache uses `Curl_multi_ev_assess_conn()` to have
multi event handling check the connection and track a "last pollset"
for the connection alone.
## Event Processing
When the libcurl application is informed by the event library that
a particular socket has an event, it calls `curl_multi_socket_action()`
to make libcurl react to it. This internally invokes
`Curl_multi_ev_expire_xfers()` which expires all transfers that
are interested in the given socket, so the multi handle runs them.
In addition `Curl_multi_ev_expire_xfers()` returns a `bool` to let
the multi know that connections are also interested in the socket, so
the connection pool should be informed as well.
## All Things Pass
When a transfer is done, e.g. removed from its multi handle, the
multi calls `Curl_multi_ev_xfer_done()`. This cleans up the pollset
tracking for the transfer.
When a connection is done, and before it is destroyed,
`Curl_multi_ev_conn_done()` is called. This cleans up the pollset
tracking for this connection.
When a socket is about to be closed, `Curl_multi_ev_socket_done()`
is called to cleanup the socket entry and all information kept there.
These calls do not have to happen in any particular order. A transfer's
socket may be around while the transfer is ongoing. Or it might disappear
in the middle of things. Also, a transfer might be interested in several
sockets at the same time (resolving, eyeballing and FTP are all examples of
those).
### And Come Again
While transfer and connection identifiers are practically unique in a
libcurl application, sockets are not. Operating systems are keen on reusing
their resources, and the next socket may, with high likelihood, get the same
identifier as one that was just closed.
This means that multi event handling needs to be informed *before* a close,
clean up all its tracking and be ready to see that same socket identifier
again right after.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# Adding a new protocol?
Every once in a while, someone comes up with the idea of adding support for yet
another protocol to curl. After all, curl already supports some 25
protocols and it is the Internet transfer machine for the world.
In the curl project we love protocols and we love supporting many protocols
and doing it well.
How do you proceed to add a new protocol and what are the requirements?
## No fixed set of requirements
This document is an attempt to describe things to consider. There is no
checklist of the twenty-seven things you need to cross off. We view the entire
effort as a whole and then judge if it seems to be the right thing - for now.
The more things that look right, fit our patterns and are done in ways that
align with our thinking, the better are the chances that we agree that
supporting this protocol is a grand idea.
## Mutual benefit is preferred
curl is not here for your protocol. Your protocol is not here for curl. The
best cooperation and end result occur when all involved parties mutually see
and agree that supporting this protocol in curl would be good for everyone.
Heck, for the world.
Consider "selling us" the idea that we need an implementation merged in curl,
to be fairly important. *Why* do we want curl to support this new protocol?
## Protocol requirements
### Client-side
The protocol implementation is for a client's side of a "communication
session".
### Transfer oriented
The protocol itself should be focused on *transfers*. Be it uploads or
downloads or both. It should at least be possible to view the transfers as
such, like we can view reading emails over POP3 as a download and sending
emails over SMTP as an upload.
If you cannot even shoehorn the protocol into a transfer focused view, then
you are up for a tough argument.
### URL
There should be a documented URL format. If there is an RFC for it there is no
question about it but the syntax does not have to be a published RFC. It could
be enough if it is already in use by other implementations.
If you make up the syntax just in order to be able to propose it to curl, then
you are in a bad place. URLs are designed and defined for interoperability.
There should at least be a good chance that other clients and servers can be
implemented supporting the same URL syntax and work the same or similar way.
URLs work on registered 'schemes'. There is a register of [all officially
recognized
schemes](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml). If
your protocol is not in there, is it really a protocol we want?
### Wide and public use
The protocol shall already be used or have an expectation of getting used
widely. Experimental protocols are better off worked on in experiments first,
to prove themselves before they are adopted by curl.
## Code
Of course the code needs to be written, provided, licensed agreeably and it
should follow our code guidelines and review comments have to be dealt with.
If the implementation needs third party code, that third party code should not
have noticeably lesser standards than the curl project itself.
## Tests
As much of the protocol implementation as possible needs to be verified by
curl test cases. We must have the implementation get tested by CI jobs,
torture tests and more.
We have experienced many times in the past how new implementations were brought
to curl and immediately once the code had been merged, the originator vanished
from the face of the earth. That is fine, but we need to take the necessary
precautions so when it happens we are still fine.
Our test infrastructure is powerful enough to test just about every possible
protocol - but it might require a bit of an effort to make it happen.
## Documentation
We cannot assume that users are particularly familiar with details and
peculiarities of the protocol. It needs documentation.
Maybe it even needs some internal documentation so that the developers who try
to debug something five years from now can figure out functionality a little
easier.
The protocol specification itself should be freely available without requiring
a non-disclosure agreement or similar.
## Do not compare
We are constantly raising the bar and we are constantly improving the project.
A lot of things we did in the past would not be acceptable if done today.
Therefore, you might be tempted to use shortcuts or "hacks" you can spot
other - existing - protocol implementations have used, but there is nothing to
gain from that. The bar has been raised. Former "cheats" may not be tolerated
anymore.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# porting libcurl
This describes the basic approach I use when porting libcurl to another OS,
when the existing configure or cmake build setups are not suitable.
## Build
Write a build script/Makefile that builds *all* C files under lib/. If
possible, use the `lib/Makefile.inc` that lists all files in Makefile
variables.
In the Makefile, make sure you define what OS you build for: `-D[OPERATING
SYSTEM]`, or similar. Perhaps the compiler in use already defines a standard
one? Then you might not need to define your own.
## Add the new OS
In the `lib/curl_config.h` header file, in the section for when `HAVE_CONFIG_H`
is *not* defined (starting at around line 150), add a new conditional include
in this style:
~~~c
#ifdef [OPERATING SYSTEM]
# include "config-operatingsystem.h"
#endif
~~~
Create `lib/config-operatingsystem.h`. You might want to start by copying
another config-* file and then trim it according to what your environment
supports.
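As a sketch, such a file could start out something like the following; the
defines shown are examples only and the exact set depends entirely on what the
target environment provides:

~~~c
/* lib/config-operatingsystem.h - hypothetical starting point */
#ifndef HEADER_CURL_CONFIG_OPERATINGSYSTEM_H
#define HEADER_CURL_CONFIG_OPERATINGSYSTEM_H

#define OS "operatingsystem"   /* the name reported by curl_version() */

/* features the target platform offers */
#define HAVE_SOCKET 1
#define HAVE_SELECT 1
#define HAVE_STRUCT_TIMEVAL 1

/* type sizes on the target */
#define SIZEOF_INT 4
#define SIZEOF_LONG 4
#define SIZEOF_CURL_OFF_T 8

#endif /* HEADER_CURL_CONFIG_OPERATINGSYSTEM_H */
~~~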
## Build it
When you run into compiler warnings or errors, the
`lib/config-operatingsystem.h` file is where you should focus your work
and edits.
A recommended approach is to initially define a lot of the `CURL_DISABLE_*`
defines (see the [CURL-DISABLE](../CURL-DISABLE.md) document) to help narrow
down the work, as that can save you from having to give attention to areas of
the code that you do not care about in your port.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# Internals
This directory contains documentation covering libcurl internals; APIs and
concepts that are useful for contributors and maintainers.
Public APIs are documented in the public documentation, not here.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# scorecard.py
This is an internal script in `tests/http/scorecard.py` used for testing
curl's performance in a set of cases. These are for exercising parts of
curl/libcurl in a reproducible fashion to judge improvements or detect
regressions. They are not intended to represent real world scenarios
as such.
This script is not part of any official interface and we may
change it in the future according to the project's needs.
## setup
When you are able to run curl's `pytest` suite, scorecard should work
for you as well. It starts a local Apache httpd or Caddy server and
invokes the locally built `src/curl` (by default).
## invocation
A typical invocation for measuring performance of HTTP/2 downloads would be:
```
curl> python3 tests/http/scorecard.py -d h2
```
This prints a table with the results. The last argument is the protocol to test;
it can be `h1`, `h2` or `h3`. You can add `--json` to get the results in JSON instead of text.
Help for all command line options is available via:
```
curl> python3 tests/http/scorecard.py -h
```
## scenarios
Apart from `-d/--downloads` there is `-u/--uploads` and `-r/--requests`. These are run with
a variation of resource sizes and parallelism by default. You can restrict these
if you are only interested in a particular case.
For example, to run downloads of a 1 MB resource only, 100 times with at max 6 parallel transfers, use:
```
curl> python3 tests/http/scorecard.py -d --download-sizes=1mb --download-count=100 --download-parallel=6 h2
```
Similar options are available for uploads and requests scenarios.
## dtrace
With the `--dtrace` option, scorecard produces a dtrace sample of the user stacks in `tests/http/gen/curl/curl.user_stacks`. On many platforms, `dtrace` requires **special permissions**. It is therefore invoked via `sudo` and you should make sure that sudo works for the run without prompting for a password.
Note: the file is the trace of the last curl invocation by scorecard. Use the parameters to narrow down the runs to the particular case you are interested in.
## flame graphs
With the excellent [Flame Graph](https://github.com/brendangregg/FlameGraph) by Brendan Gregg, scorecard can turn the `dtrace` samples into an interactive SVG. Set the environment variable `FLAMEGRAPH` to the location of your clone of that project and invoke scorecard with the `--flame` option, like:
```
curl> FLAMEGRAPH=/Users/sei/projects/FlameGraph python3 tests/http/scorecard.py \
-r --request-count=50000 --request-parallels=100 --samples=1 --flame h2
```
and the SVG of the run is in `tests/http/gen/curl/curl.flamegraph.svg`. You can open that in Firefox and zoom in/out of stacks of interest.
Note: as with `dtrace`, the flame graph is for the last invocation of curl done by scorecard.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# `splay`
#include "splay.h"
This is an internal module for splay tree management. A splay tree is a binary
search tree with the additional property that recently accessed elements are
quick to access again. A self-balancing tree.
Nodes are added to the tree, accessed and removed from the tree, and the tree
automatically rebalances itself in each operation.
## libcurl use
libcurl adds fixed timeout expiry timestamps to the splay tree, which is meant
to scale up to holding a huge number of pending timeouts with decent
performance.
The splay tree is used to:
1. figure out the next timeout expiry value closest in time
2. iterate over timeouts that already have expired
This splay tree rebalances itself based on the time value.
Each node in the splay tree points to a `struct Curl_easy`. Each `Curl_easy`
struct is represented only once in the tree. To still allow each easy handle
to have a large number of timeouts per handle, each handle has a sorted linked
list of pending timeouts. Only the handle's timeout that is closest to expire
is the timestamp used for the splay tree node.
When a specific easy handle's timeout expires, the node gets removed from the
splay tree and from the handle's linked list of timeouts. The next timeout for
that handle is then first in line and becomes the new timeout value as the
node is re-added to the splay.
## `Curl_splay`
~~~c
struct Curl_tree *Curl_splay(struct curltime i, struct Curl_tree *t);
~~~
Rearranges the tree `t` after the provided time `i`.
## `Curl_splayinsert`
~~~c
struct Curl_tree *Curl_splayinsert(struct curltime key,
struct Curl_tree *t,
struct Curl_tree *node);
~~~
This function inserts a new `node` in the tree, using the given `key`
timestamp. The `node` struct has a field called `->payload` that can be set to
point to anything. libcurl sets this to the `struct Curl_easy` handle that is
associated with the timeout value set in `key`.
The splay insert function does not allocate any memory, it assumes the caller
has that arranged.
It returns a pointer to the new tree root.
## `Curl_splaygetbest`
~~~c
struct Curl_tree *Curl_splaygetbest(struct curltime key,
struct Curl_tree *tree,
struct Curl_tree **removed);
~~~
If there is a node in the `tree` that has a time value that is less than the
provided `key`, this function removes that node from the tree and provides it
in the `*removed` pointer (or NULL if there was no match).
It returns a pointer to the new tree root.
## `Curl_splayremove`
~~~c
int Curl_splayremove(struct Curl_tree *tree,
struct Curl_tree *node,
struct Curl_tree **newroot);
~~~
Removes a given `node` from a splay `tree`, and returns the `newroot`
identifying the new tree root.
Note that an empty tree, without any nodes present, is represented by a NULL pointer.
## `Curl_splayset`
~~~c
void Curl_splayset(struct Curl_tree *node, void *payload);
~~~
Set a custom pointer to be stored in the splay node. This pointer is not used
by the splay code itself and can be retrieved again with `Curl_splayget`.
## `Curl_splayget`
~~~c
void *Curl_splayget(struct Curl_tree *node);
~~~
Get the custom pointer from the splay node that was previously set with
`Curl_splayset`. If no pointer was set before, it returns NULL.
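Putting the calls above together, a usage sketch could look like this;
`tree`, `expire_time`, `now`, `my_payload` and `handle_expired` are
placeholders for the caller's own state:

~~~c
/* add a timeout node, then collect everything that expired at time 'now' */
struct Curl_tree node;
memset(&node, 0, sizeof(node));
Curl_splayset(&node, my_payload);          /* custom pointer, e.g. an easy handle */
tree = Curl_splayinsert(expire_time, tree, &node);

for(;;) {
  struct Curl_tree *removed = NULL;
  tree = Curl_splaygetbest(now, tree, &removed);
  if(!removed)
    break;
  handle_expired(Curl_splayget(removed));  /* act on the expired entry */
}
~~~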

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# String parsing with `strparse`
The functions take input via a pointer to a pointer, which allows the
functions to advance the pointer on success. This by extension allows
"chaining" of functions, as in this example that gets a word, a space and then
a second word:
~~~c
if(curlx_str_word(&line, &word1, MAX) ||
curlx_str_singlespace(&line) ||
curlx_str_word(&line, &word2, MAX))
fprintf(stderr, "ERROR\n");
~~~
The input pointer **must** point to a null-terminated buffer area or these
functions risk continuing "off the edge".
## Strings
The functions that return string information do so by populating a
`struct Curl_str`:
~~~c
struct Curl_str {
char *str;
size_t len;
};
~~~
Access the struct fields with `curlx_str()` for the pointer and `curlx_strlen()`
for the length rather than using the struct fields directly.
## `curlx_str_init`
~~~c
void curlx_str_init(struct Curl_str *out)
~~~
This initiates a string struct. The parser functions that store info in
strings always init the string themselves, so this stand-alone use is often
not necessary.
## `curlx_str_assign`
~~~c
void curlx_str_assign(struct Curl_str *out, const char *str, size_t len)
~~~
Set a pointer and associated length in the string struct.
## `curlx_str_word`
~~~c
int curlx_str_word(char **linep, struct Curl_str *out, const size_t max);
~~~
Get a sequence of bytes until the first space or the end of the string. Return
non-zero on error. There is no way to include a space in the word, no sort of
escaping. The word must be at least one byte, otherwise it is considered an
error.
`max` is the longest accepted word, or it returns error.
On a successful return, `linep` is updated to point to the byte immediately
following the parsed word.
## `curlx_str_until`
~~~c
int curlx_str_until(char **linep, struct Curl_str *out, const size_t max,
char delim);
~~~
Like `curlx_str_word` but instead of parsing to space, it parses to a given
custom delimiter non-zero byte `delim`.
`max` is the longest accepted word, or it returns error.
The parsed word must be at least one byte, otherwise it is considered an
error.
## `curlx_str_untilnl`
~~~c
int curlx_str_untilnl(char **linep, struct Curl_str *out, const size_t max);
~~~
Like `curlx_str_until` but instead parses until it finds a "newline byte".
That means either a CR (ASCII 13) or an LF (ASCII 10) octet.
`max` is the longest accepted word, or it returns error.
The parsed word must be at least one byte, otherwise it is considered an
error.
## `curlx_str_cspn`
~~~c
int curlx_str_cspn(const char **linep, struct Curl_str *out, const char *cspn);
~~~
Get a sequence of characters until one of the bytes in the `cspn` string
matches. Similar to the `strcspn` function.
## `curlx_str_quotedword`
~~~c
int curlx_str_quotedword(char **linep, struct Curl_str *out, const size_t max);
~~~
Get a "quoted" word. This means everything that is provided within a leading
and an ending double quote character. No escaping possible.
`max` is the longest accepted word, or it returns error.
The parsed word must be at least one byte, otherwise it is considered an
error.
## `curlx_str_single`
~~~c
int curlx_str_single(char **linep, char byte);
~~~
Advance over a single character provided in `byte`. Return non-zero on error.
## `curlx_str_singlespace`
~~~c
int curlx_str_singlespace(char **linep);
~~~
Advance over a single ASCII space. Return non-zero on error.
## `curlx_str_passblanks`
~~~c
void curlx_str_passblanks(char **linep);
~~~
Advance over all spaces and tabs.
## `curlx_str_trimblanks`
~~~c
void curlx_str_trimblanks(struct Curl_str *out);
~~~
Trim off blanks (spaces and tabs) from the start and the end of the given
string.
## `curlx_str_number`
~~~c
int curlx_str_number(char **linep, curl_size_t *nump, size_t max);
~~~
Get an unsigned decimal number not larger than `max`. Leading zeroes are just
swallowed. Return non-zero on error. Returns error if there was not a single
digit.
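A small sketch combining this with the word and space parsers from above, for
a hypothetical input line of the form `"name value"` (using the number type as
written in the prototype above):

~~~c
const char *input = "retries 42";
char *p = (char *)input;
struct Curl_str name;
curl_size_t value;

if(curlx_str_word(&p, &name, 64) ||      /* "retries" */
   curlx_str_singlespace(&p) ||          /* the single blank */
   curlx_str_number(&p, &value, 1000))   /* 42, error if larger than 1000 */
  fprintf(stderr, "parse error\n");
~~~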
## `curlx_str_numblanks`
~~~c
int curlx_str_numblanks(char **linep, curl_size_t *nump);
~~~
Get an unsigned 63-bit decimal number. Leading blanks and zeroes are skipped.
Returns non-zero on error. Returns error if there was not a single digit.
## `curlx_str_hex`
~~~c
int curlx_str_hex(char **linep, curl_size_t *nump, size_t max);
~~~
Get an unsigned hexadecimal number not larger than `max`. Leading zeroes are
just swallowed. Return non-zero on error. Returns error if there was not a
single digit. Does *not* handle a `0x` prefix.
## `curlx_str_octal`
~~~c
int curlx_str_octal(char **linep, curl_size_t *nump, size_t max);
~~~
Get an unsigned octal number not larger than `max`. Leading zeroes are just
swallowed. Return non-zero on error. Returns error if there was not a single
digit.
## `curlx_str_newline`
~~~c
int curlx_str_newline(char **linep);
~~~
Check for a single CR or LF. Return non-zero on error.
## `curlx_str_casecompare`
~~~c
int curlx_str_casecompare(struct Curl_str *str, const char *check);
~~~
Returns true if the provided string in the `str` argument matches the `check`
string case insensitively.
## `curlx_str_cmp`
~~~c
int curlx_str_cmp(struct Curl_str *str, const char *check);
~~~
Returns true if the provided string in the `str` argument matches the `check`
string case sensitively. This is *not* the same return code as `strcmp`.
## `curlx_str_nudge`
~~~c
int curlx_str_nudge(struct Curl_str *str, size_t num);
~~~
Removes `num` bytes from the beginning (left) of the string kept in `str`. If
`num` is larger than the string, it instead returns an error.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# TLS Sessions and Tickets
The TLS protocol offers methods of "resuming" a previous "session". A
TLS "session" is a negotiated security context across a connection
(which may be via TCP or UDP or other transports.)
By "resuming", the TLS protocol means that the security context from
before can be fully or partially resurrected when the TLS client presents
the proper crypto stuff to the server. This saves on the number of
TLS packets that need to be sent back and forth, reducing the amount
of data and even latency. In the case of QUIC, resumption may send
application data without having seen any reply from the server, hence
this is named 0-RTT data.
The exact mechanism of session tickets in TLSv1.2 (and earlier) and
TLSv1.3 differs. TLSv1.2 tickets have several weaknesses (that can
be exploited by attackers) which TLSv1.3 then fixed. See
[Session Tickets in the real world](https://words.filippo.io/we-need-to-talk-about-session-tickets/)
for an insight into this topic.
These differences between TLS protocol versions are reflected in curl's
handling of session tickets. More below.
## curl's `ssl_peer_key`
In order to find a ticket from a previous TLS session, curl
needs a name for TLS sessions that uniquely identifies the peer
it talks to.
This name also has to reflect the various TLS parameters that can
be configured in curl for a connection. We do not want to use
a ticket from a different configuration. Example: when setting
the maximum TLS version to 1.2, we do not want to reuse a ticket
we got from a TLSv1.3 session, although we are talking to the
same host.
Internally, we call this name an `ssl_peer_key`. It is a printable
string that carries hostname and port and any non-default TLS
parameters involved in the connection.
Examples:
- `curl.se:443:CA-/etc/ssl/cert.pem:IMPL-GnuTLS/3.8.7` is a peer key for
a connection to `curl.se:443` using `/etc/ssl/cert.pem` as CA
trust anchors and GnuTLS/3.8.7 as TLS backend.
- `curl.se:443:TLSVER-6-6:CA-/etc/ssl/cert.pem:IMPL-GnuTLS/3.8.7` is the
same as the previous, except it is configured to use TLSv1.2 as
min and max versions.
Different configurations produce different keys which is just what
curl needs when handling SSL session tickets.
One important thing: peer keys do not contain confidential information. If you
configure a client certificate or SRP authentication with username/password,
these are not part of the peer key.
However, peer keys carry the hostnames you use curl for. They *do*
leak information about your communication. We recommend *not* persisting
peer keys for this reason.
**Caveat**: The key may contain filenames or paths. It does not reflect the
*contents* in the filesystem. If you change `/etc/ssl/cert.pem` and reuse a
previous ticket, curl might trust a server which no longer has a root
certificate in the file.
## Session Cache Access
#### Lookups
When a new connection is being established, each SSL connection filter creates
its own peer_key and calls into the cache. The cache then looks for a ticket
with exactly this peer_key. Peer keys between proxy SSL filters and SSL
filters talking through a tunnel differ, as they talk to different peers.
If the connection filter wants to use a client certificate or SRP
authentication, the cache checks those as well. If the cache peer carries
client cert or SRP auth, the connection filter must have those with the same
values (and vice versa).
On a match, the connection filter gets the session ticket and feeds that to
the TLS implementation which, on accepting it, tries to resume it for a
shorter handshake. In addition, the filter gets the ALPN used before and the
amount of 0-RTT data that the server announced to be willing to accept. The
filter can then decide if it wants to attempt 0-RTT or not. (The ALPN is
needed to know if the server speaks the protocol you want to send in 0-RTT. It
makes no sense to send HTTP/2 requests to a server that only knows HTTP/1.1.)
#### Updates
When a new TLS session ticket is received by a filter, it adds it to the
cache using its peer_key and SSL configuration. The cache looks for
a matching entry and, should it find one, adds the ticket for this
peer.
### Put, Take and Return
When a filter accesses the session cache, it *takes*
a ticket from the cache, meaning a returned ticket is removed. The filter
then configures its TLS backend and *returns* the ticket to the cache.
The cache needs to treat tickets from TLSv1.2 and 1.3 differently. 1.2 tickets
should be reused, but 1.3 tickets SHOULD NOT (RFC 8446). The session cache
simply drops 1.3 tickets when they are returned after use, but keeps a 1.2
ticket.
When a ticket is *put* into the cache, there is also a difference. There
can be several 1.3 tickets at the same time, but only a single 1.2 ticket.
TLSv1.2 tickets replace any other. 1.3 tickets accumulate up to a max
amount.
By having a "put/take/return" we reflect the 1.3 use case nicely. Two
concurrent connections do not reuse the same ticket.
## Session Ticket Persistence
#### Privacy and Security
As mentioned above, ssl peer keys are not intended for storage in a file
system. They clearly show which hosts the user talked to. This may be "just"
privacy relevant, but it has security implications as an attacker might find
worthy targets among your peer keys.
Also, we do not recommend persisting TLSv1.2 tickets.
### Salted Hashes
The TLS session cache offers an alternative to storing peer keys:
it provides a salted SHA256 hash of the peer key for import and export.
#### Export
The salt is generated randomly for each peer key on export. The SHA256 makes
sure that the peer key cannot be reversed and that a slightly different key
still produces a different result.
This means an attacker cannot just "grep" a session file for a particular
entry, e.g. if they want to know if you accessed a specific host. They *can*
however compute the SHA256 hashes for all salts in the file and find a
specific entry. They *cannot* find a hostname they do not know. They would
have to brute force by guessing.
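Conceptually, the export of one entry works like this sketch; the helper
functions and the field layout are illustrative assumptions, not curl's
actual code:

```c
#include <string.h>

/* assumed helpers: fill a buffer with random bytes, and SHA256 over two parts */
void random_bytes(unsigned char *buf, size_t len);
void sha256_parts(unsigned char out[32],
                  const unsigned char *a, size_t alen,
                  const unsigned char *b, size_t blen);

struct exported_entry {
  unsigned char salt[16];  /* random, freshly generated for this export */
  unsigned char hash[32];  /* SHA256 over salt + peer_key */
  /* ... followed by the session ticket data itself ... */
};

static void export_entry(struct exported_entry *out, const char *peer_key)
{
  random_bytes(out->salt, sizeof(out->salt));
  sha256_parts(out->hash, out->salt, sizeof(out->salt),
               (const unsigned char *)peer_key, strlen(peer_key));
}
```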
#### Import
When session tickets are imported from a file, curl only gets the salted
hashes. The imported tickets belong to an *unknown* peer key.
When a connection filter tries to *take* a session ticket, it passes its peer
key. This peer key initially does not match any tickets in the cache. The
cache then checks all entries with unknown peer keys if the passed key matches
their salted hash. If it does, the peer key is recovered and remembered at the
cache entry.
This is a performance penalty proportional to the number of "unknown" peer
keys, which diminishes over time as keys are rediscovered. Note that this also
works for putting a new ticket into the cache: when no present entry matches,
a new one with the peer key is created. This peer key then no longer bears the
cost of hash computations.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# Unsigned Int Sets
The multi handle tracks added easy handles via an unsigned int
it calls an `mid`. There are four data structures for unsigned ints,
optimized for the multi use case.
## `uint_tbl`
`uint_table`, implemented in `uint-table.[ch]`, manages an array
of `void *`. The unsigned ints are the indexes into this array. It is
created with a *capacity* which can be *resized*. The table assigns
the index when a `void *` is *added*. It keeps track of the last
assigned index and uses the next available larger index for a
subsequent add. When reaching *capacity*, it wraps around.

The table *cannot* store `NULL` values. The largest possible index
is `UINT_MAX - 1`.
The table is iterated over by asking for the *first* existing index,
meaning the smallest number that has an entry, if the table is not
empty. To get the *next* entry, one passes the index of the previous
iteration step. It does not matter if the previous index is still
in the table. Sample code for a table iteration would look like this:
```c
unsigned int mid;
void *entry;

if(Curl_uint_tbl_first(tbl, &mid, &entry)) {
  do {
    /* operate on entry with index mid */
  }
  while(Curl_uint_tbl_next(tbl, mid, &mid, &entry));
}
```
This iteration has the following properties:
* entries in the table can be added/removed safely.
* all entries that are not removed during the iteration are visited.
* the table may be resized to a larger capacity without affecting visited entries.
* entries added with a larger index than the current are visited.
### Memory
For storing 1000 entries, the table would allocate one block of 8KB on a 64-bit system,
plus the 2 pointers and 3 unsigned ints in its base `struct uint_tbl`. A resize
allocates a completely new pointer array, copies the existing entries and frees the previous one.
### Performance
Lookups of entries are only an index into the array, O(1) with a tiny constant. Adding
entries and iterating are more work:
1. adding an entry means "find the first free index larger than the previously assigned
one". Worst case for this is a table with only a single free index where `capacity - 1`
checks on `NULL` values would be performed, O(N). If the single free index is randomly
distributed, this would be O(N/2) on average.
2. iterating a table scans for the first non-`NULL` entry after the start index. This
makes a complete iteration O(N) work.
In the multi use case, point 1 is remedied by growing the table so that a good chunk
of free entries always exists.
Point 2 is less of an issue for a multi, since it does not really matter when the
number of transfers is relatively small. A multi managing a larger set needs to operate
event based anyway and table iterations are rarely needed.
For these reasons, the simple implementation was preferred. Should this become
a concern, there are options like "free index lists" or, alternatively, an internal
bitset that scans better.
## `uint_bset`
A bitset for unsigned integers, allowing fast add/remove operations. It is initialized
with a *capacity*, meaning it can store only the numbers in the range `[0, capacity-1]`.
It can be *resized* and safely *iterated*. `uint_bset` is designed to operate in combination with `uint_tbl`.
The bitset keeps an array of `curl_uint64_t`. The first array entry keeps the numbers 0 to 63, the
second 64 to 127 and so on. A bitset with capacity 1024 would therefore allocate an array
of 16 64-bit values (128 bytes). Operations for an unsigned int divide it by 64 for the array index and then check/set/clear the bit of the remainder.
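A minimal sketch of that arithmetic (illustration only, not curl's code; `bits`
is the array of 64-bit slots):

```c
/* conceptual add/remove/check of number 'n' in an array of 64-bit slots */
#define SLOT(n)   ((n) / 64)
#define MASK(n)   ((curl_uint64_t)1 << ((n) % 64))

static void bset_add(curl_uint64_t *bits, unsigned int n)
{
  bits[SLOT(n)] |= MASK(n);
}

static void bset_remove(curl_uint64_t *bits, unsigned int n)
{
  bits[SLOT(n)] &= ~MASK(n);
}

static int bset_contains(const curl_uint64_t *bits, unsigned int n)
{
  return (bits[SLOT(n)] & MASK(n)) != 0;
}
```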
Iteration works the same as with `uint_tbl`: ask the bitset for the *first* number present and
then use that to get the *next* higher number present. Like the table, this is safe for
adds/removes and growing the set while iterating.
### Memory
The set only needs 1 bit for each possible number.
A bitset for 40000 transfers occupies 5KB of memory.
### Performance
Operations for add/remove/check are O(1). Iteration needs to scan for the next bit set. The
number of scans is small (see memory footprint) and, for checking bits, many compilers
offer primitives for special CPU instructions.
## `uint_spbset`
While the memory footprint of `uint_bset` is good, it still needs 5KB to store the single number 40000. This
is not optimal when many such sets are needed. For example, in event based processing, each socket needs to
keep track of the transfers involved. There are potentially many sockets, but each one mostly tracks
a single transfer or a few (on an HTTP/2 connection borderline up to 100).
For such use cases, the `uint_spbset` is intended: track a small number of unsigned ints, potentially
rather "close" together. It keeps "chunks" with an offset and has no capacity limit.
Example: adding the number 40000 to an empty sparse bitset would create one chunk with offset 39936, keeping
track of the numbers 39936 to 40191 (a chunk has 4 64-bit values). The numbers in that range can be handled
without further allocations.
The worst case is then storing 100 numbers that lie in separate intervals. Then 100 chunks
would need to be allocated and linked, resulting in overall 4 KB of memory used.
Iterating a sparse bitset works the same as for bitset and table.
## `uint_hash`
At last, there are places in libcurl such as the HTTP/2 and HTTP/3 protocol implementations that need
to store their own data related to a transfer. `uint_hash` allows then to associate an unsigned int,
e.g. the transfer's `mid`, to their own data.

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# WebSocket in curl
## URL
WebSocket communication with libcurl is done by setting up a transfer to a URL
using the `ws://` or `wss://` URL schemes. The latter one being the secure
version done over HTTPS.
When using `wss://` to do WebSocket over HTTPS, the standard TLS and HTTPS
options are acknowledged for the CA, verification of server certificate etc.
WebSocket communication is done by upgrading a connection from either HTTP or
HTTPS. When given a WebSocket URL to work with, libcurl considers it a
transfer failure if the upgrade procedure fails. This means that a plain HTTP
200 response code is considered an error for this work.
## API
The WebSocket API is described in the individual man pages for the new API.
WebSocket with libcurl can be done two ways.
1. Get the WebSocket frames from the server sent to the write callback. You
can then respond with `curl_ws_send()` from within the callback (or outside
of it).
2. Set `CURLOPT_CONNECT_ONLY` to 2L (new for WebSocket), which makes libcurl
do an HTTP GET + `Upgrade:` request plus response in the
`curl_easy_perform()` call before it returns and then you can use
`curl_ws_recv()` and `curl_ws_send()` to receive and send WebSocket frames
from and to the server.
The new options to `curl_easy_setopt()`:
`CURLOPT_WS_OPTIONS` - to control specific behavior. `CURLWS_RAW_MODE` makes
libcurl provide all WebSocket traffic raw in the callback. `CURLWS_NOAUTOPONG`
disables automatic `PONG` replies.
The new function calls:
- `curl_ws_recv()` - receive a WebSocket frame
- `curl_ws_send()` - send a WebSocket frame
- `curl_ws_meta()` - return WebSocket metadata within a write callback
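A minimal sketch of the `CURLOPT_CONNECT_ONLY` approach (the second way
described above), sending one text frame and reading one frame back; the URL
is a placeholder:

```c
#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(!curl)
    return 1;
  curl_easy_setopt(curl, CURLOPT_URL, "wss://example.com/ws");
  curl_easy_setopt(curl, CURLOPT_CONNECT_ONLY, 2L); /* WebSocket style */

  if(curl_easy_perform(curl) == CURLE_OK) {
    size_t sent;
    curl_ws_send(curl, "hello", 5, &sent, 0, CURLWS_TEXT);

    char buf[256];
    size_t nread;
    const struct curl_ws_frame *meta;
    if(curl_ws_recv(curl, buf, sizeof(buf), &nread, &meta) == CURLE_OK)
      printf("received %u bytes, flags 0x%x\n",
             (unsigned int)nread, (unsigned int)meta->flags);
  }
  curl_easy_cleanup(curl);
  return 0;
}
```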
## Max frame size
The current implementation only supports frame sizes up to a max (64K right
now). This is because the API delivers full frames and it then cannot manage
the full 2^63 bytes size.
If we decide we need to support (much) larger frames than 64K, we need to
adjust the API accordingly to be able to deliver partial frames in both
directions.
## Errors
If the given WebSocket URL (using `ws://` or `wss://`) fails to get upgraded
via a 101 response code and instead gets another response code back from the
HTTP server, the transfer fails with `CURLE_HTTP_RETURNED_ERROR`. Note that
even 2xx response codes are then considered an error since the server failed
to provide a WebSocket upgrade.
## Test suite
I looked for an existing small WebSocket server implementation with maximum
flexibility to dissect and cram into the test suite but I ended up deciding
that extending the existing test suite server sws to deal with WebSocket
might be the better way.
- This server is already integrated and working in the test suite
- We want maximum control and ability to generate broken protocol and negative
tests as well. A dumber and simpler TCP server could then be easier to
massage into this than a "proper" WebSocket server.
## Command line tool WebSocket
The plan is to make curl do WebSocket similar to telnet/nc. That part of the
work has not been started.
Ideas:
- Read stdin and send off as messages. Consider newline as end of fragment.
(default to text? offer option to set binary)
- Respond to PINGs automatically
- Issue PINGs at some default interval (option to switch off/change interval?)
- Allow `-d` to specify (initial) data to send (should the format allow for
multiple separate frames?)
- Exit after N messages received, where N can be zero.
## Future work
- Verify the Sec-WebSocket-Accept response. It requires a sha-1 function.
- Verify Sec-WebSocket-Extensions and Sec-WebSocket-Protocol in the response
- Consider a `curl_ws_poll()`
- Make sure WebSocket code paths are fuzzed
- Add client-side PING interval
- Provide option to disable PING-PONG automation
- Support compression (`CURLWS_COMPRESS`)
## Why not libWebSocket
libWebSocket is said to be a solid, fast and efficient WebSocket library with
a vast amount of users. My plan was originally to build upon it to skip having
to implement the low level parts of WebSocket myself.
Here are the reasons why I have decided to move forward with WebSocket in
curl **without using libWebSocket**:
- doxygen generated docs only, which makes them hard to navigate. No tutorial, no
clearly written explanatory pages for specific functions.
- seems (too) tightly integrated with a specific TLS library, while we want to
support WebSocket with whatever TLS library libcurl was already made to
work with.
- seems (too) tightly integrated with event libraries
- the references to threads and thread-pools in code and APIs indicate too
much logic for our purposes
- "bloated" - it is a *huge* library that is actually more lines of code than
libcurl itself
- WebSocket is a fairly simple protocol on the network/framing layer so
making a homegrown handling of it should be fine