un-nest curl

curl-8.15.0/docs/internals/BUFQ.md
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# bufq

This is an internal module for managing I/O buffers. A `bufq` can be written
to and read from. It manages read and write positions and has a maximum size.

## read/write

Its basic read/write functions use the same signature and return code
conventions as many other internal curl read and write functions.

```
ssize_t Curl_bufq_write(struct bufq *q, const unsigned char *buf, size_t len, CURLcode *err);

- returns the length written into `q` or -1 on error.
- writing to a full `q` returns -1 and sets *err to CURLE_AGAIN

ssize_t Curl_bufq_read(struct bufq *q, unsigned char *buf, size_t len, CURLcode *err);

- returns the length read from `q` or -1 on error.
- reading from an empty `q` returns -1 and sets *err to CURLE_AGAIN

```

To pass data into a `bufq` without an extra copy, read callbacks can be used.

```
typedef ssize_t Curl_bufq_reader(void *reader_ctx, unsigned char *buf, size_t len,
                                 CURLcode *err);

ssize_t Curl_bufq_slurp(struct bufq *q, Curl_bufq_reader *reader, void *reader_ctx,
                        CURLcode *err);
```

`Curl_bufq_slurp()` invokes the given `reader` callback, passing it its own
internal buffer memory to write to. It may invoke the `reader` several times,
as long as it has space and while the `reader` always returns the length that
was requested. There are variations of `slurp` that call the `reader` at most
once or read at most a given maximum amount of bytes.

The analogous mechanism for writing out buffer data is:

```
typedef ssize_t Curl_bufq_writer(void *writer_ctx, const unsigned char *buf, size_t len,
                                 CURLcode *err);

ssize_t Curl_bufq_pass(struct bufq *q, Curl_bufq_writer *writer, void *writer_ctx,
                       CURLcode *err);
```

`Curl_bufq_pass()` invokes the `writer`, passing it its internal memory, and
removes the amount that the `writer` reports it has written.
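These callback shapes can be illustrated with a standalone sketch. The
following is a simplified model, not curl's implementation: `mem_src` and
`mem_reader` are made-up names, and a plain `int` stands in for `CURLcode` so
the snippet compiles on its own.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical reader context: serves bytes from a fixed memory block */
struct mem_src {
  const unsigned char *data;
  size_t len;
  size_t off;
};

/* Same shape as a Curl_bufq_reader, with int standing in for CURLcode */
static ssize_t mem_reader(void *reader_ctx, unsigned char *buf, size_t len,
                          int *err)
{
  struct mem_src *src = reader_ctx;
  size_t n = src->len - src->off;
  *err = 0;
  if(!n) {
    *err = 1;  /* a real reader would set *err to CURLE_AGAIN here */
    return -1;
  }
  if(n > len)
    n = len;
  memcpy(buf, src->data + src->off, n);
  src->off += n;
  return (ssize_t)n;
}
```

A slurp-style loop would call such a reader repeatedly until it returns less
than requested or fails with the again-code.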
## peek and skip

It is possible to get access to the memory of data stored in a `bufq` with:

```
bool Curl_bufq_peek(const struct bufq *q, const unsigned char **pbuf, size_t *plen);
```

On returning TRUE, `pbuf` points to internal memory with `plen` bytes that one
may read. This is only valid until another operation on the `bufq` is
performed.

Instead of reading `bufq` data, one may simply skip it:

```
void Curl_bufq_skip(struct bufq *q, size_t amount);
```

This removes `amount` number of bytes from the `bufq`.

## lifetime

`bufq` is initialized and freed similar to the `dynbuf` module. Code using
`bufq` holds a `struct bufq` somewhere. Before using it, it invokes:

```
void Curl_bufq_init(struct bufq *q, size_t chunk_size, size_t max_chunks);
```

The `bufq` is told how many "chunks" of data it shall hold at maximum and how
large those "chunks" should be. There are some variants of this, allowing for
more options. How "chunks" are handled in a `bufq` is presented in the section
about memory management.

The user of the `bufq` has the responsibility to call:

```
void Curl_bufq_free(struct bufq *q);
```

to free all resources held by `q`. It is possible to reset a `bufq` to empty
via:

```
void Curl_bufq_reset(struct bufq *q);
```

## memory management

Internally, a `bufq` uses allocations of a fixed size, the "chunk_size", up
to a maximum number of them, the "max_chunks". These chunks are allocated on
demand, therefore writing to a `bufq` may return `CURLE_OUT_OF_MEMORY`. Once
the maximum number of chunks is in use, the `bufq` reports that it is "full".

Each chunk has a `read` and a `write` index. A `bufq` keeps its chunks in a
list. Reading always happens at the head chunk and writing always goes to the
tail chunk. When the head chunk becomes empty, it is removed. When the tail
chunk becomes full, another chunk is added to the end of the list, becoming
the new tail.
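The per-chunk mechanics can be modeled with a toy chunk type. This is an
illustration only: `struct chunk` and the fixed `CHUNK_SIZE` of 8 are
assumptions for the sketch, not curl's actual layout.

```c
#include <assert.h>
#include <string.h>

#define CHUNK_SIZE 8  /* toy value; curl's chunk_size is set at init time */

/* A single fixed-size chunk with independent read and write indices */
struct chunk {
  unsigned char buf[CHUNK_SIZE];
  size_t r, w;        /* read and write positions, r <= w <= CHUNK_SIZE */
  struct chunk *next; /* toward the tail */
};

/* Space left for writing in this chunk */
static size_t chunk_space(const struct chunk *c) { return CHUNK_SIZE - c->w; }

/* Unread data remaining in this chunk */
static size_t chunk_unread(const struct chunk *c) { return c->w - c->r; }

/* Append up to len bytes, return the amount actually written */
static size_t chunk_append(struct chunk *c, const unsigned char *p, size_t len)
{
  size_t n = chunk_space(c);
  if(n > len)
    n = len;
  memcpy(c->buf + c->w, p, n);
  c->w += n;
  return n;
}

/* Read up to len bytes, return the amount actually read */
static size_t chunk_read(struct chunk *c, unsigned char *p, size_t len)
{
  size_t n = chunk_unread(c);
  if(n > len)
    n = len;
  memcpy(p, c->buf + c->r, n);
  c->r += n;
  return n;
}
```

Note how, after some bytes are read, the chunk still reports no write space:
already-consumed bytes at the front are not reused, which is why reading a
little from a "full" queue does not immediately make it writable again.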
Chunks that are no longer used are returned to a `spare` list by default. If
the `bufq` is created with option `BUFQ_OPT_NO_SPARES`, those chunks are freed
right away.

If a `bufq` is created with a `bufc_pool`, chunks that are no longer used are
returned to the pool, and the `bufq` asks the pool for a chunk when it needs
one. More on this in the section about "pools".

## empty, full and overflow

One can ask about the state of a `bufq` with methods such as
`Curl_bufq_is_empty(q)`, `Curl_bufq_is_full(q)`, etc. The amount of data held
by a `bufq` is the sum of the data in all its chunks. This is what is reported
by `Curl_bufq_len(q)`.

Note that a `bufq`'s length and it being "full" are only loosely related. A
simple example:

* create a `bufq` with chunk_size=1000 and max_chunks=4.
* write 4000 bytes to it, it reports "full"
* read 1 byte from it, it still reports "full"
* read 999 more bytes from it, and it is no longer "full"

The reason for this is that "full" really means: *the bufq uses max_chunks
and the last one cannot be written to*.

When you read 1 byte from the head chunk in the example above, the head still
holds 999 unread bytes. Only when those are also read can the head chunk be
removed and a new tail added.

There is another variation to this. If you initialized a `bufq` with option
`BUFQ_OPT_SOFT_LIMIT`, it allows writes **beyond** the `max_chunks`. It
reports **full**, but one can **still** write. This option is necessary if
partial writes need to be avoided. It means that you need other checks to
keep the `bufq` from growing ever larger.

## pools

A `struct bufc_pool` may be used to create chunks for a `bufq` and keep spare
ones around. It is initialized and used via:

```
void Curl_bufcp_init(struct bufc_pool *pool, size_t chunk_size, size_t spare_max);

void Curl_bufq_initp(struct bufq *q, struct bufc_pool *pool, size_t max_chunks, int opts);
```

The pool gets the chunk size and the amount of spares to keep. The `bufq`
gets the pool and the `max_chunks`. It no longer needs to know the chunk
sizes, as those are managed by the pool.

A pool can be shared between many `bufq`s, as long as all of them operate in
the same thread. In curl that would be true for all transfers using the same
multi handle. The advantages of a pool are:

* when all `bufq`s are empty, only memory for `max_spare` chunks in the pool
  is used. Empty `bufq`s hold no memory.
* the latest spare chunk is the first to be handed out again, no matter which
  `bufq` needs it. This keeps the footprint of "recently used" memory smaller.
curl-8.15.0/docs/internals/BUFREF.md
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# bufref

This is an internal module for handling buffer references. A referenced
buffer is associated with its destructor function that is implicitly called
when the reference is invalidated. Once referenced, a buffer cannot be
reallocated.

A data length is stored within the reference for binary data handling
purposes; it is not used by the bufref API.

The `struct bufref` is used to hold data referencing a buffer. The members of
that structure **MUST NOT** be accessed or modified without using the
dedicated bufref API.

## `init`

```c
void Curl_bufref_init(struct bufref *br);
```

Initializes a `bufref` structure. This function **MUST** be called before any
other operation is performed on the structure.

Upon completion, the referenced buffer is `NULL` and the length is zero.

This function may also be called to bypass referenced buffer destruction
while invalidating the current reference.

## `free`

```c
void Curl_bufref_free(struct bufref *br);
```

Destroys the previously referenced buffer using its destructor and
reinitializes the structure for a possible subsequent reuse.

## `set`

```c
void Curl_bufref_set(struct bufref *br, const void *buffer, size_t length,
                     void (*destructor)(void *));
```

Releases the previously referenced buffer, then assigns the new `buffer` to
the structure, associated with its `destructor` function. The latter can be
specified as `NULL`: this is the case when the referenced buffer is static.

If `buffer` is `NULL`, `length` must be zero.

## `memdup`

```c
CURLcode Curl_bufref_memdup(struct bufref *br, const void *data, size_t length);
```

Releases the previously referenced buffer, then duplicates the `length`-byte
`data` into a buffer allocated via `malloc()` and references the latter,
associated with destructor `curl_free()`.

An additional trailing byte is allocated and set to zero as a possible string
null-terminator; it is not counted in the stored length.

Returns `CURLE_OK` if successful, else `CURLE_OUT_OF_MEMORY`.
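The set/free/memdup semantics can be sketched with a simplified stand-in.
This is illustrative only: `mybufref` is a made-up name, a plain `int`
replaces `CURLcode`, and curl's real structure also carries debug
signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of a destructor-carrying buffer reference */
struct mybufref {
  void (*dtor)(void *);
  const unsigned char *ptr;
  size_t len;
};

static void mybufref_init(struct mybufref *br)
{
  br->dtor = NULL;
  br->ptr = NULL;
  br->len = 0;
}

static void mybufref_free(struct mybufref *br)
{
  if(br->ptr && br->dtor)
    br->dtor((void *)br->ptr);  /* destructor runs exactly once */
  mybufref_init(br);
}

static void mybufref_set(struct mybufref *br, const void *buffer,
                         size_t length, void (*dtor)(void *))
{
  mybufref_free(br);  /* release any previous reference first */
  br->ptr = buffer;
  br->len = length;
  br->dtor = dtor;
}

/* memdup counterpart: copy the data, adding a hidden null-terminator byte */
static int mybufref_memdup(struct mybufref *br, const void *data,
                           size_t length)
{
  unsigned char *copy = malloc(length + 1);
  if(!copy)
    return 1;  /* would be CURLE_OUT_OF_MEMORY in curl */
  memcpy(copy, data, length);
  copy[length] = '\0';
  mybufref_set(br, copy, length, free);
  return 0;
}
```

The key design point mirrored here: `set` always releases the previous
reference before taking the new one, so callers never free the old buffer
themselves.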
## `ptr`

```c
const unsigned char *Curl_bufref_ptr(const struct bufref *br);
```

Returns a `const unsigned char *` to the referenced buffer.

## `len`

```c
size_t Curl_bufref_len(const struct bufref *br);
```

Returns the stored length of the referenced buffer.
curl-8.15.0/docs/internals/CHECKSRC.md
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# checksrc

This is the tool we use within the curl project to scan C source code and
check that it adheres to our [Source Code Style guide](CODE_STYLE.md).

## Usage

    checksrc.pl [options] [file1] [file2] ...

## Command line options

`-W[file]` skip that file and exclude it from being checked. Helpful when,
for example, one of the files is generated.

`-D[dir]` directory name to prepend to filenames when accessing them.

`-h` shows the help output, which also lists all recognized warnings

## What does `checksrc` warn for?

`checksrc` does not check and verify the code against the entire style guide.
The script is an effort to detect the most common style and syntax mistakes
that contributors make before they get accustomed to our code style. Heck,
many of us regulars make the mistakes too and this script helps us keep the
code in shape.

    checksrc.pl -h

Lists how to use the script and all the warnings and problems it detects. At
the time of this writing, the existing `checksrc` warnings are:
- `ASSIGNWITHINCONDITION`: Assignment within a conditional expression. The
  code style mandates the assignment to be done outside of it.

- `ASTERISKNOSPACE`: A pointer was declared like `char* name` instead of the
  more appropriate `char *name` style. The asterisk should sit next to the
  name.

- `ASTERISKSPACE`: A pointer was declared like `char * name` instead of the
  more appropriate `char *name` style. The asterisk should sit right next to
  the name without a space in between.

- `BADCOMMAND`: There is a bad `checksrc` instruction in the code. See the
  **Ignore certain warnings** section below for details.

- `BANNEDFUNC`: A banned function was used. The functions `sprintf`,
  `vsprintf`, `strcat`, `strncat` and `gets` are **never** allowed in curl
  source code.

- `BRACEELSE`: `} else` on the same line. The else is supposed to be on the
  following line.

- `BRACEPOS`: wrong position for an open brace (`{`).

- `BRACEWHILE`: more than one space between end brace and while keyword

- `COMMANOSPACE`: a comma without following space

- `COPYRIGHT`: the file is missing a copyright statement

- `CPPCOMMENTS`: `//` comment detected, which is not C89 compliant

- `DOBRACE`: only use one space after do before open brace

- `EMPTYLINEBRACE`: found empty line before open brace

- `EQUALSNOSPACE`: no space after `=` sign

- `EQUALSNULL`: comparison with `== NULL` used in if/while. We use `!var`.

- `EXCLAMATIONSPACE`: space found after exclamation mark

- `FOPENMODE`: `fopen()` needs a macro for the mode string, use it

- `INDENTATION`: detected a wrong start column for code. Note that this
  warning only checks some specific places and can certainly miss many bad
  indentations.

- `LONGLINE`: A line is longer than 79 columns.

- `MULTISPACE`: Multiple spaces were found where only one should be used.

- `NOSPACEEQUALS`: An equals sign was found without preceding space. We
  prefer `a = 2` and *not* `a=2`.

- `NOTEQUALSZERO`: check found using `!= 0`. We use plain `if(var)`.

- `ONELINECONDITION`: do not put the conditional block on the same line as
  `if()`

- `OPENCOMMENT`: File ended with a comment (`/*`) still "open".

- `PARENBRACE`: `){` was used without sufficient space in between.

- `RETURNNOSPACE`: `return` was used without space between the keyword and
  the following value.

- `SEMINOSPACE`: There was no space (or newline) following a semicolon.

- `SIZEOFNOPAREN`: Found use of sizeof without parentheses. We prefer
  `sizeof(int)` style.

- `SNPRINTF`: Found use of `snprintf()`. Since we use an internal replacement
  with a different return code etc., we prefer `msnprintf()`.

- `SPACEAFTERPAREN`: there was a space after open parenthesis, `( text`.

- `SPACEBEFORECLOSE`: there was a space before a close parenthesis, `text )`.

- `SPACEBEFORECOMMA`: there was a space before a comma, `one , two`.

- `SPACEBEFOREPAREN`: there was a space before an open parenthesis, `if (`,
  where one was not expected

- `SPACESEMICOLON`: there was a space before a semicolon, ` ;`.

- `TABS`: TAB characters are not allowed

- `TRAILINGSPACE`: Trailing whitespace on the line

- `TYPEDEFSTRUCT`: we frown upon (most) typedefed structs

- `UNUSEDIGNORE`: a `checksrc` inlined warning ignore was asked for but not
  used, that is an ignore that should be removed or changed to get used.
### Extended warnings

Some warnings are quite computationally expensive to perform, so they are
turned off by default. To enable these warnings, place a `.checksrc` file in
the directory where they should be activated, with commands to enable the
warnings you are interested in. The format of the file is to enable one
warning per line like so: `enable <EXTENDEDWARNING>`

Currently these are the extended warnings that can be enabled:

- `COPYRIGHTYEAR`: the current changeset has not updated the copyright year
  in the source file

- `STRERROR`: use of banned function `strerror()`

- `STDERR`: use of banned variable `stderr`
## Ignore certain warnings

Due to the nature of the source code and the flaws of the `checksrc` tool,
there is sometimes a need to ignore specific warnings. `checksrc` allows a
few different ways to do this.

### Inline ignore

You can control what to ignore within a specific source file by providing
instructions to `checksrc` in the source code itself. See examples below. The
instruction can ask to ignore a specific warning a specific number of times,
or to ignore all of them until the end of the ignored section is marked.

Inline ignores only apply to that single specific source code file.

Example

    /* !checksrc! disable LONGLINE all */

This ignores the warning for overly long lines until it is re-enabled with:

    /* !checksrc! enable LONGLINE */

If the enabling is not performed before the end of the file, it is enabled
again automatically for the next file.

You can also opt to ignore just N violations, so that if you have a single
long line that you just cannot shorten and that is agreed to be fine anyway:

    /* !checksrc! disable LONGLINE 1 */

... and the warning for long lines is enabled again automatically after it
has ignored that single warning. The number `1` can of course be changed to
any other integer number. It can be used to make sure only the exact intended
instances are ignored and nothing extra.
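For instance, a hypothetical source file using the counted form could look
like this (illustrative code, not taken from curl):

```c
#include <assert.h>
#include <string.h>

/* The next declaration exceeds 79 columns on purpose; the instruction
 * right above it tells checksrc to ignore exactly one LONGLINE hit. */
/* !checksrc! disable LONGLINE 1 */
static const char *user_agent = "Mozilla/5.0 (compatible; example-agent/1.0; +https://www.example.com/bot.html)";
```

Any second overly long line in the same file would still be reported, which
is the point of the counted form.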
### Directory wide ignore patterns

This is a method we have transitioned away from. Use inline ignores as far as
possible.

Make a `checksrc.skip` file in the directory of the source code with the
false positive, and include the full offending line in this file.
curl-8.15.0/docs/internals/CLIENT-READERS.md
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# curl client readers

Client readers are a design in the internals of libcurl, not visible in its public API. They were introduced in curl v8.7.0. This document describes the concept, its high level implementation and the motivation.

## Naming

`libcurl` operates between clients and servers. A *client* is the application using libcurl, like the command line tool `curl` itself. Data to be uploaded to a server is **read** from the client and **sent** to the server; the server's response is **received** by `libcurl` and then **written** to the client.

With this naming established, client readers are concerned with providing data from the application to the server. Applications register callbacks via `CURLOPT_READFUNCTION`, data via `CURLOPT_POSTFIELDS` and other options, to be used by `libcurl` when the request is sent.

## Invoking

The transfer loop that sends and receives uses `Curl_client_read()` to get more data to send for a transfer. If no specific reader has been installed yet, the default one that uses `CURLOPT_READFUNCTION` is added. The prototype is

```
CURLcode Curl_client_read(struct Curl_easy *data, char *buf, size_t blen,
                          size_t *nread, bool *eos);
```

The arguments are the transfer to read for, a buffer to hold the read data, its length, the actual number of bytes placed into the buffer and the `eos` (*end of stream*) flag indicating that no more data is available. The `eos` flag may be set for a read amount, if that amount was the last. That way curl can avoid an additional read call.

The implementation of `Curl_client_read()` uses a chain of *client reader* instances to get the data. This is similar to the design of *client writers*. The chain of readers allows processing of the data to send.

The definition of a reader is:

```
struct Curl_crtype {
  const char *name;   /* reader name. */
  CURLcode (*do_init)(struct Curl_easy *data, struct Curl_creader *reader);
  CURLcode (*do_read)(struct Curl_easy *data, struct Curl_creader *reader,
                      char *buf, size_t blen, size_t *nread, bool *eos);
  void (*do_close)(struct Curl_easy *data, struct Curl_creader *reader);
  bool (*needs_rewind)(struct Curl_easy *data, struct Curl_creader *reader);
  curl_off_t (*total_length)(struct Curl_easy *data,
                             struct Curl_creader *reader);
  CURLcode (*resume_from)(struct Curl_easy *data,
                          struct Curl_creader *reader, curl_off_t offset);
  CURLcode (*rewind)(struct Curl_easy *data, struct Curl_creader *reader);
};

struct Curl_creader {
  const struct Curl_crtype *crt;  /* type implementation */
  struct Curl_creader *next;      /* Downstream reader. */
  Curl_creader_phase phase;       /* phase at which it operates */
};
```

`Curl_creader` is a reader instance with a `next` pointer to form the chain. It has a type `crt` which provides the implementation. The main callback is `do_read()` which provides the data to the caller. The others are for setup and tear down. `needs_rewind()` is explained further below.

## Phases and Ordering

Since client readers may transform the data being read through the chain, the order in which they are called is relevant for the outcome. When a reader is created, it gets the `phase` property in which it operates. Reader phases are defined like:

```
typedef enum {
  CURL_CR_NET,              /* data sent to the network (connection filters) */
  CURL_CR_TRANSFER_ENCODE,  /* add transfer-encodings */
  CURL_CR_PROTOCOL,         /* before transfer, but after content decoding */
  CURL_CR_CONTENT_ENCODE,   /* add content-encodings */
  CURL_CR_CLIENT            /* data read from client */
} Curl_creader_phase;
```

If a reader for phase `PROTOCOL` is added to the chain, it is always added *after* any `NET` or `TRANSFER_ENCODE` readers and *before* any `CONTENT_ENCODE` and `CLIENT` readers. If there is already a reader for the same phase, the new reader is added before the existing one(s).
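The insertion rule can be sketched with a minimal linked list keyed by phase.
This is a model of the ordering logic only: `struct node`, `chain_insert` and
the `PH_*` names are invented for the sketch, while curl's real code operates
on `struct Curl_creader`.

```c
#include <assert.h>
#include <stddef.h>

/* Phases in chain order; a lower value sits closer to the network */
enum phase { PH_NET, PH_TRANSFER_ENCODE, PH_PROTOCOL,
             PH_CONTENT_ENCODE, PH_CLIENT };

struct node {
  enum phase phase;
  struct node *next;
};

/* Insert n before the first node with the same or a later phase, so a
 * new reader lands ahead of any existing readers of its own phase. */
static void chain_insert(struct node **head, struct node *n)
{
  struct node **anchor = head;
  while(*anchor && (*anchor)->phase < n->phase)
    anchor = &(*anchor)->next;
  n->next = *anchor;
  *anchor = n;
}
```

With this rule, insertion order only matters within a phase; across phases
the chain always ends up sorted from `NET` to `CLIENT`.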
### Example: `chunked` reader

In `http_chunks.c` a client reader for chunked uploads is implemented. This one operates at phase `CURL_CR_TRANSFER_ENCODE`. Any data coming from the reader "below" has the HTTP/1.1 chunk handling applied and is returned to the caller.

When this reader sees an `eos` from below, it generates the terminal chunk, adding trailers if provided by the application. When that last chunk is fully returned, it also sets `eos` for the caller.

### Example: `lineconv` reader

In `sendf.c` a client reader that does line-end conversions is implemented. It operates at `CURL_CR_CONTENT_ENCODE` and converts any "\n" to "\r\n". This is used for FTP ASCII uploads or when the general `crlf` option has been set.

### Example: `null` reader

Implemented in `sendf.c` for phase `CURL_CR_CLIENT`, this reader has the simple job of providing zero bytes to the caller, immediately indicating an `eos`. This reader is installed by HTTP for all GET/HEAD requests and when authentication is being negotiated.

### Example: `buf` reader

Implemented in `sendf.c` for phase `CURL_CR_CLIENT`, this reader gets a buffer pointer and a length and provides exactly these bytes. This one is used in HTTP for sending `postfields` provided by the application.

## Request retries

Sometimes it is necessary to send a request with client data again. Transfer handling can inquire via `Curl_client_read_needs_rewind()` if a rewind (e.g. a reset of the client data) is necessary. This asks all installed readers if they need it and gives `FALSE` if none does.
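That aggregation can be sketched as a simple walk over the chain (a model:
`struct reader` with a plain flag stands in for `struct Curl_creader` and its
`needs_rewind()` callback):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reader: a flag replaces the real needs_rewind() callback */
struct reader {
  bool needs_rewind;
  struct reader *next;
};

/* TRUE if any installed reader reports that it needs a rewind */
static bool chain_needs_rewind(struct reader *head)
{
  struct reader *r;
  for(r = head; r; r = r->next) {
    if(r->needs_rewind)
      return true;
  }
  return false;
}
```

A single reader needing a rewind is enough to require one for the whole
chain; an empty chain trivially needs none.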
## Upload Size

Many protocols need to know the number of bytes delivered by the client readers in advance. They may invoke `Curl_creader_total_length(data)` to retrieve that. However, not all reader chains know the exact value beforehand. In that case, the call returns `-1` for "unknown".

Even if the length of the "raw" data is known, the length that is sent may not be. Example: with option `--crlf` the uploaded content undergoes line-end conversion. The line converting reader does not know in advance how many newlines it may encounter. Therefore it must return `-1` for any positive raw content length.

In HTTP, once the correct client readers are installed, the protocol asks the readers for the total length. If that is known, it can set `Content-Length:` accordingly. If not, it may choose to add an HTTP "chunked" reader.

In addition, there is `Curl_creader_client_length(data)` which gives the total length as reported by the reader in phase `CURL_CR_CLIENT` without asking other readers that may transform the raw data. This is useful in estimating the size of an upload. The HTTP protocol uses this to determine if `Expect: 100-continue` shall be done.

## Resuming

Uploads can start at a specific offset, if so requested: they then "resume from" that offset. This applies to the reader in phase `CURL_CR_CLIENT` that delivers the "raw" content. Resumption can fail if the installed reader does not support it or if the offset is too large.

The total length reported by the reader changes when resuming. Example: resuming an upload of 100 bytes by 25 reports a total length of 75 afterwards.

If `resume_from()` is invoked twice, it is additive. There is currently no way to undo a resume.

## Rewinding

When a request is retried, installed client readers are discarded and replaced by new ones. This works only if the new readers upload the same data. For many readers, this is not an issue. The "null" reader always does the same. So does the `buf` reader, when initialized with the same buffer.

Readers operating on callbacks to the application need to "rewind" the underlying content. For example, when reading from a `FILE*`, the reader needs to `fseek()` to the beginning. The following methods are used:

1. `Curl_creader_needs_rewind(data)`: tells if a rewind is necessary, given the current state of the reader chain. If nothing has really been read so far, this returns `FALSE`.
2. `Curl_creader_will_rewind(data)`: tells if the reader chain rewinds at the start of the next request.
3. `Curl_creader_set_rewind(data, TRUE)`: marks the reader chain for rewinding at the start of the next request.
4. `Curl_client_start(data)`: tells the readers that a new request starts and that they need to rewind if requested.

## Summary and Outlook

By adding the client reader interface, any protocol can control how/if it wants the curl transfer to send bytes for a request. The transfer loop then becomes blissfully ignorant of the specifics.

The protocols on the other hand no longer have to take care to package data most efficiently. At any time, should more data be needed, it can be read from the client. This is used when sending HTTP request headers to add as much request body data to the initial send as there is room for.

Future enhancements based on the client readers:

* `expect-100` handling: place that into an HTTP specific reader at
  `CURL_CR_PROTOCOL` and eliminate the checks in the generic transfer parts.
* `eos forwarding`: transfer should forward an `eos` flag to the connection
  filters. Filters like HTTP/2 and HTTP/3 can make use of that, terminating
  streams early. This would also eliminate length checks in stream handling.
curl-8.15.0/docs/internals/CLIENT-WRITERS.md
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# curl client writers

Client writers are a design in the internals of libcurl, not visible in its public API. They were introduced in curl v8.5.0. This document describes the concept, its high level implementation and the motivation.

## Naming

`libcurl` operates between clients and servers. A *client* is the application using libcurl, like the command line tool `curl` itself. Data to be uploaded to a server is **read** from the client and **sent** to the server; the server's response is **received** by `libcurl` and then **written** to the client.

With this naming established, client writers are concerned with writing responses from the server to the application. Applications register callbacks via `CURLOPT_WRITEFUNCTION` and `CURLOPT_HEADERFUNCTION` to be invoked by `libcurl` when the response is received.

## Invoking

All code in `libcurl` that handles response data is ultimately expected to forward this data via `Curl_client_write()` to the application. The exact prototype of this function is:

```
CURLcode Curl_client_write(struct Curl_easy *data, int type, const char *buf, size_t blen);
```

The `type` argument specifies what the bytes in `buf` actually are. The following bits are defined:

```
#define CLIENTWRITE_BODY    (1<<0) /* non-meta information, BODY */
#define CLIENTWRITE_INFO    (1<<1) /* meta information, not a HEADER */
#define CLIENTWRITE_HEADER  (1<<2) /* meta information, HEADER */
#define CLIENTWRITE_STATUS  (1<<3) /* a special status HEADER */
#define CLIENTWRITE_CONNECT (1<<4) /* a CONNECT related HEADER */
#define CLIENTWRITE_1XX     (1<<5) /* a 1xx response related HEADER */
#define CLIENTWRITE_TRAILER (1<<6) /* a trailer HEADER */
```

The main types here are `CLIENTWRITE_BODY` and `CLIENTWRITE_HEADER`. They are
mutually exclusive. The other bits are enhancements to `CLIENTWRITE_HEADER` to
specify what the header is about. They are only used in HTTP and related
protocols (RTSP and WebSocket).
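The bit layout can be exercised directly. The defines below are reproduced
from above so the sketch is self-contained; the two helper functions are
invented for illustration and are not curl API.

```c
#include <assert.h>

#define CLIENTWRITE_BODY    (1<<0) /* non-meta information, BODY */
#define CLIENTWRITE_INFO    (1<<1) /* meta information, not a HEADER */
#define CLIENTWRITE_HEADER  (1<<2) /* meta information, HEADER */
#define CLIENTWRITE_STATUS  (1<<3) /* a special status HEADER */
#define CLIENTWRITE_CONNECT (1<<4) /* a CONNECT related HEADER */
#define CLIENTWRITE_1XX     (1<<5) /* a 1xx response related HEADER */
#define CLIENTWRITE_TRAILER (1<<6) /* a trailer HEADER */

/* A trailer write carries the HEADER bit plus the TRAILER refinement */
static int is_trailer(int type)
{
  return (type & CLIENTWRITE_HEADER) && (type & CLIENTWRITE_TRAILER);
}

/* BODY and HEADER are mutually exclusive in valid calls */
static int is_valid_type(int type)
{
  return !((type & CLIENTWRITE_BODY) && (type & CLIENTWRITE_HEADER));
}
```

The refinement bits never appear on their own; they always accompany
`CLIENTWRITE_HEADER`.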
|
||||
|
||||
The implementation of `Curl_client_write()` uses a chain of *client writer* instances to process the call and make sure that the bytes reach the proper application callbacks. This is similar to the design of connection filters: client writers can be chained to process the bytes written through them. The definition is:

```
struct Curl_cwtype {
  const char *name;
  CURLcode (*do_init)(struct Curl_easy *data,
                      struct Curl_cwriter *writer);
  CURLcode (*do_write)(struct Curl_easy *data,
                       struct Curl_cwriter *writer, int type,
                       const char *buf, size_t nbytes);
  void (*do_close)(struct Curl_easy *data,
                   struct Curl_cwriter *writer);
};

struct Curl_cwriter {
  const struct Curl_cwtype *cwt;  /* type implementation */
  struct Curl_cwriter *next;      /* Downstream writer. */
  Curl_cwriter_phase phase;       /* phase at which it operates */
};
```

`Curl_cwriter` is a writer instance with a `next` pointer to form the chain.
It has a type `cwt` which provides the implementation. The main callback is
`do_write()`, which processes the data and then calls the `next` writer. The
others are for setup and tear-down.

## Phases and Ordering

Since client writers may transform the bytes written through them, the order
in which they are called is relevant for the outcome. When a writer is
created, one property it gets is the `phase` in which it operates. Writer
phases are defined like:

```
typedef enum {
  CURL_CW_RAW,             /* raw data written, before any decoding */
  CURL_CW_TRANSFER_DECODE, /* remove transfer-encodings */
  CURL_CW_PROTOCOL,        /* after transfer, but before content decoding */
  CURL_CW_CONTENT_DECODE,  /* remove content-encodings */
  CURL_CW_CLIENT           /* data written to client */
} Curl_cwriter_phase;
```

If a writer for phase `PROTOCOL` is added to the chain, it is always added *after* any `RAW` or `TRANSFER_DECODE` and *before* any `CONTENT_DECODE` and `CLIENT` phase writer. If there is already a writer for the same phase present, the new writer is inserted just before that one.

All transfers have a chain of 3 writers by default. A specific protocol
handler may alter that by adding additional writers. The 3 standard writers
are (name, phase):

1. `"raw", CURL_CW_RAW`: if the transfer is verbose, it forwards the body
   data to the debug function.
1. `"download", CURL_CW_PROTOCOL`: checks that protocol limits are kept and
   updates progress counters. When a download has a known length, it checks
   that it is not exceeded and errors otherwise.
1. `"client", CURL_CW_CLIENT`: the main work horse. It invokes the
   application callbacks or writes to the configured file handles. It chops
   large writes into smaller parts, as documented for
   `CURLOPT_WRITEFUNCTION`. It also handles *pausing* of transfers when the
   application callback returns `CURL_WRITEFUNC_PAUSE`.

With these writers always in place, libcurl's protocol handlers get this
behavior implemented automatically.

## Enhanced Use

HTTP is the protocol in curl that makes use of the client writer chain by
adding writers to it. When a `libcurl` application sets
`CURLOPT_ACCEPT_ENCODING` (as `curl` does with `--compressed`), the server is
offered an `Accept-Encoding` header with the algorithms supported. The server
then may choose to send the response body compressed, for example using
`gzip` or `brotli`, or even both.

The server's response may then carry a `Content-Encoding` header listing the
encodings applied. If supported by `libcurl`, it then decompresses the
content before writing it out to the client. How does it do that?

The HTTP protocol adds client writers in phase `CURL_CW_CONTENT_DECODE` on
seeing such a header. For each encoding listed, it adds the corresponding
writer. The response from the server is then passed through
`Curl_client_write()` to the writers that decode it. If several encodings had
been applied, the writer chain decodes them in the proper order.

When the server provides a `Content-Length` header, that value applies to the
*compressed* content. Length checks on the response bytes must happen *before*
it gets decoded. That is why this check happens in phase `CURL_CW_PROTOCOL`,
which is always ordered before writers in phase `CURL_CW_CONTENT_DECODE`.

What else?

Well, HTTP servers may also apply a `Transfer-Encoding` to the body of a
response. The most well-known one is `chunked`, but algorithms like `gzip`
and friends could also be applied. The difference from content encodings is
that transfer decoding needs to happen *before* protocol checks, for example
on length, are done.

That is why transfer decoding writers are added for phase
`CURL_CW_TRANSFER_DECODE`, which makes them operate *before* phase
`CURL_CW_PROTOCOL` where a length may be checked.

## Summary

By adding the common behavior of all protocols into `Curl_client_write()`, we
make sure that it applies everywhere. Protocol handlers have less to worry
about. Changes to default behavior can be made without affecting handler
implementations.

Having a writer chain as the implementation allows protocol handlers with
extra needs, like HTTP, to add to it for special behavior. The common way of
writing the actual response data stays the same.

363
curl-8.15.0/docs/internals/CODE_STYLE.md
Normal file
@@ -0,0 +1,363 @@

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# curl C code style

Source code that has a common style is easier to read than code that uses
different styles in different places. It helps make the code feel like one
single code base. Easy-to-read is an important property of code: it makes
reviews easier when new things are added and it helps developers debug the
code when trying to figure out why things go wrong. A unified style is more
important than individual contributors having their own personal tastes
satisfied.

Our C code has a few style rules. Most of them are verified and upheld by the
`scripts/checksrc.pl` script. It is invoked with `make checksrc`, and is even
run by default by the build system when built after `./configure
--enable-debug` has been used.

It is normally not a problem for anyone to follow the guidelines, as you just
need to copy the style already used in the source code and there are no
particularly unusual rules in our set of rules.

We also work hard on writing code that is warning-free on all the major
platforms and in general on as many platforms as possible. Code that
obviously causes warnings is not accepted as-is.

## Readability

A primary characteristic for code is readability. The intent and meaning of
the code should be visible to the reader. Being clear and unambiguous beats
being clever and saving two lines of code. Write simple code. You and others
who come back to this code over the coming decades want to be able to quickly
understand it when debugging.

## Naming

Try using a non-confusing naming scheme for your new functions and variable
names. It does not necessarily have to mean that you should use the same as in
other places of the code, just that the names should be logical,
understandable and be named according to what they are used for. File-local
functions should be made static. We like lower case names.

See the [INTERNALS](https://curl.se/dev/internals.html#symbols) document on
how we name non-exported library-global symbols.

## Indenting

We use only spaces for indentation, never TABs. We use two spaces for each new
open brace.

```c
if(something_is_true) {
  while(second_statement == fine) {
    moo();
  }
}
```

## Comments

Since we write C89 code, **//** comments are not allowed. They were not
introduced in the C standard until C99. We use only __/* comments */__.

```c
/* this is a comment */
```

## Long lines

Source code in curl may never be wider than 79 columns and there are two
reasons for maintaining this even in the modern era of large and high
resolution screens:

1. Narrower columns are easier to read than wide ones. There is a reason
   newspapers have used columns for decades or centuries.

2. Narrower columns allow developers to easier show multiple pieces of code
   next to each other in different windows. It allows two or three source
   code windows next to each other on the same screen - as well as multiple
   terminal and debugging windows.

## Braces

In if/while/do/for expressions, we write the open brace on the same line as
the keyword and we then set the closing brace on the same indentation level as
the initial keyword. Like this:

```c
if(age < 40) {
  /* clearly a youngster */
}
```

You may omit the braces if they would contain only a one-line statement:

```c
if(!x)
  continue;
```

For functions the opening brace should be on a separate line:

```c
int main(int argc, char **argv)
{
  return 1;
}
```

## 'else' on the following line

When adding an **else** clause to a conditional expression using braces, we
add it on a new line after the closing brace. Like this:

```c
if(age < 40) {
  /* clearly a youngster */
}
else {
  /* probably grumpy */
}
```

## No space before parentheses

When writing expressions using if/while/do/for, there shall be no space
between the keyword and the open parenthesis. Like this:

```c
while(1) {
  /* loop forever */
}
```

## Use boolean conditions

Rather than test a conditional value such as a bool against TRUE or FALSE, a
pointer against NULL or != NULL, or an int against zero or non-zero in
if/while conditions, we prefer:

```c
result = do_something();
if(!result) {
  /* something went wrong */
  return result;
}
```

## No assignments in conditions

To increase readability and reduce complexity of conditionals, we avoid
assigning variables within if/while conditions. We frown upon this style:

```c
if((ptr = malloc(100)) == NULL)
  return NULL;
```

and instead we encourage the above version to be spelled out more clearly:

```c
ptr = malloc(100);
if(!ptr)
  return NULL;
```

## New block on a new line

We never write multiple statements on the same source line, even for short
if() conditions.

```c
if(a)
  return TRUE;
else if(b)
  return FALSE;
```

and NEVER:

```c
if(a) return TRUE;
else if(b) return FALSE;
```

## Space around operators

Please use spaces on both sides of operators in C expressions. Postfix
operators (**()**, **[]**, **->**, **.**, **++**, **--**) and unary operators
(**+**, **-**, **!**, **~**, **&**, **\***) are excluded; they should have no
space.

Examples:

```c
bla = func();
who = name[0];
age += 1;
true = !false;
size += -2 + 3 * (a + b);
ptr->member = a++;
struct.field = b--;
ptr = &address;
contents = *pointer;
complement = ~bits;
empty = (!*string) ? TRUE : FALSE;
```

## No parentheses for return values

We use the 'return' statement without extra parentheses around the value:

```c
int works(void)
{
  return TRUE;
}
```

## Parentheses for sizeof arguments

When using the sizeof operator in code, we prefer it to be written with
parentheses around its argument:

```c
int size = sizeof(int);
```

## Column alignment

Some statements cannot be completed on a single line because the line would be
too long, the statement too hard to read, or due to other style guidelines
above. In such a case the statement spans multiple lines.

If a continuation line is part of an expression or sub-expression then you
should align on the appropriate column so that it is easy to tell what part of
the statement it is. Operators should not start continuation lines. In other
cases follow the 2-space indent guideline. Here are some examples from
libcurl:

```c
if(Curl_pipeline_wanted(handle->multi, CURLPIPE_HTTP1) &&
   (handle->set.httpversion != CURL_HTTP_VERSION_1_0) &&
   (handle->set.httpreq == HTTPREQ_GET ||
    handle->set.httpreq == HTTPREQ_HEAD))
  /* did not ask for HTTP/1.0 and a GET or HEAD */
  return TRUE;
```

If no parenthesis, use the default indent:

```c
data->set.http_disable_hostname_check_before_authentication =
  (0 != va_arg(param, long)) ? TRUE : FALSE;
```

Function invocations with an open parenthesis:

```c
if(option) {
  result = parse_login_details(option, strlen(option),
                               (userp ? &user : NULL),
                               (passwdp ? &passwd : NULL),
                               NULL);
}
```

Align with the "current open" parenthesis:

```c
DEBUGF(infof(data, "Curl_pp_readresp_ %d bytes of trailing "
             "server response left\n",
             (int)clipamount));
```

## Platform dependent code

Use **#ifdef HAVE_FEATURE** to do conditional code. We avoid checking for
particular operating systems or hardware in the #ifdef lines. The
HAVE_FEATURE macros shall be generated by the configure script for Unix-like
systems and they are hard-coded in the `config-[system].h` files for the
others.

We also encourage use of macros/functions that possibly are empty or defined
to constants when libcurl is built without that feature, to make the code
seamless. Like this example where the **magic()** function works differently
depending on a build-time conditional:

```c
#ifdef HAVE_MAGIC
int magic(int a)
{
  return a + 2;
}
#else
#define magic(x) 1
#endif

int content = magic(3);
```

## No typedefed structs

Use structs by all means, but do not typedef them. Use the `struct name` way
of identifying them:

```c
struct something {
  void *valid;
  size_t way_to_write;
};
struct something instance;
```

**Not okay**:

```c
typedef struct {
  void *wrong;
  size_t way_to_write;
} something;
something instance;
```

## Banned functions

To avoid footguns and unintended consequences we forbid the use of a number
of C functions. The `checksrc` script finds and yells about them if used.
This makes us write better code.

This is the full list of functions generally banned:

    _access
    _mbscat
    _mbsncat
    _tcscat
    _tcsncat
    _waccess
    _wcscat
    _wcsncat
    access
    gets
    gmtime
    LoadLibrary
    LoadLibraryA
    LoadLibraryEx
    LoadLibraryExA
    LoadLibraryExW
    LoadLibraryW
    localtime
    snprintf
    sprintf
    sscanf
    strcat
    strerror
    strncat
    strncpy
    strtok
    strtol
    strtoul
    vsnprintf
    vsprintf

308
curl-8.15.0/docs/internals/CONNECTION-FILTERS.md
Normal file
@@ -0,0 +1,308 @@

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# curl connection filters

Connection filters are a design in the internals of curl, not visible in its
public API. They were added in curl v7.87.0. This document describes the
concept, its high-level implementation and the motivations.

## Filters

A "connection filter" is a piece of code that is responsible for handling a
range of operations on curl's connections: reading, writing, waiting on
external events, connecting and closing down - to name the most important
ones.

The most important feature of connection filters is that they can be stacked
on top of each other (or "chained" if you prefer that metaphor). In the
common scenario that you want to retrieve a `https:` URL with curl, you need
2 basic things to send the request and get the response: a TCP connection,
represented by a `socket`, and an SSL instance to en- and decrypt the data
sent over that socket. You write your request to the SSL instance, which
encrypts and writes that data to the socket, which then sends the bytes over
the network.

With connection filters, curl's internal setup looks something like this (cf
for connection filter):

```
Curl_easy *data       connectdata *conn      cf-ssl     cf-socket
+----------------+    +-----------------+    +-------+  +--------+
|https://curl.se/|--->| properties      |--->| keys  |->| socket |--> OS --> network
+----------------+    +-----------------+    +-------+  +--------+

Curl_write(data, buffer)
 --> Curl_cfilter_write(data, data->conn, buffer)
      --> conn->filter->write(conn->filter, data, buffer)
```

While connection filters all do different things, they look the same from the
"outside". The code in `data` and `conn` does not really know **which**
filters are installed. `conn` just writes into the first filter, whatever
that is.

The same is true for filters. Each filter has a pointer to the `next` filter.
When SSL has encrypted the data, it does not write to a socket, it writes to
the next filter. Whether that is indeed a socket, or a file, or an HTTP/2
connection, is of no concern to the SSL filter.

This allows stacking, as in:

```
Direct:
  http://localhost/    conn -> cf-socket
  https://curl.se/     conn -> cf-ssl -> cf-socket

Via http proxy tunnel:
  http://localhost/    conn -> cf-http-proxy -> cf-socket
  https://curl.se/     conn -> cf-ssl -> cf-http-proxy -> cf-socket

Via https proxy tunnel:
  http://localhost/    conn -> cf-http-proxy -> cf-ssl -> cf-socket
  https://curl.se/     conn -> cf-ssl -> cf-http-proxy -> cf-ssl -> cf-socket

Via http proxy tunnel via SOCKS proxy:
  http://localhost/    conn -> cf-http-proxy -> cf-socks -> cf-socket
```

### Connecting/Closing

Before `Curl_easy` can send the request, the connection needs to be
established. This means that all connection filters have done whatever they
need to do: waiting for the socket to be connected, doing the TLS handshake,
performing the HTTP tunnel request, etc. This has to be done in reverse
order: the last filter has to do its connect first, then the one above can
start, etc.

Each filter does in principle the following:

```
static CURLcode
myfilter_cf_connect(struct Curl_cfilter *cf,
                    struct Curl_easy *data,
                    bool *done)
{
  CURLcode result;

  if(cf->connected) {      /* we and all below are done */
    *done = TRUE;
    return CURLE_OK;
  }
  /* Let the filters below connect */
  result = cf->next->cft->connect(cf->next, data, done);
  if(result || !*done)
    return result;         /* below errored/not finished yet */

  /* MYFILTER CONNECT THINGS */  /* below connected, do our thing */
  *done = cf->connected = TRUE;  /* done, remember, return */
  return CURLE_OK;
}
```

Closing a connection works similarly. The `conn` tells the first filter to
close. Contrary to connecting, the filter does its own things first, before
telling the next filter to close.

### Efficiency

There are two things curl is concerned about: efficient memory use and fast
transfers.

The memory footprint of a filter is relatively small:

```
struct Curl_cfilter {
  const struct Curl_cftype *cft; /* the type providing implementation */
  struct Curl_cfilter *next;     /* next filter in chain */
  void *ctx;                     /* filter type specific settings */
  struct connectdata *conn;      /* the connection this filter belongs to */
  int sockindex;                 /* TODO: like to get rid off this */
  BIT(connected);                /* != 0 iff this filter is connected */
};
```

The filter type `cft` is a singleton, one static struct for each type of
filter. The `ctx` is where a filter holds its specific data. That varies by
filter type. An http-proxy filter keeps the ongoing state of the CONNECT
here, freeing it after the tunnel has been established. The SSL filter keeps
the `SSL*` (if OpenSSL is used) here until the connection is closed. So, this
varies.

`conn` is a reference to the connection this filter belongs to, so nothing
extra besides the pointer itself.

Several things that before were kept in `struct connectdata` now go into the
`filter->ctx` *when needed*. So, the memory footprint for connections that do
*not* use an http proxy, or socks, or https is lower.

As to transfer efficiency, writing and reading through a filter comes at near
zero cost *if the filter does not transform the data*. An http proxy or socks
filter, once it is connected, just passes the calls through. Those filter
implementations look like this:

```
ssize_t Curl_cf_def_send(struct Curl_cfilter *cf, struct Curl_easy *data,
                         const void *buf, size_t len, CURLcode *err)
{
  return cf->next->cft->do_send(cf->next, data, buf, len, err);
}
```

The `recv` implementation is equivalent.

## Filter Types

The currently existing filter types (curl 8.5.0) are:

* `TCP`, `UDP`, `UNIX`: filters that operate on a socket, providing raw I/O.
* `SOCKET-ACCEPT`: special TCP socket filter for a socket that has been
  `accept()`ed in a `listen()`.
* `SSL`: filter that applies TLS en-/decryption and handshake. Manages the
  underlying TLS backend implementation.
* `HTTP-PROXY`, `H1-PROXY`, `H2-PROXY`: the first manages the connection to
  an HTTP proxy server and uses the others depending on which ALPN protocol
  has been negotiated.
* `SOCKS-PROXY`: filter for the various SOCKS proxy protocol variations.
* `HAPROXY`: filter for the protocol of the same name, providing client IP
  information to a server.
* `HTTP/2`: filter for handling multiplexed transfers over an HTTP/2
  connection.
* `HTTP/3`: filter for handling multiplexed transfers over an HTTP/3+QUIC
  connection.
* `HAPPY-EYEBALLS`: meta filter that implements IPv4/IPv6 "happy eyeballing".
  It creates up to 2 sub-filters that race each other for a connection.
* `SETUP`: meta filter that manages the creation of sub-filter chains for a
  specific transport (e.g. TCP or QUIC).
* `HTTPS-CONNECT`: meta filter that races a TCP+TLS and a QUIC connection
  against each other to determine if HTTP/1.1, HTTP/2 or HTTP/3 shall be used
  for a transfer.

Meta filters combine other filters for a specific purpose, mostly during
connection establishment. Other filters like `TCP`, `UDP` and `UNIX` are only
to be found at the end of filter chains. SSL filters provide encryption, of
course. Protocol filters change the bytes sent and received.

## Filter Flags

Filter types carry flags that inform what they do. These are (for now):

* `CF_TYPE_IP_CONNECT`: this filter type talks directly to a server. This
  does not have to be the server the transfer wants to talk to, for example
  when a proxy server is used.
* `CF_TYPE_SSL`: this filter type provides encryption.
* `CF_TYPE_MULTIPLEX`: this filter type can manage multiple transfers in
  parallel.

Filter types can combine these flags. For example, the HTTP/3 filter types
have `CF_TYPE_IP_CONNECT`, `CF_TYPE_SSL` and `CF_TYPE_MULTIPLEX` set.

Flags are useful to extrapolate properties of a connection. To check if a
connection is encrypted, libcurl inspects the filter chain in place, top
down, for `CF_TYPE_SSL`. If it finds `CF_TYPE_IP_CONNECT` before any
`CF_TYPE_SSL`, the connection is not encrypted.

For example, `conn1` is for a `http:` request using a tunnel through an
HTTP/2 `https:` proxy. `conn2` is a `https:` HTTP/2 connection to the same
proxy. `conn3` uses HTTP/3 without proxy. The filter chains would look like
this (simplified):

```
conn1 --> `HTTP-PROXY` --> `H2-PROXY` --> `SSL` --> `TCP`
flags:    `IP_CONNECT`                    `SSL`     `IP_CONNECT`

conn2 --> `HTTP/2` --> `SSL` --> `HTTP-PROXY` --> `H2-PROXY` --> `SSL` --> `TCP`
flags:                 `SSL`     `IP_CONNECT`                    `SSL`     `IP_CONNECT`

conn3 --> `HTTP/3`
flags:    `SSL|IP_CONNECT`
```

Inspecting the filter chains, `conn1` is seen as unencrypted, since it
contains an `IP_CONNECT` filter before any `SSL`. `conn2` is clearly
encrypted as an `SSL` flagged filter is seen first. `conn3` is also encrypted
as the `SSL` flag is checked before the presence of `IP_CONNECT`.

Similar checks can determine if a connection is multiplexed or not.

## Filter Tracing

Filters may make use of special trace macros like `CURL_TRC_CF(data, cf,
msg, ...)`, with `data` being the transfer and `cf` being the filter
instance. These traces are normally not active and their execution is guarded
so that they are cheap to ignore.

Users of `curl` may activate them by adding the name of the filter type to
the `--trace-config` argument. For example, in order to get more detailed
tracing of an HTTP/2 request, invoke curl with:

```
> curl -v --trace-config ids,time,http/2 https://curl.se
```

This gives you trace output with time information, transfer+connection ids
and details from the `HTTP/2` filter. Filter type names in the trace config
are case insensitive. You may use `all` to enable tracing for all filter
types. When using `libcurl` you may call `curl_global_trace(config_string)`
at the start of your application to enable filter details.

## Meta Filters

"Meta filter" is a catch-all name for filter types that do not change the
transfer data in any way but provide other important services to curl. In
general, it is possible to do all sorts of silly things with them. One of the
commonly used, important things is "eyeballing".

The `HAPPY-EYEBALLS` filter is involved in the connect phase. Its job is to
try the various IPv4 and IPv6 addresses that are known for a server. If only
one address family is known (or configured), it tries the addresses one after
the other with timeouts calculated from the number of addresses and the
overall connect timeout.

When more than one address family is to be tried, it splits the address list
into IPv4 and IPv6 and makes parallel attempts. The connection filter chain
looks like this:

```
* create connection for http://curl.se
  conn[curl.se] --> SETUP[TCP] --> HAPPY-EYEBALLS --> NULL
* start connect
  conn[curl.se] --> SETUP[TCP] --> HAPPY-EYEBALLS --> NULL
                                   - ballerv4 --> TCP[151.101.1.91]:443
                                   - ballerv6 --> TCP[2a04:4e42:c00::347]:443
* v6 answers, connected
  conn[curl.se] --> SETUP[TCP] --> HAPPY-EYEBALLS --> TCP[2a04:4e42:c00::347]:443
* transfer
```

The modular design of connection filters, and the fact that we can plug them
into each other, is used to control the parallel attempts. When a `TCP`
filter does not connect (in time), it is torn down and another one is created
for the next address. This keeps the `TCP` filter simple.

The `HAPPY-EYEBALLS` filter, on the other hand, stays focused on its side of
the problem. We can also use it to make other types of connection by simply
giving it another filter type to try. This gives happy eyeballing for QUIC:

```
* create connection for --http3-only https://curl.se
  conn[curl.se] --> SETUP[QUIC] --> HAPPY-EYEBALLS --> NULL
* start connect
  conn[curl.se] --> SETUP[QUIC] --> HAPPY-EYEBALLS --> NULL
                                    - ballerv4 --> HTTP/3[151.101.1.91]:443
                                    - ballerv6 --> HTTP/3[2a04:4e42:c00::347]:443
* v6 answers, connected
  conn[curl.se] --> SETUP[QUIC] --> HAPPY-EYEBALLS --> HTTP/3[2a04:4e42:c00::347]:443
* transfer
```

When we plug these two variants together, we get the `HTTPS-CONNECT` filter
type that is used for `--http3` when **both** HTTP/3 and HTTP/2 or HTTP/1.1
shall be attempted:

```
* create connection for --http3 https://curl.se
  conn[curl.se] --> HTTPS-CONNECT --> NULL
* start connect
  conn[curl.se] --> HTTPS-CONNECT --> NULL
                    - SETUP[QUIC] --> HAPPY-EYEBALLS --> NULL
                                      - ballerv4 --> HTTP/3[151.101.1.91]:443
                                      - ballerv6 --> HTTP/3[2a04:4e42:c00::347]:443
                    - SETUP[TCP] --> HAPPY-EYEBALLS --> NULL
                                     - ballerv4 --> TCP[151.101.1.91]:443
                                     - ballerv6 --> TCP[2a04:4e42:c00::347]:443
* v4 QUIC answers, connected
  conn[curl.se] --> HTTPS-CONNECT --> SETUP[QUIC] --> HAPPY-EYEBALLS --> HTTP/3[151.101.1.91]:443
* transfer
```

24
curl-8.15.0/docs/internals/CURLX.md
Normal file
@@ -0,0 +1,24 @@

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# `curlx`

Functions that are prefixed with `curlx_` are internal global functions that
are written in a way that allows them to be "borrowed" and used outside of
the library: in the curl tool and in the curl test suite.

The `curlx` functions are not part of the libcurl API, but are stand-alone
functions whose sources can be built and used outside of libcurl. There are
no API or ABI guarantees. The functions are not written or meant to be used
outside of the curl project.

Only functions actually used by the library are provided here.

## Ways to success

- Do not use `struct Curl_easy` in these files
- Do not use the printf defines in these files
- Make them as stand-alone as possible

145
curl-8.15.0/docs/internals/DYNBUF.md
Normal file
@@ -0,0 +1,145 @@
|
||||
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# dynbuf

This is the internal module for creating and handling "dynamic buffers". This
means buffers that can be appended to and that grow dynamically to adapt.

There is always a terminating zero put at the end of the dynamic buffer.

The `struct dynbuf` is used to hold data for each instance of a dynamic
buffer. The members of that struct **MUST NOT** be accessed or modified
without using the dedicated dynbuf API.

## `curlx_dyn_init`

```c
void curlx_dyn_init(struct dynbuf *s, size_t toobig);
```

This initializes a struct to use for dynbuf and it cannot fail. The `toobig`
value **must** be set to the maximum size we allow this buffer instance to
grow to. The functions below return `CURLE_OUT_OF_MEMORY` when hitting this
limit.

## `curlx_dyn_free`

```c
void curlx_dyn_free(struct dynbuf *s);
```

Free the associated memory and clean up. After a free, the `dynbuf` struct can
be reused to start appending new data to.

## `curlx_dyn_addn`

```c
CURLcode curlx_dyn_addn(struct dynbuf *s, const void *mem, size_t len);
```

Append arbitrary data of a given length to the end of the buffer.

If this function fails it calls `curlx_dyn_free` on `dynbuf`.

## `curlx_dyn_add`

```c
CURLcode curlx_dyn_add(struct dynbuf *s, const char *str);
```

Append a C string to the end of the buffer.

If this function fails it calls `curlx_dyn_free` on `dynbuf`.

## `curlx_dyn_addf`

```c
CURLcode curlx_dyn_addf(struct dynbuf *s, const char *fmt, ...);
```

Append a `printf()`-style string to the end of the buffer.

If this function fails it calls `curlx_dyn_free` on `dynbuf`.

## `curlx_dyn_vaddf`

```c
CURLcode curlx_dyn_vaddf(struct dynbuf *s, const char *fmt, va_list ap);
```

Append a `vprintf()`-style string to the end of the buffer.

If this function fails it calls `curlx_dyn_free` on `dynbuf`.

## `curlx_dyn_reset`

```c
void curlx_dyn_reset(struct dynbuf *s);
```

Reset the buffer length, but leave the allocation.

## `curlx_dyn_tail`

```c
CURLcode curlx_dyn_tail(struct dynbuf *s, size_t length);
```

Keep `length` bytes of the buffer tail (the last `length` bytes of the
buffer). The rest of the buffer is dropped. The specified `length` must not be
larger than the buffer length. To instead keep the leading part, see
`curlx_dyn_setlen()`.

## `curlx_dyn_ptr`

```c
char *curlx_dyn_ptr(const struct dynbuf *s);
```

Returns a `char *` to the buffer if it has a length, otherwise may return
NULL. Since the buffer may be reallocated, this pointer should not be trusted
or used anymore after the next buffer manipulation call.

## `curlx_dyn_uptr`

```c
unsigned char *curlx_dyn_uptr(const struct dynbuf *s);
```

Returns an `unsigned char *` to the buffer if it has a length, otherwise may
return NULL. Since the buffer may be reallocated, this pointer should not be
trusted or used anymore after the next buffer manipulation call.

## `curlx_dyn_len`

```c
size_t curlx_dyn_len(const struct dynbuf *s);
```

Returns the length of the buffer in bytes. Does not include the terminating
zero byte.

## `curlx_dyn_setlen`

```c
CURLcode curlx_dyn_setlen(struct dynbuf *s, size_t len);
```

Sets the new shorter length of the buffer in number of bytes. Keeps the
leftmost set number of bytes, discards the rest. To instead keep the tail part
of the buffer, see `curlx_dyn_tail()`.

## `curlx_dyn_take`

```c
char *curlx_dyn_take(struct dynbuf *s, size_t *plen);
```

Transfers ownership of the internal buffer to the caller. The dynbuf
resets to its initial state. The returned pointer may be `NULL` if the
dynbuf never allocated memory. The returned length is the amount of
data written to the buffer. The actual allocated memory might be larger.

151
curl-8.15.0/docs/internals/HASH.md
Normal file
@@ -0,0 +1,151 @@

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# `hash`

    #include "hash.h"

This is the internal module for doing hash tables. A hash table uses a hash
function to compute an index. On each index there is a separate linked list of
entries.

Create a hash table. Add items. Retrieve items. Remove items. Destroy table.

## `Curl_hash_init`

~~~c
void Curl_hash_init(struct Curl_hash *h,
                    size_t slots,
                    hash_function hfunc,
                    comp_function comparator,
                    Curl_hash_dtor dtor);
~~~

The call initializes a `struct Curl_hash`.

- `slots` is the number of entries to create in the hash table. Larger is
  better (faster lookups) but also uses more memory.
- `hfunc` is a function pointer to a function that returns a `size_t` value as
  a checksum for an entry in this hash table. Ideally, it returns a unique
  value for every entry ever added to the hash table, but hash collisions are
  handled.
- `comparator` is a function pointer to a function that compares two hash
  table entries. It should return non-zero if the compared items are
  identical.
- `dtor` is a function pointer to a destructor called when an entry is removed
  from the table.

## `Curl_hash_add`

~~~c
void *
Curl_hash_add(struct Curl_hash *h, void *key, size_t key_len, void *p)
~~~

This call adds an entry to the hash. `key` points to the hash key and
`key_len` is the length of the hash key. `p` is a custom pointer.

If there already was a match in the hash, that data is replaced with this new
entry.

This function also lazily allocates the table if needed, as it is not done in
the `Curl_hash_init` function.

Returns NULL on error, otherwise it returns a pointer to `p`.

## `Curl_hash_add2`

~~~c
void *Curl_hash_add2(struct Curl_hash *h, void *key, size_t key_len, void *p,
                     Curl_hash_elem_dtor dtor)
~~~

This works like `Curl_hash_add` but has an extra argument: `dtor`, which is a
destructor call for this specific entry. When this entry is removed, this
function is called instead of the function stored for the whole hash table.

## `Curl_hash_delete`

~~~c
int Curl_hash_delete(struct Curl_hash *h, void *key, size_t key_len);
~~~

This function removes an entry from the hash table. If successful, it returns
zero. If the entry was not found, it returns 1.

## `Curl_hash_pick`

~~~c
void *Curl_hash_pick(struct Curl_hash *h, void *key, size_t key_len);
~~~

If there is an entry in the hash that matches the given `key` with size of
`key_len`, then its custom pointer is returned: the pointer that was passed
in as `p` when the entry was added.

It returns NULL if there is no matching entry in the hash.

## `Curl_hash_destroy`

~~~c
void Curl_hash_destroy(struct Curl_hash *h);
~~~

This function destroys a hash and cleans up all its related data. Calling it
multiple times is fine.

## `Curl_hash_clean`

~~~c
void Curl_hash_clean(struct Curl_hash *h);
~~~

This function removes all the entries in the given hash.

## `Curl_hash_clean_with_criterium`

~~~c
void
Curl_hash_clean_with_criterium(struct Curl_hash *h, void *user,
                               int (*comp)(void *, void *))
~~~

This function removes all the entries in the given hash that match the
criterion. The provided `comp` function determines if the criterion is met by
returning non-zero.

## `Curl_hash_count`

~~~c
size_t Curl_hash_count(struct Curl_hash *h)
~~~

Returns the number of entries stored in the hash.

## `Curl_hash_start_iterate`

~~~c
void Curl_hash_start_iterate(struct Curl_hash *hash,
                             struct Curl_hash_iterator *iter);
~~~

This function initializes a `struct Curl_hash_iterator` that `iter` points to.
It can then be used to iterate over all the entries in the hash.

## `Curl_hash_next_element`

~~~c
struct Curl_hash_element *
Curl_hash_next_element(struct Curl_hash_iterator *iter);
~~~

Given the iterator `iter`, this function returns a pointer to the next hash
entry if there is one, or NULL if there are no more entries.

Called repeatedly, it iterates over all the entries in the hash table.

Note: it only guarantees functionality if the hash table remains untouched
during its iteration.

195
curl-8.15.0/docs/internals/LLIST.md
Normal file
@@ -0,0 +1,195 @@

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# `llist` - linked lists

    #include "llist.h"

This is the internal module for linked lists. The API is designed to be
flexible but also to avoid dynamic memory allocation.

None of the involved structs should be accessed using struct fields (outside
of `llist.c`). Use the functions.

## Setup and shutdown

`struct Curl_llist` is the struct holding a single linked list. It needs to be
initialized with a call to `Curl_llist_init()` before it can be used.

To clean up a list, call `Curl_llist_destroy()`. Since the linked lists
themselves do not allocate memory, it can also be fine to just *not* clean up
the list.

## Add a node

There are two functions for adding a node to a linked list:

1. Add it last in the list with `Curl_llist_append`
2. Add it after a specific existing node with `Curl_llist_insert_next`

When a node is added to a list, it stores an associated custom pointer to
anything you like and you provide a pointer to a `struct Curl_llist_node`
struct in which it stores and updates pointers. If you intend to add the same
struct to multiple lists concurrently, you need to have one `struct
Curl_llist_node` for each list.

Add a node to a list with `Curl_llist_append(list, elem, node)`. Where

- `list`: points to a `struct Curl_llist`
- `elem`: points to what you want added to the list
- `node`: is a pointer to a `struct Curl_llist_node`. Data storage for this
  node.

Example: to add a `struct foobar` to a linked list. Add a node struct within
it:

    struct foobar {
      char *random;
      struct Curl_llist_node storage; /* can be anywhere in the struct */
      char *data;
    };

    struct Curl_llist barlist; /* the list for foobar entries */
    struct foobar entries[10];

    Curl_llist_init(&barlist, NULL);

    /* add the first struct to the list */
    Curl_llist_append(&barlist, &entries[0], &entries[0].storage);

See also `Curl_llist_insert_next`.

## Remove a node

Remove a node again from a list by calling `Curl_node_remove()`. This
destroys the node's `elem` (e.g. calling a registered free function).

To remove a node without destroying its `elem`, use `Curl_node_take_elem()`
which returns the `elem` pointer and removes the node from the list. The
caller then owns this pointer and has to take care of it.

## Iterate

To iterate over a list: first get the head entry and then iterate over the
nodes as long there is a next. Each node has an *element* associated with it,
the custom pointer you stored there. Usually a struct pointer or similar.

    struct Curl_llist_node *iter;

    /* get the first entry of the 'barlist' */
    iter = Curl_llist_head(&barlist);

    while(iter) {
      /* extract the element pointer from the node */
      struct foobar *elem = Curl_node_elem(iter);

      /* advance to the next node in the list */
      iter = Curl_node_next(iter);
    }

# Function overview

## `Curl_llist_init`

~~~c
void Curl_llist_init(struct Curl_llist *list, Curl_llist_dtor dtor);
~~~

Initializes the `list`. The argument `dtor` is NULL or a function pointer that
gets called when list nodes are removed from this list.

The function is infallible.

~~~c
typedef void (*Curl_llist_dtor)(void *user, void *elem);
~~~

`dtor` is called with two arguments: `user` and `elem`. The first being the
`user` pointer passed in to `Curl_node_uremove()` or `Curl_llist_destroy()`
and the second is the `elem` pointer associated with the removed node. The
pointer that `Curl_node_elem()` would have returned for that node.

## `Curl_llist_destroy`

~~~c
void Curl_llist_destroy(struct Curl_llist *list, void *user);
~~~

This removes all nodes from the `list`. This leaves the list in a cleared
state.

The function is infallible.

## `Curl_llist_append`

~~~c
void Curl_llist_append(struct Curl_llist *list,
                       const void *elem, struct Curl_llist_node *node);
~~~

Adds `node` last in the `list` with a custom pointer to `elem`.

The function is infallible.

## `Curl_llist_insert_next`

~~~c
void Curl_llist_insert_next(struct Curl_llist *list,
                            struct Curl_llist_node *node,
                            const void *elem,
                            struct Curl_llist_node *newnode);
~~~

Adds `newnode` to the `list` with a custom pointer to `elem`, immediately
after the existing list `node`.

The function is infallible.

## `Curl_llist_head`

~~~c
struct Curl_llist_node *Curl_llist_head(struct Curl_llist *list);
~~~

Returns a pointer to the first node of the `list`, or NULL if the list is
empty.

## `Curl_node_uremove`

~~~c
void Curl_node_uremove(struct Curl_llist_node *node, void *user);
~~~

Removes the `node` from the list it was previously added to. Passes the
`user` pointer to the list's destructor function if one was set up.

The function is infallible.

## `Curl_node_remove`

~~~c
void Curl_node_remove(struct Curl_llist_node *node);
~~~

Removes the `node` from the list it was previously added to. Passes a NULL
pointer to the list's destructor function if one was set up.

The function is infallible.

## `Curl_node_elem`

~~~c
void *Curl_node_elem(struct Curl_llist_node *node);
~~~

Given a list node, this function returns the associated element.

## `Curl_node_next`

~~~c
struct Curl_llist_node *Curl_node_next(struct Curl_llist_node *node);
~~~

Given a list node, this function returns the next node in the list.

72
curl-8.15.0/docs/internals/MID.md
Normal file
@@ -0,0 +1,72 @@

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# Multi Identifiers (mid)

All transfers (easy handles) added to a multi handle are assigned
a unique identifier until they are removed again. The multi handle
keeps a table `multi->xfers` that allows O(1) access to the easy
handle by its `mid`.

References to other easy handles *should* keep their `mid`s instead
of a pointer (not all code has been converted as of now). This solves
problems in easy and multi handle life cycle management as well as
iterating over handles where operations may add/remove other handles.

### Values and Lifetime

An `mid` is an `unsigned int`. There are two reserved values:

* `0`: is the `mid` of an internal "admin" handle. Multi and share handles
  each have their own admin handle for maintenance operations, like
  shutting down connections.
* `UINT_MAX`: the "invalid" `mid`. Easy handles are initialized with
  this value. They get it assigned again when removed from
  a multi handle.

This makes the potential range of `mid`s `1` to `UINT_MAX - 1` *inside
the same multi handle at the same time*. However, the `multi->xfers` table
reuses `mid` values from previous transfers that have been removed.

`multi->xfers` is created with an initial capacity. At the time of this
writing that is `16` for "multi_easy" handles (used in `curl_easy_perform()`)
and `512` for multi handles created with `curl_multi_init()`.

The first added easy handle gets `mid == 1` assigned. The second one receives
`2`, even when the first one has been removed already. Every added handle gets
an `mid` one larger than the previously assigned one, until the capacity of
the table is reached and it starts looking for a free id at `1` again (`0`
is always in the table).

When adding a new handle, the multi checks the amount of free entries
in the `multi->xfers` table. If that drops below a threshold (currently 25%),
the table is resized. This serves two purposes: first, a previous `mid` is not
reused immediately and second, table resizes are not needed that often.

The table is implemented in `uint-table.[ch]`. More details in
[`UINT_SETS`](UINT_SETS.md).

### Tracking `mid`s

There are several places where transfers need to be tracked:

* the multi tracks `process`, `pending` and `msgsent` transfers. A transfer
  is in at most one of these at a time.
* connections track the transfers that are *attached* to them.
* multi event handling tracks transfers interested in a specific socket.
* DoH handles track the handle they perform lookups for (and vice versa).

There are two bitsets implemented for storing `mid`s: `uint_bset` and
`uint_spbset`. The first is a bitset optimal for storing a large number of
unsigned int values. The second one is a "sparse" variant good for storing a
small set of numbers. More details about these in
[`UINT_SETS`](UINT_SETS.md).

A multi uses `uint_bset`s for `process`, `pending` and `msgsent`. Connections
and sockets use the sparse variant as both often track only a single transfer
and at most 100 on an HTTP/2 or HTTP/3 connection/socket.

These sets allow safe iteration while being modified. This allows a multi
to iterate over its "process" set while existing transfers are removed
or new ones added.

57
curl-8.15.0/docs/internals/MQTT.md
Normal file
@@ -0,0 +1,57 @@

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# MQTT in curl

## Usage

A plain "GET" subscribes to the topic and prints all published messages.

Doing a "POST" publishes the post data to the topic and exits.

### Subscribing

Command usage:

    curl mqtt://host/topic

Example subscribe:

    curl mqtt://host.home/bedroom/temp

This sends an MQTT SUBSCRIBE packet for the topic `bedroom/temp` and listens
for incoming PUBLISH packets.

You can set the upkeep interval option (`CURLOPT_UPKEEP_INTERVAL_MS`) to make
curl send MQTT ping requests to the server at an interval, to prevent the
connection from being closed because of idleness. You might then need to use
the progress callback to cancel the operation.

### Publishing

Command usage:

    curl -d payload mqtt://host/topic

Example publish:

    curl -d 75 mqtt://host.home/bedroom/dimmer

This sends an MQTT PUBLISH packet to the topic `bedroom/dimmer` with the
payload `75`.

## What does curl deliver as a response to a subscribe

Whenever a PUBLISH packet is received, curl outputs two bytes of topic length
(MSB, LSB), then the topic, followed by the payload.

## Caveats

Remaining limitations:
- Only QoS level 0 is implemented for publish
- No way to set the retain flag for publish
- No TLS (mqtts) support
- Naive EAGAIN handling does not handle split messages

127
curl-8.15.0/docs/internals/MULTI-EV.md
Normal file
@@ -0,0 +1,127 @@

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# Multi Event Based

A libcurl multi is operating "event based" when the application uses
an event library like `libuv` to monitor the sockets and file descriptors
libcurl uses to trigger transfer operations. How that works from the
application's point of view is described in libcurl-multi(3).

This document is about the internal handling.

## Source Locations

All code related to event based handling is found in `lib/multi_ev.c`
and `lib/multi_ev.h`. The header defines a set of internal functions
and `struct curl_multi_ev` that is embedded in each multi handle.

There is `Curl_multi_ev_init()` and `Curl_multi_ev_cleanup()` to manage
the overall life cycle, called on creation and destruction of the multi
handle.

## Tracking Events

First, the various functions in `lib/multi_ev.h` only ever really do
something when the libcurl application has registered its callback
in `multi->socket_cb`.

This is important as this callback gets informed about *changes* to sockets.
When a new socket is added, an existing one is removed, or the `POLLIN/OUT`
flags change, `multi->socket_cb` needs to be invoked. `multi_ev` has to
track what it already reported to detect changes.

Most applications are expected to go "event based" right from the start,
but the libcurl API does not prohibit an application to start another
way and then go for events later on, even in the middle of a transfer.

### Transfer Events

Most events that happen are in connection with a transfer. A transfer
opens a connection, which opens a socket, and waits for this socket
to become writable (`POLLOUT`) when using TCP, for example.

The multi then calls `Curl_multi_ev_assess_xfer(multi, data)` to
let the multi event code detect what sockets the transfer is interested in.
If indeed a `multi->socket_cb` is set, the *current* transfer pollset is
retrieved via `Curl_multi_getsock()`. This current pollset is then
compared to the *previous* pollset. If relevant changes are detected,
`multi->socket_cb` gets informed about those. These can be:

* a socket is in the current set, but not the previous one
* a socket was also in the previous one, but IN/OUT flags changed
* a socket in the previous one is no longer part of the current

`multi_ev.c` keeps a `struct mev_sh_entry` for each socket in a hash
with the socket as key. It tracks in each entry which transfers are
interested in this particular socket, how many transfers want to read
and/or write, and what summarized `POLLIN/POLLOUT` action had been
reported to `multi->socket_cb`.

This is necessary as a socket may be in use by several transfers
at the same time (think HTTP/2 on the same connection). When a transfer
is done and gets removed from the socket entry, it decrements
the reader and/or writer count (depending on what it was last
interested in). This *may* result in the entry's summarized action
changing, or not.

### Connection Events

There are also events not connected to any transfer that need to be tracked.
The multi connection cache, concerned with clean shutdowns of connections,
is interested in socket events during the shutdown.

To allow use of the libcurl infrastructure, the connection cache operates
using an *internal* easy handle that is not a transfer as such. The
internal handle is used for all connection shutdown operations, being tied
to a particular connection only for a short time. This means tracking
the last pollset for an internal handle is useless.

Instead, the connection cache uses `Curl_multi_ev_assess_conn()` to have
multi event handling check the connection and track a "last pollset"
for the connection alone.

## Event Processing

When the libcurl application is informed by the event library that
a particular socket has an event, it calls `curl_multi_socket_action()`
to make libcurl react to it. This internally invokes
`Curl_multi_ev_expire_xfers()` which expires all transfers that
are interested in the given socket, so the multi handle runs them.

In addition, `Curl_multi_ev_expire_xfers()` returns a `bool` to let
the multi know that connections are also interested in the socket, so
the connection pool should be informed as well.

## All Things Pass

When a transfer is done, e.g. removed from its multi handle, the
multi calls `Curl_multi_ev_xfer_done()`. This cleans up the pollset
tracking for the transfer.

When a connection is done, and before it is destroyed,
`Curl_multi_ev_conn_done()` is called. This cleans up the pollset
tracking for this connection.

When a socket is about to be closed, `Curl_multi_ev_socket_done()`
is called to clean up the socket entry and all information kept there.

These calls do not have to happen in any particular order. A transfer's
socket may be around while the transfer is ongoing. Or it might disappear
in the middle of things. Also, a transfer might be interested in several
sockets at the same time (resolving, eyeballing, ftp are all examples of
those).

### And Come Again

While transfer and connection identifiers are practically unique in a
libcurl application, sockets are not. Operating systems are keen on reusing
their resources, and the next socket may with high likelihood get the same
identifier as one that was just closed.

This means that multi event handling needs to be informed *before* a close,
clean up all its tracking and be ready to see that same socket identifier
again right after.

116
curl-8.15.0/docs/internals/NEW-PROTOCOL.md
Normal file
@@ -0,0 +1,116 @@

<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# Adding a new protocol?

Every once in a while, someone comes up with the idea of adding support for
yet another protocol to curl. After all, curl already supports 25-something
protocols and it is the Internet transfer machine for the world.

In the curl project we love protocols and we love supporting many protocols
and doing it well.

How do you proceed to add a new protocol and what are the requirements?

## No fixed set of requirements

This document is an attempt to describe things to consider. There is no
checklist of the twenty-seven things you need to cross off. We view the
entire effort as a whole and then judge if it seems to be the right thing -
for now. The more things that look right, fit our patterns and are done in
ways that align with our thinking, the better the chances that we agree that
supporting this protocol is a grand idea.

## Mutual benefit is preferred

curl is not here for your protocol. Your protocol is not here for curl. The
best cooperation and end result occur when all involved parties mutually see
and agree that supporting this protocol in curl would be good for everyone.
Heck, for the world.

Consider "selling us" the idea that we need an implementation merged in curl
to be fairly important. *Why* do we want curl to support this new protocol?

## Protocol requirements

### Client-side

The protocol implementation is for a client's side of a "communication
session".

### Transfer oriented

The protocol itself should be focused on *transfers*. Be it uploads or
downloads or both. It should at least be possible to view the transfers as
such, like we can view reading emails over POP3 as a download and sending
emails over SMTP as an upload.

If you cannot even shoehorn the protocol into a transfer focused view, then
you are up for a tough argument.

### URL

There should be a documented URL format. If there is an RFC for it there is no
question about it but the syntax does not have to be a published RFC. It could
be enough if it is already in use by other implementations.

If you make up the syntax just in order to be able to propose it to curl, then
you are in a bad place. URLs are designed and defined for interoperability.
There should at least be a good chance that other clients and servers can be
implemented supporting the same URL syntax and work the same or similar way.

URLs work on registered 'schemes'. There is a register of [all officially
recognized
schemes](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml). If
your protocol is not in there, is it really a protocol we want?

### Wide and public use

The protocol shall already be used or have an expectation of getting used
widely. Experimental protocols are better off worked on in experiments first,
to prove themselves before they are adopted by curl.

## Code

Of course the code needs to be written, provided, licensed agreeably and it
should follow our code guidelines and review comments have to be dealt with.
If the implementation needs third party code, that third party code should not
have noticeably lesser standards than the curl project itself.

## Tests

As much of the protocol implementation as possible needs to be verified by
curl test cases. We must have the implementation get tested by CI jobs,
torture tests and more.

We have experienced many times in the past how new implementations were
brought to curl and immediately once the code had been merged, the originator
vanished from the face of the earth. That is fine, but we need to take the
necessary precautions so when it happens we are still fine.

Our test infrastructure is powerful enough to test just about every possible
protocol - but it might require a bit of an effort to make it happen.

## Documentation

We cannot assume that users are particularly familiar with details and
peculiarities of the protocol. It needs documentation.

Maybe it even needs some internal documentation so that the developers who try
to debug something five years from now can figure out functionality a little
easier.

The protocol specification itself should be freely available without requiring
a non-disclosure agreement or similar.

## Do not compare

We are constantly raising the bar and we are constantly improving the project.
A lot of things we did in the past would not be acceptable if done today.
|
||||
Therefore, you might be tempted to use shortcuts or "hacks" you can spot
|
||||
other - existing - protocol implementations have used, but there is nothing to
|
||||
gain from that. The bar has been raised. Former "cheats" may not tolerated
|
||||
anymore.
|
||||
47
curl-8.15.0/docs/internals/PORTING.md
Normal file
@@ -0,0 +1,47 @@
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# porting libcurl

The basic approach I use when porting libcurl to another OS when the existing
configure or cmake build setups are not suitable.

## Build

Write a build script/Makefile that builds *all* C files under lib/. If
possible, use the `lib/Makefile.inc` that lists all files in Makefile
variables.

In the Makefile, make sure you define what OS you build for: `-D[OPERATING
SYSTEM]`, or similar. Perhaps the compiler in use already defines a standard
one? Then you might not need to define your own.

## Add the new OS

In the `lib/curl_config.h` header file, in the section for when `HAVE_CONFIG_H`
is *not* defined (starting at around line 150), add a new conditional include
in this style:

~~~c
#ifdef [OPERATING SYSTEM]
# include "config-operatingsystem.h"
#endif
~~~

Create `lib/config-operatingsystem.h`. You might want to start by copying
another config-* file and then start trimming according to what your
environment supports.

## Build it

When you run into compiler warnings or errors, the
`lib/config-operatingsystem.h` file is where you should focus your work
and edits.

A recommended approach is to initially define a lot of the `CURL_DISABLE_*`
defines (see the [CURL-DISABLE](../CURL-DISABLE.md) document) to help narrow
down the initial work, as that can save you from having to give attention to
areas of the code that you do not care for in your port.

12
curl-8.15.0/docs/internals/README.md
Normal file
@@ -0,0 +1,12 @@
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# Internals

This directory contains documentation covering libcurl internals: APIs and
concepts that are useful for contributors and maintainers.

Public APIs are documented in the public documentation, not here.

71
curl-8.15.0/docs/internals/SCORECARD.md
Normal file
@@ -0,0 +1,71 @@
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# scorecard.py

This is an internal script in `tests/http/scorecard.py` used for testing
curl's performance in a set of cases. These are for exercising parts of
curl/libcurl in a reproducible fashion to judge improvements or detect
regressions. They are not intended to represent real world scenarios
as such.

This script is not part of any official interface and we may
change it in the future according to the project's needs.

## setup

When you are able to run curl's `pytest` suite, scorecard should work
for you as well. It starts a local Apache httpd or Caddy server and
invokes the locally built `src/curl` (by default).

## invocation

A typical invocation for measuring performance of HTTP/2 downloads would be:

```
curl> python3 tests/http/scorecard.py -d h2
```

and this prints a table with the results. The last argument is the protocol
to test and it can be `h1`, `h2` or `h3`. You can add `--json` to get results
in JSON instead of text.

Help for all command line options is available via:

```
curl> python3 tests/http/scorecard.py -h
```

## scenarios

Apart from `-d/--downloads` there are `-u/--uploads` and `-r/--requests`.
These are run with a variation of resource sizes and parallelism by default.
You can specify these if you are just interested in a particular case.

For example, to run downloads of a 1 MB resource only, 100 times with at most
6 parallel transfers, use:

```
curl> python3 tests/http/scorecard.py -d --download-sizes=1mb --download-count=100 --download-parallel=6 h2
```

Similar options are available for the uploads and requests scenarios.

## dtrace

With the `--dtrace` option, scorecard produces a dtrace sample of the user
stacks in `tests/http/gen/curl/curl.user_stacks`. On many platforms, `dtrace`
requires **special permissions**. It is therefore invoked via `sudo` and you
should make sure that sudo works for the run without prompting for a password.

Note: the file is the trace of the last curl invocation by scorecard. Use the
parameters to narrow down the runs to the particular case you are interested
in.

## flame graphs

With the excellent [Flame Graph](https://github.com/brendangregg/FlameGraph)
by Brendan Gregg, scorecard can turn the `dtrace` samples into an interactive
SVG. Set the environment variable `FLAMEGRAPH` to the location of your clone
of that project and invoke scorecard with the `--flame` option, like:

```
curl> FLAMEGRAPH=/Users/sei/projects/FlameGraph python3 tests/http/scorecard.py \
  -r --request-count=50000 --request-parallels=100 --samples=1 --flame h2
```

and the SVG of the run is in `tests/http/gen/curl/curl.flamegraph.svg`. You
can open that in Firefox and zoom in/out of stacks of interest.

Note: as with `dtrace`, the flame graph is for the last invocation of curl
done by scorecard.

111
curl-8.15.0/docs/internals/SPLAY.md
Normal file
@@ -0,0 +1,111 @@
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# `splay`

    #include "splay.h"

This is an internal module for splay tree management. A splay tree is a binary
search tree with the additional property that recently accessed elements are
quick to access again: a self-balancing tree.

Nodes are added to the tree, they are accessed and removed from the tree, and
it automatically rebalances itself in each operation.

## libcurl use

libcurl adds fixed timeout expiry timestamps to the splay tree, which is meant
to scale up to holding a huge number of pending timeouts with decent
performance.

The splay tree is used to:

1. figure out the next timeout expiry value closest in time
2. iterate over timeouts that have already expired

This splay tree rebalances itself based on the time value.

Each node in the splay tree points to a `struct Curl_easy`. Each `Curl_easy`
struct is represented only once in the tree. To still allow each easy handle
to have a large number of timeouts, each handle has a sorted linked list of
pending timeouts. Only the handle's timeout that is closest to expiry is used
as the timestamp for the splay tree node.

When a specific easy handle's timeout expires, the node gets removed from the
splay tree and from the handle's linked list of timeouts. The next timeout for
that handle is then first in line and becomes the new timeout value as the
node is re-added to the splay.

## `Curl_splay`

~~~c
struct Curl_tree *Curl_splay(struct curltime i, struct Curl_tree *t);
~~~

Rearranges the tree `t` after the provided time `i`.

## `Curl_splayinsert`

~~~c
struct Curl_tree *Curl_splayinsert(struct curltime key,
                                   struct Curl_tree *t,
                                   struct Curl_tree *node);
~~~

This function inserts a new `node` in the tree, using the given `key`
timestamp. The `node` struct has a field called `->payload` that can be set to
point to anything. libcurl sets this to the `struct Curl_easy` handle that is
associated with the timeout value set in `key`.

The splay insert function does not allocate any memory, it assumes the caller
has that arranged.

It returns a pointer to the new tree root.

## `Curl_splaygetbest`

~~~c
struct Curl_tree *Curl_splaygetbest(struct curltime key,
                                    struct Curl_tree *tree,
                                    struct Curl_tree **removed);
~~~

If there is a node in the `tree` that has a time value that is less than the
provided `key`, this function removes that node from the tree and provides it
in the `*removed` pointer (or NULL if there was no match).

It returns a pointer to the new tree root.
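
Taken together, this allows a pseudocode-level sketch of how expired timeouts
can be processed (a sketch only: `now` and `process_expired()` are
hypothetical stand-ins, not curl APIs):

~~~c
struct Curl_tree *node;

/* repeatedly take out the node with the oldest time value, as long as
   it is older than 'now'; each call returns the new tree root */
do {
  tree = Curl_splaygetbest(now, tree, &node);
  if(node)
    process_expired(node);  /* hypothetical helper */
} while(node);
~~~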

## `Curl_splayremove`

~~~c
int Curl_splayremove(struct Curl_tree *tree,
                     struct Curl_tree *node,
                     struct Curl_tree **newroot);
~~~

Removes a given `node` from a splay `tree`, and returns the `newroot`
identifying the new tree root.

Note that a clean tree without any nodes present implies a NULL pointer.

## `Curl_splayset`

~~~c
void Curl_splayset(struct Curl_tree *node, void *payload);
~~~

Set a custom pointer to be stored in the splay node. This pointer is not used
by the splay code itself and can be retrieved again with `Curl_splayget`.

## `Curl_splayget`

~~~c
void *Curl_splayget(struct Curl_tree *node);
~~~

Get the custom pointer from the splay node that was previously set with
`Curl_splayset`. If no pointer was set before, it returns NULL.

230
curl-8.15.0/docs/internals/STRPARSE.md
Normal file
@@ -0,0 +1,230 @@
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# String parsing with `strparse`

The functions take input via a pointer to a pointer, which allows the
functions to advance the pointer on success, which then by extension allows
"chaining" of functions like this example that gets a word, a space and then a
second word:

~~~c
if(curlx_str_word(&line, &word1, MAX) ||
   curlx_str_singlespace(&line) ||
   curlx_str_word(&line, &word2, MAX))
  fprintf(stderr, "ERROR\n");
~~~

The input pointer **must** point to a null-terminated buffer area or these
functions risk continuing "off the edge".

## Strings

The functions that return string information do so by populating a
`struct Curl_str`:

~~~c
struct Curl_str {
  char *str;
  size_t len;
};
~~~

Access the struct fields with `curlx_str()` for the pointer and `curlx_strlen()`
for the length rather than using the struct fields directly.

## `curlx_str_init`

~~~c
void curlx_str_init(struct Curl_str *out)
~~~

This initiates a string struct. The parser functions that store info in
strings always init the string themselves, so this stand-alone use is often
not necessary.

## `curlx_str_assign`

~~~c
void curlx_str_assign(struct Curl_str *out, const char *str, size_t len)
~~~

Set a pointer and associated length in the string struct.

## `curlx_str_word`

~~~c
int curlx_str_word(char **linep, struct Curl_str *out, const size_t max);
~~~

Get a sequence of bytes until the first space or the end of the string. Return
non-zero on error. There is no way to include a space in the word, no sort of
escaping. The word must be at least one byte, otherwise it is considered an
error.

`max` is the longest accepted word, or it returns error.

On a successful return, `linep` is updated to point to the byte immediately
following the parsed word.

## `curlx_str_until`

~~~c
int curlx_str_until(char **linep, struct Curl_str *out, const size_t max,
                    char delim);
~~~

Like `curlx_str_word` but instead of parsing to space, it parses to a given
custom non-zero delimiter byte `delim`.

`max` is the longest accepted word, or it returns error.

The parsed word must be at least one byte, otherwise it is considered an
error.

## `curlx_str_untilnl`

~~~c
int curlx_str_untilnl(char **linep, struct Curl_str *out, const size_t max);
~~~

Like `curlx_str_until` but instead parses until it finds a "newline byte".
That means either a CR (ASCII 13) or an LF (ASCII 10) octet.

`max` is the longest accepted word, or it returns error.

The parsed word must be at least one byte, otherwise it is considered an
error.

## `curlx_str_cspn`

~~~c
int curlx_str_cspn(const char **linep, struct Curl_str *out, const char *cspn);
~~~

Get a sequence of characters until one of the bytes in the `cspn` string
matches. Similar to the `strcspn` function.

## `curlx_str_quotedword`

~~~c
int curlx_str_quotedword(char **linep, struct Curl_str *out, const size_t max);
~~~

Get a "quoted" word. This means everything that is provided within a leading
and an ending double quote character. No escaping is possible.

`max` is the longest accepted word, or it returns error.

The parsed word must be at least one byte, otherwise it is considered an
error.

## `curlx_str_single`

~~~c
int curlx_str_single(char **linep, char byte);
~~~

Advance over a single character provided in `byte`. Return non-zero on error.

## `curlx_str_singlespace`

~~~c
int curlx_str_singlespace(char **linep);
~~~

Advance over a single ASCII space. Return non-zero on error.

## `curlx_str_passblanks`

~~~c
void curlx_str_passblanks(char **linep);
~~~

Advance over all spaces and tabs.

## `curlx_str_trimblanks`

~~~c
void curlx_str_trimblanks(struct Curl_str *out);
~~~

Trim off blanks (spaces and tabs) from the start and the end of the given
string.

## `curlx_str_number`

~~~c
int curlx_str_number(char **linep, curl_size_t *nump, size_t max);
~~~

Get an unsigned decimal number not larger than `max`. Leading zeroes are just
swallowed. Return non-zero on error. Returns error if there was not a single
digit.
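
As an illustration (a sketch assuming the curl source tree; the buffer
content is made up), parsing a port number with an upper bound could look
like:

~~~c
char buf[] = "8080/tcp";
char *p = buf;
curl_size_t port;

/* accept 0 - 65535 only; on success 'p' points at the '/' */
if(curlx_str_number(&p, &port, 65535))
  fprintf(stderr, "ERROR\n");
~~~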

## `curlx_str_numblanks`

~~~c
int curlx_str_numblanks(char **linep, curl_size_t *nump);
~~~

Get an unsigned 63-bit decimal number. Leading blanks and zeroes are skipped.
Returns non-zero on error. Returns error if there was not a single digit.

## `curlx_str_hex`

~~~c
int curlx_str_hex(char **linep, curl_size_t *nump, size_t max);
~~~

Get an unsigned hexadecimal number not larger than `max`. Leading zeroes are
just swallowed. Return non-zero on error. Returns error if there was not a
single digit. Does *not* handle a `0x` prefix.

## `curlx_str_octal`

~~~c
int curlx_str_octal(char **linep, curl_size_t *nump, size_t max);
~~~

Get an unsigned octal number not larger than `max`. Leading zeroes are just
swallowed. Return non-zero on error. Returns error if there was not a single
digit.

## `curlx_str_newline`

~~~c
int curlx_str_newline(char **linep);
~~~

Check for a single CR or LF. Return non-zero on error.

## `curlx_str_casecompare`

~~~c
int curlx_str_casecompare(struct Curl_str *str, const char *check);
~~~

Returns true if the provided string in the `str` argument matches the `check`
string case insensitively.

## `curlx_str_cmp`

~~~c
int curlx_str_cmp(struct Curl_str *str, const char *check);
~~~

Returns true if the provided string in the `str` argument matches the `check`
string case sensitively. This is *not* the same return code as `strcmp`.

## `curlx_str_nudge`

~~~c
int curlx_str_nudge(struct Curl_str *str, size_t num);
~~~

Removes `num` bytes from the beginning (left) of the string kept in `str`. If
`num` is larger than the string, it instead returns an error.

163
curl-8.15.0/docs/internals/TLS-SESSIONS.md
Normal file
@@ -0,0 +1,163 @@
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->

# TLS Sessions and Tickets

The TLS protocol offers methods of "resuming" a previous "session". A
TLS "session" is a negotiated security context across a connection
(which may be via TCP or UDP or other transports.)

By "resuming", the TLS protocol means that the security context from
before can be fully or partially resurrected when the TLS client presents
the proper crypto stuff to the server. This saves on the amount of
TLS packets that need to be sent back and forth, reducing the amount
of data and even latency. In the case of QUIC, resumption may send
application data without having seen any reply from the server, hence
this is named 0-RTT data.

The exact mechanism of session tickets in TLSv1.2 (and earlier) and
TLSv1.3 differs. TLSv1.2 tickets have several weaknesses (that can
be exploited by attackers) which TLSv1.3 then fixed. See
[Session Tickets in the real world](https://words.filippo.io/we-need-to-talk-about-session-tickets/)
for an insight into this topic.

These differences between TLS protocol versions are reflected in curl's
handling of session tickets. More below.

## curl's `ssl_peer_key`

In order to find a ticket from a previous TLS session, curl
needs a name for TLS sessions that uniquely identifies the peer
it talks to.

This name also has to reflect the various TLS parameters that can
be configured in curl for a connection. We do not want to use
a ticket from a different configuration. Example: when setting
the maximum TLS version to 1.2, we do not want to reuse a ticket
we got from a TLSv1.3 session, although we are talking to the
same host.

Internally, we call this name an `ssl_peer_key`. It is a printable
string that carries the hostname and port and any non-default TLS
parameters involved in the connection.

Examples:
- `curl.se:443:CA-/etc/ssl/cert.pem:IMPL-GnuTLS/3.8.7` is a peer key for
  a connection to `curl.se:443` using `/etc/ssl/cert.pem` as CA
  trust anchors and GnuTLS/3.8.7 as TLS backend.
- `curl.se:443:TLSVER-6-6:CA-/etc/ssl/cert.pem:IMPL-GnuTLS/3.8.7` is the
  same as the previous, except it is configured to use TLSv1.2 as
  min and max versions.

Different configurations produce different keys, which is just what
curl needs when handling SSL session tickets.

One important thing: peer keys do not contain confidential information. If you
configure a client certificate or SRP authentication with username/password,
these are not part of the peer key.

However, peer keys carry the hostnames you use curl for. They *do*
leak the privacy of your communication. We recommend to *not* persist
peer keys for this reason.

**Caveat**: The key may contain filenames or paths. It does not reflect the
*contents* in the filesystem. If you change `/etc/ssl/cert.pem` and reuse a
previous ticket, curl might trust a server which no longer has a root
certificate in the file.

## Session Cache Access

#### Lookups

When a new connection is being established, each SSL connection filter creates
its own peer_key and calls into the cache. The cache then looks for a ticket
with exactly this peer_key. Peer keys between proxy SSL filters and SSL
filters talking through a tunnel differ, as they talk to different peers.

If the connection filter wants to use a client certificate or SRP
authentication, the cache checks those as well. If the cache peer carries a
client cert or SRP auth, the connection filter must have those with the same
values (and vice versa).

On a match, the connection filter gets the session ticket and feeds that to
the TLS implementation which, on accepting it, tries to resume it for a
shorter handshake. In addition, the filter gets the ALPN used before and the
amount of 0-RTT data that the server announced to be willing to accept. The
filter can then decide if it wants to attempt 0-RTT or not. (The ALPN is
needed to know if the server speaks the protocol you want to send in 0-RTT. It
makes no sense to send HTTP/2 requests to a server that only knows HTTP/1.1.)

#### Updates

When a new TLS session ticket is received by a filter, it adds it to the
cache using its peer_key and SSL configuration. The cache looks for
a matching entry and, should it find one, adds the ticket for this
peer.

### Put, Take and Return

When a filter accesses the session cache, it *takes*
a ticket from the cache, meaning a returned ticket is removed. The filter
then configures its TLS backend and *returns* the ticket to the cache.

The cache needs to treat tickets from TLSv1.2 and 1.3 differently. 1.2 tickets
should be reused, but 1.3 tickets SHOULD NOT (RFC 8446). The session cache
simply drops 1.3 tickets when they are returned after use, but keeps a 1.2
ticket.

When a ticket is *put* into the cache, there is also a difference. There
can be several 1.3 tickets at the same time, but only a single 1.2 ticket.
A TLSv1.2 ticket replaces any other. 1.3 tickets accumulate up to a max
amount.

By having a "put/take/return" model we reflect the 1.3 use case nicely. Two
concurrent connections do not reuse the same ticket.

## Session Ticket Persistence

#### Privacy and Security

As mentioned above, ssl peer keys are not intended for storage in a file
system. They clearly show which hosts the user talked to. This may be "just"
privacy relevant, but it has security implications as an attacker might find
worthy targets among your peer keys.

Also, we do not recommend persisting TLSv1.2 tickets.

### Salted Hashes

The TLS session cache offers an alternative to storing peer keys:
it provides a salted SHA256 hash of the peer key for import and export.

#### Export

The salt is generated randomly for each peer key on export. The SHA256 makes
sure that the peer key cannot be reversed and that a slightly different key
still produces a different result.

This means an attacker cannot just "grep" a session file for a particular
entry, e.g. if they want to know if you accessed a specific host. They *can*
however compute the SHA256 hashes for all salts in the file and find a
specific entry. They *cannot* find a hostname they do not know. They would
have to brute force by guessing.

#### Import

When session tickets are imported from a file, curl only gets the salted
hashes. The imported tickets belong to an *unknown* peer key.

When a connection filter tries to *take* a session ticket, it passes its peer
key. This peer key initially does not match any tickets in the cache. The
cache then checks all entries with unknown peer keys to see if the passed key
matches their salted hash. If it does, the peer key is recovered and
remembered at the cache entry.

This is a performance penalty in the order of the number of "unknown" peer
keys, which diminishes over time as keys are rediscovered. Note that this also
works for putting a new ticket into the cache: when no present entry matches,
a new one with the peer key is created. This peer key then no longer bears the
cost of hash computes.

128
curl-8.15.0/docs/internals/UINT_SETS.md
Normal file
@@ -0,0 +1,128 @@
|
||||
<!--
|
||||
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
|
||||
|
||||
SPDX-License-Identifier: curl
|
||||
-->
|
||||
|
||||
# Unsigned Int Sets
|
||||
|
||||
The multi handle tracks added easy handles via an unsigned int
|
||||
it calls an `mid`. There are four data structures for unsigned int
|
||||
optimized for the multi use case.
|
||||
|
||||
## `uint_tbl`
|
||||
|
||||
`uint_table`, implemented in `uint-table.[ch]` manages an array
|
||||
of `void *`. The unsigned int are the index into this array. It is
|
||||
created with a *capacity* which can be *resized*. The table assigns
|
||||
the index when a `void *` is *added*. It keeps track of the last
|
||||
assigned index and uses the next available larger index for a
|
||||
subsequent add. Reaching *capacity* it wraps around.
|
||||
|
||||
The table *can not* store `NULL` values. The largest possible index
|
||||
is `UINT_MAX - 1`.
|
||||
|
||||
The table is iterated over by asking for the *first* existing index,
|
||||
meaning the smallest number that has an entry, if the table is not
|
||||
empty. To get the *next* entry, one passes the index of the previous
|
||||
iteration step. It does not matter if the previous index is still
|
||||
in the table. Sample code for a table iteration would look like this:
|
||||
|
||||
```c
|
||||
unsigned int mid;
|
||||
void *entry;
|
||||
|
||||
if(Curl_uint_tbl_first(tbl, &mid, &entry)) {
|
||||
do {
|
||||
/* operate on entry with index mid */
|
||||
}
|
||||
while(Curl_uint_tbl_next(tbl, mid, &mid, &entry));
|
||||
}
|
||||
|
||||
```
|
||||
|
||||
This iteration has the following properties:
|
||||
|
||||
* entries in the table can be added/removed safely.
|
||||
* all entries that are not removed during the iteration are visited.
|
||||
* the table may be resized to a larger capacity without affecting visited entries.
|
||||
* entries added with a larger index than the current are visited.
|
||||
|
||||
### Memory
|
||||
|
||||
For storing 1000 entries, the table would allocate one block of 8KB on a 64-bit system,
|
||||
plus the 2 pointers and 3 unsigned int in its base `struct uint_tbl`. A resize
|
||||
allocates a completely new pointer array, copy the existing entries and free the previous one.
|
||||
|
||||
### Performance
|
||||
|
||||
Lookups of entries are only an index into the array, O(1) with a tiny 1. Adding
|
||||
entries and iterations are more work:
|
||||
|
||||
1. adding an entry means "find the first free index larger than the previous assigned
|
||||
one". Worst case for this is a table with only a single free index where `capacity - 1`
|
||||
checks on `NULL` values would be performed, O(N). If the single free index is randomly
|
||||
distributed, this would be O(N/2).
|
||||
2. iterating a table scans for the first not `NULL` entry after the start index. This
|
||||
makes a complete iteration O(N) work.
|
||||
|
||||
In the multi use case, point 1 is remedied by growing the table so that a good chunk
|
||||
of free entries always exists.
|
||||
|
||||
Point 2 is less of an issue for a multi, since it does not really matter when the
|
||||
number of transfer is relatively small. A multi managing a larger set needs to operate
|
||||
event based anyway and table iterations rarely are needed.
|
||||
|
||||
For these reasons, the simple implementation was preferred. Should this become
|
||||
a concern, there are options like "free index lists" or, alternatively, an internal
|
||||
bitset that scans better.
|
||||
|
||||
## `uint_bset`

A bitset for unsigned integers, allowing fast add/remove operations. It is
initialized with a *capacity*, meaning it can store only the numbers in the
range `[0, capacity-1]`. It can be *resized* and safely *iterated*.
`uint_bset` is designed to operate in combination with `uint_tbl`.
The bitset keeps an array of `curl_uint64_t`. The first array entry keeps the
numbers 0 to 63, the second 64 to 127, and so on. A bitset with capacity 1024
would therefore allocate an array of 16 64-bit values (128 bytes). Operations
on an unsigned int divide it by 64 for the array index and then
check/set/clear the bit at the remainder.
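The divide/remainder bit operations can be sketched like this; a minimal
standalone illustration with invented `bset_*` names, not curl's internal API:

```c
#include <stdint.h>

/* Each uint64_t slot covers 64 numbers: slot = i / 64, bit = i % 64. */
static void bset_add(uint64_t *slots, unsigned int i)
{
  slots[i / 64] |= ((uint64_t)1 << (i % 64));
}

static void bset_remove(uint64_t *slots, unsigned int i)
{
  slots[i / 64] &= ~((uint64_t)1 << (i % 64));
}

static int bset_contains(const uint64_t *slots, unsigned int i)
{
  return (slots[i / 64] & ((uint64_t)1 << (i % 64))) != 0;
}
```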
Iteration works the same as with `uint_tbl`: ask the bitset for the *first*
number present and then use that to get the *next* higher number present. Like
the table, this is safe for adds/removes and growing the set while iterating.
### Memory

The set only needs 1 bit for each possible number. A bitset for 40000
transfers occupies 5KB of memory.
### Performance

Operations for add/remove/check are O(1). Iteration needs to scan for the next
set bit. The number of scans is small (see the memory footprint) and, for
checking bits, many compilers offer primitives for special CPU instructions.
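The first/next scan used for iteration might look roughly like this; a hedged
sketch with invented names, not curl's implementation, and with a plain bit
loop where a compiler could use a count-trailing-zeros intrinsic instead:

```c
#include <stdint.h>

/* Find the smallest number present that is >= `start`. Scans whole
   64-bit words first, then the bits of the first non-zero word.
   Returns 1 and sets *out on success, 0 if no such number exists. */
static int bset_next(const uint64_t *slots, unsigned int nslots,
                     unsigned int start, unsigned int *out)
{
  unsigned int w = start / 64;
  uint64_t word;

  if(w >= nslots)
    return 0;
  /* mask off bits below `start` in the first word */
  word = slots[w] & (~(uint64_t)0 << (start % 64));
  for(;;) {
    if(word) {
      unsigned int bit = 0;
      while(!(word & ((uint64_t)1 << bit)))
        bit++; /* a ctz intrinsic would replace this loop */
      *out = w * 64 + bit;
      return 1;
    }
    if(++w >= nslots)
      return 0;
    word = slots[w];
  }
}
```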
## `uint_spbset`

While the memory footprint of `uint_bset` is good, it still needs 5KB to store
the single number 40000. This is not optimal when many such sets are needed.
For example, in event based processing, each socket needs to keep track of the
transfers involved. There are potentially many sockets, but each one mostly
tracks a single transfer or a few (on an HTTP/2 connection, up to around 100).
For such use cases, the `uint_spbset` is intended: track a small number of
unsigned ints, potentially rather "close" together. It keeps "chunks" with an
offset and has no capacity limit.
Example: adding the number 40000 to an empty sparse bitset creates one chunk
with offset 39936, keeping track of the numbers 39936 to 40191 (a chunk has 4
64-bit values, covering 256 numbers). The numbers in that range can then be
handled without further allocations.
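The chunk math from the example can be sketched as below. The `spchunk`
struct and `sp_*` names are illustrative assumptions, not curl's actual
`uint_spbset` layout:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: a chunk of 4 x 64 bits covering 256 numbers, starting at a
   256-aligned offset. Chunks would be linked in ascending offset order. */
struct spchunk {
  unsigned int offset;    /* first number this chunk covers */
  uint64_t slots[4];      /* covers offset .. offset + 255 */
  struct spchunk *next;
};

/* round a number down to the start of its chunk, e.g. 40000 -> 39936 */
static unsigned int sp_offset(unsigned int num)
{
  return num - (num % 256);
}

static void sp_set(struct spchunk *c, unsigned int num)
{
  unsigned int rel = num - c->offset;
  c->slots[rel / 64] |= ((uint64_t)1 << (rel % 64));
}

static int sp_has(const struct spchunk *c, unsigned int num)
{
  unsigned int rel = num - c->offset;
  return (c->slots[rel / 64] & ((uint64_t)1 << (rel % 64))) != 0;
}
```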
The worst case is storing 100 numbers that all lie in separate intervals. Then
100 chunks would need to be allocated and linked, using roughly 4KB of memory
overall.
Iterating a sparse bitset works the same as for bitset and table.
## `uint_hash`

At last, there are places in libcurl, such as the HTTP/2 and HTTP/3 protocol
implementations, that need to store their own data related to a transfer.
`uint_hash` allows them to associate an unsigned int, e.g. the transfer's
`mid`, with their own data.
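A toy illustration of that idea, not curl's `uint_hash` implementation: a
fixed-size open-addressing map from an unsigned int key to caller-owned data,
with all names invented for the sketch:

```c
#include <stddef.h>

#define SLOTS 64u /* fixed size for illustration; real tables resize */

struct umap_entry {
  unsigned int key;
  void *value;    /* NULL means "slot empty" */
};

/* linear probing insert; returns 0 only when the map is full */
static int umap_set(struct umap_entry *e, unsigned int key, void *value)
{
  unsigned int i, probe;
  for(i = 0; i < SLOTS; i++) {
    probe = (key + i) % SLOTS;
    if(!e[probe].value || e[probe].key == key) {
      e[probe].key = key;
      e[probe].value = value;
      return 1;
    }
  }
  return 0;
}

static void *umap_get(const struct umap_entry *e, unsigned int key)
{
  unsigned int i, probe;
  for(i = 0; i < SLOTS; i++) {
    probe = (key + i) % SLOTS;
    if(!e[probe].value)
      return NULL; /* hit an empty slot: key is absent */
    if(e[probe].key == key)
      return e[probe].value;
  }
  return NULL;
}
```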
134
curl-8.15.0/docs/internals/WEBSOCKET.md
Normal file
@@ -0,0 +1,134 @@
<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.

SPDX-License-Identifier: curl
-->
# WebSocket in curl

## URL
WebSocket communication with libcurl is done by setting up a transfer to a URL
using the `ws://` or `wss://` URL schemes, the latter being the secure version
done over HTTPS.
When using `wss://` to do WebSocket over HTTPS, the standard TLS and HTTPS
options are acknowledged for the CA, verification of the server certificate,
etc.
WebSocket communication is done by upgrading a connection from either HTTP or
HTTPS. When given a WebSocket URL to work with, libcurl considers it a
transfer failure if the upgrade procedure fails. This means that a plain HTTP
200 response code is considered an error for this work.
## API

The WebSocket API is described in the individual man pages for the new API.

WebSocket with libcurl can be done in two ways.
1. Get the WebSocket frames from the server sent to the write callback. You
   can then respond with `curl_ws_send()` from within the callback (or
   outside of it).

2. Set `CURLOPT_CONNECT_ONLY` to 2L (new for WebSocket), which makes libcurl
   do an HTTP GET + `Upgrade:` request plus response in the
   `curl_easy_perform()` call before it returns, and then you can use
   `curl_ws_recv()` and `curl_ws_send()` to receive and send WebSocket frames
   from and to the server.
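A minimal sketch of the second approach using the public libcurl WebSocket
API (available since curl 7.86.0). `wss://example.com/` is a placeholder URL
and error handling is abbreviated; this is an outline, not a complete client:

```c
#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  CURLcode res;

  if(!curl)
    return 1;
  curl_easy_setopt(curl, CURLOPT_URL, "wss://example.com/");
  curl_easy_setopt(curl, CURLOPT_CONNECT_ONLY, 2L); /* WebSocket mode */

  /* performs the HTTP GET + Upgrade: handshake, then returns */
  res = curl_easy_perform(curl);
  if(res == CURLE_OK) {
    size_t sent;
    size_t nread;
    char buf[256];
    const struct curl_ws_frame *meta;

    /* send one text frame, then receive one frame back */
    curl_ws_send(curl, "hello", 5, &sent, 0, CURLWS_TEXT);
    curl_ws_recv(curl, buf, sizeof(buf), &nread, &meta);
  }
  curl_easy_cleanup(curl);
  return 0;
}
```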
The new options to `curl_easy_setopt()`:

`CURLOPT_WS_OPTIONS` - to control specific behavior. `CURLWS_RAW_MODE` makes
libcurl provide all WebSocket traffic raw in the callback. `CURLWS_NOAUTOPONG`
disables automatic `PONG` replies.
The new function calls:

`curl_ws_recv()` - receive a WebSocket frame

`curl_ws_send()` - send a WebSocket frame

`curl_ws_meta()` - return WebSocket metadata within a write callback
## Max frame size

The current implementation only supports frame sizes up to a maximum (64K
right now). This is because the API delivers full frames and it then cannot
manage the full 2^63 bytes size.

If we decide we need to support (much) larger frames than 64K, we need to
adjust the API accordingly to be able to deliver partial frames in both
directions.
## Errors

If the given WebSocket URL (using `ws://` or `wss://`) fails to get upgraded
via a 101 response code and instead gets another response code back from the
HTTP server, the transfer returns `CURLE_HTTP_RETURNED_ERROR`. Note that even
2xx response codes are then considered errors, since the server failed to
provide a WebSocket upgrade.
## Test suite

I looked for an existing small WebSocket server implementation with maximum
flexibility to dissect and cram into the test suite, but I ended up deciding
that extending the existing test suite server sws to deal with WebSocket
might be the better way:

- This server is already integrated and working in the test suite

- We want maximum control and the ability to generate broken protocol and
  negative tests as well. A dumber and simpler TCP server could be easier to
  massage into this than a "proper" WebSocket server.
## Command line tool WebSocket

The plan is to make curl do WebSocket similar to telnet/nc. That part of the
work has not been started.
Ideas:

- Read stdin and send off as messages. Consider newline as end of fragment.
  (default to text? offer an option to set binary)
- Respond to PINGs automatically
- Issue PINGs at some default interval (option to switch off/change interval?)
- Allow `-d` to specify (initial) data to send (should the format allow for
  multiple separate frames?)
- Exit after N messages received, where N can be zero.
## Future work

- Verify the Sec-WebSocket-Accept response. It requires a SHA-1 function.
- Verify Sec-WebSocket-Extensions and Sec-WebSocket-Protocol in the response
- Consider a `curl_ws_poll()`
- Make sure WebSocket code paths are fuzzed
- Add a client-side PING interval
- Provide an option to disable PING-PONG automation
- Support compression (`CURLWS_COMPRESS`)
## Why not libWebSocket

libWebSocket is said to be a solid, fast and efficient WebSocket library with
a vast number of users. My plan was originally to build upon it to skip having
to implement the low-level parts of WebSocket myself.

Here are the reasons why I have decided to move forward with WebSocket in
curl **without using libWebSocket**:
- doxygen-generated docs only, which makes them hard to navigate. No tutorial,
  no clearly written explanatory pages for specific functions.

- seems (too) tightly integrated with a specific TLS library, while we want
  to support WebSocket with whatever TLS library libcurl was already made to
  work with.

- seems (too) tightly integrated with event libraries

- the references to threads and thread-pools in code and APIs indicate too
  much logic for our purposes

- "bloated" - it is a *huge* library that is actually more lines of code than
  libcurl itself

- WebSocket is a fairly simple protocol on the network/framing layer, so
  making a homegrown handling of it should be fine