96749554fd
This is the first version of this new document, detailing the seven perhaps most important internal structs in libcurl source code: 1.1 SessionHandle 1.2 connectdata 1.3 Curl_multi 1.4 Curl_handler 1.5 conncache 1.6 Curl_share 1.7 CookieInfo
246 lines
10 KiB
Plaintext
246 lines
10 KiB
Plaintext
_ _ ____ _
|
|
___| | | | _ \| |
|
|
/ __| | | | |_) | |
|
|
| (__| |_| | _ <| |___
|
|
\___|\___/|_| \_\_____|
|
|
|
|
Structs in libcurl
|
|
|
|
This document should cover 7.32.0 pretty accurately, but will make sense even
|
|
for older and later versions as things don't change drastically that often.
|
|
|
|
1. The main structs in libcurl
|
|
1.1 SessionHandle
|
|
1.2 connectdata
|
|
1.3 Curl_multi
|
|
1.4 Curl_handler
|
|
1.5 conncache
|
|
1.6 Curl_share
|
|
1.7 CookieInfo
|
|
|
|
==============================================================================
|
|
|
|
1. The main structs in libcurl
|
|
|
|
1.1 SessionHandle
|
|
|
|
The SessionHandle handle struct is the one returned to the outside in the
|
|
external API as a "CURL *". This is usually known as an easy handle in API
|
|
documentations and examples.
|
|
|
|
Information and state that is related to the actual connection is in the
|
|
'connectdata' struct. When a transfer is about to be made, libcurl will
|
|
either create a new connection or re-use an existing one. The particular
|
|
connectdata that is used by this handle is pointed out by
|
|
SessionHandle->easy_conn.
|
|
|
|
Data and information that regard this particular single transfer is put in
|
|
the SingleRequest sub-struct.
|
|
|
|
When the SessionHandle struct is added to a multi handle, as it must be in
|
|
order to do any transfer, the ->multi member will point to the Curl_multi
|
|
struct it belongs to. The ->prev and ->next members will then be used by the
|
|
multi code to keep a linked list of SessionHandle structs that are added to
|
|
that same multi handle. libcurl always uses multi so ->multi *will* point to
|
|
a Curl_multi when a transfer is in progress.
|
|
|
|
->mstate is the multi state of this particular SessionHandle. When
|
|
multi_runsingle() is called, it will act on this handle according to which
|
|
state it is in. The mstate is also what tells which sockets to return for a
|
|
speicific SessionHandle when curl_multi_fdset() is called etc.
|
|
|
|
The libcurl source code generally use the name 'data' for the variable that
|
|
points to the SessionHandle.
|
|
|
|
|
|
1.2 connectdata
|
|
|
|
A general idea in libcurl is to keep connections around in a connection
|
|
"cache" after they have been used in case they will be used again and then
|
|
re-use an existing one instead of creating a new as it creates a significant
|
|
performance boost.
|
|
|
|
Each 'connectdata' identifies a single physical conncetion to a server. If
|
|
the connection can't be kept alive, the connection will be closed after use
|
|
and then this struct can be removed from the cache and freed.
|
|
|
|
Thus, the same SessionHandle can be used multiple times and each time select
|
|
another connectdata struct to use for the connection. Keep this in mind, as
|
|
it is then important to consider if options or choices are based on the
|
|
connection or the SessionHandle.
|
|
|
|
Functions in libcurl will assume that connectdata->data points to the
|
|
SessionHandle that uses this connection.
|
|
|
|
As a special complexity, some protocols supported by libcurl require a
|
|
special disconnect procedure that is more than just shutting down the
|
|
socket. It can involve sending one or more commands to the server before
|
|
doing so. Since connections are kept in the connection cache after use, the
|
|
original SessionHandle may no longer be around when the time comes to shut
|
|
down a particular connection. For this purpose, libcurl holds a special
|
|
dummy 'closure_handle' SessionHandle in the Curl_multi struct to
|
|
|
|
FTP uses two TCP connections for a typical transfer but it keeps both in
|
|
this single struct and thus can be considered a single connection for most
|
|
internal concerns.
|
|
|
|
The libcurl source code generally use the name 'conn' for the variable that
|
|
points to the connectdata.
|
|
|
|
|
|
1.3 Curl_multi
|
|
|
|
Internally, the easy interface is implemented as a wrapper around multi
|
|
interface functions. This makes everything multi interface.
|
|
|
|
Curl_multi is the multi handle struct exposed as "CURLM *" in external APIs.
|
|
|
|
This struct holds a list of SessionHandle structs that have been added to
|
|
this handle with curl_multi_add_handle(). The start of the list is ->easyp
|
|
and ->num_easy is a counter of added SessionHandles.
|
|
|
|
->msglist is a linked list of messages to send back when
|
|
curl_multi_info_read() is called. Basically a node is added to that list
|
|
when an individual SessionHandle's transfer has completed.
|
|
|
|
->hostcache points to the name cache. It is a hash table for looking up name
|
|
to IP. The nodes have a limited life time in there and this cache is meant
|
|
to reduce the time for when the same name is wanted within a short period of
|
|
time.
|
|
|
|
->timetree points to a tree of SessionHandles, sorted by the remaining time
|
|
until it should be checked - normally some sort of timeout. Each
|
|
SessionHandle has one node in the tree.
|
|
|
|
->sockhash is a hash table to allow fast lookups of socket descriptor to
|
|
which SessionHandle that uses that descriptor. This is necessary for the
|
|
multi_socket API.
|
|
|
|
->conn_cache points to the connection cache. It keeps track of all
|
|
connections that are kept after use. The cache has a maximum size.
|
|
|
|
->closure_handle is described in the 'connectdata' section.
|
|
|
|
The libcurl source code generally use the name 'multi' for the variable that
|
|
points to the Curl_multi struct.
|
|
|
|
|
|
1.4 Curl_handler
|
|
|
|
Each unique protocol that is supported by libcurl needs to provide at least
|
|
one Curl_handler struct. It defines what the protocol is called and what
|
|
functions the main code should call to deal with protocol specific issues.
|
|
In general, there's a source file named [protocol].c in which there's a
|
|
"struct Curl_handler Curl_handler_[protocol]" declared. In url.c there's
|
|
then the main array with all individual Curl_handler structs pointed to from
|
|
a single array which is scanned through when a URL is given to libcurl to
|
|
work with.
|
|
|
|
->scheme is the URL scheme name, usually spelled out in uppercase. That's
|
|
"HTTP" or "FTP" etc. SSL versions of the protcol need its own Curl_handler
|
|
setup so HTTPS separate from HTTP.
|
|
|
|
->setup_connection is called to allow the protocol code to allocate protocol
|
|
specific data that then gets associated with that SessionHandle for the rest
|
|
of this transfer. It gets freed again at the end of the transfer. It will be
|
|
called before the 'connectdata' for the transfer has been selected/created.
|
|
Most protocols will allocate its private 'struct [PROTOCOL]' here and assign
|
|
SessionHandle->req.protop to point to it.
|
|
|
|
->connect_it allows a protocol to do some specific actions after the TCP
|
|
connect is done, that can still be considered part of the connection phase.
|
|
|
|
Some protocols will alter the connectdata->recv[] and connectdata->send[]
|
|
function pointers in this function.
|
|
|
|
->connecting is similarly a function that keeps getting called as long as the
|
|
protocol considers itself still in the connecting phase.
|
|
|
|
->do_it is the function called to issue the transfer request. What we call
|
|
the DO action internally. If the DO is not enough and things need to be kept
|
|
getting done for the entier DO sequence to complete, ->doing is then usually
|
|
also provided. Each protocol that needs to do multiple commands or similar
|
|
for do/doing need to implement their own state machines (see SCP, SFTP,
|
|
FTP). Some protocols (only FTP and only due to historical reasons) has a
|
|
separate piece of the DO state called DO_MORE.
|
|
|
|
->doing keeps getting called while issudeing the transfer request command(s)
|
|
|
|
->done gets called when the transfer is complete and DONE. That's after the
|
|
main data has been transferred.
|
|
|
|
->do_more gets called doring the DO_MORE state. The FTP protocol uses this
|
|
state when setting up the second connection.
|
|
|
|
->proto_getsock
|
|
->doing_getsock
|
|
->domore_getsock
|
|
->perform_getsock
|
|
Functions that return socket information. Which socket(s) to wait for which
|
|
action(s) during the particular multi state.
|
|
|
|
->disconnect is called immediately before the TCP connection is shutdown.
|
|
|
|
->readwrite gets called during transfer to allow the protocol to do extra
|
|
reads/writes
|
|
|
|
->defport is the default report TCP or UDP port this protocol uses
|
|
|
|
->protocol is one or more bits in the CURLPROTO_* set. The SSL versions have
|
|
their "base" protocol set and then the SSL variation. Like "HTTP|HTTPS".
|
|
|
|
->flags is a bitmask with additional information about the protocol that will
|
|
make it get treated differently by the generic engine:
|
|
|
|
PROTOPT_SSL - will make it connect and negotiate SSL
|
|
|
|
PROTOPT_DUAL - this protocol uses two connections
|
|
|
|
PROTOPT_CLOSEACTION - this protocol has actions to do before closing the
|
|
connection. This flag is no longer used by code, yet still set for a bunch
|
|
protocol handlers.
|
|
|
|
PROTOPT_DIRLOCK - "direction lock". The SSH protocols set this bit to
|
|
limit which "direction" of socket actions that the main engine will
|
|
concern itself about.
|
|
|
|
PROTOPT_NONETWORK - a protocol that doesn't use network (read file:)
|
|
|
|
PROTOPT_NEEDSPWD - this protocol needs a password and will use a default
|
|
one unless one is provided
|
|
|
|
PROTOPT_NOURLQUERY - this protocol can't handle a query part on the URL
|
|
(?foo=bar)
|
|
|
|
|
|
1.5 conncache
|
|
|
|
Is a hash table with connections for later re-use. Each SessionHandle has
|
|
a pointer to its connection cache. Each multi handle sets up a connection
|
|
cache that all added SessionHandles share by default.
|
|
|
|
|
|
1.6 Curl_share
|
|
|
|
The libcurl share API allocates a Curl_share struct, exposed to the external
|
|
API as "CURLSH *".
|
|
|
|
The idea is that the struct can have a set of own versions of caches and
|
|
pools and then by providing this struct in the CURLOPT_SHARE option, those
|
|
specific SessionHandles will use the caches/pools that this share handle
|
|
holds.
|
|
|
|
Then individual SessionHandle structs can be made to share specific things
|
|
that they otherwise wouldn't, such as cookies.
|
|
|
|
The Curl_share struct can currently hold cookies, DNS cache and the SSL
|
|
session cache.
|
|
|
|
|
|
1.7 CookieInfo
|
|
|
|
This is the main cookie struct. It holds all known cookies and related
|
|
information. Each SessionHandle has its own private CookieInfo even when
|
|
they are added to a multi handle. They can be made to share cookies by using
|
|
the share API.
|