|                                   _   _ ____  _
 | |
|                               ___| | | |  _ \| |
 | |
|                              / __| | | | |_) | |
 | |
|                             | (__| |_| |  _ <| |___
 | |
|                              \___|\___/|_| \_\_____|
 | |
| 
 | |
| INTERNALS
 | |
| 
 | |
|  The project is split in two: the library and the client. The client uses
 | |
|  the library, but the library is designed so that other applications can use
 | |
|  it too.
 | |
| 
 | |
|  The largest amount of code and complexity is in the library part.
 | |
| 
 | |
| GIT
 | |
| ===
 | |
|  All changes to the sources are committed to the git repository as soon as
 | |
|  they're somewhat verified to work. Changes shall be committed as independently
 | |
|  as possible so that individual changes can be more easily spotted and tracked
 | |
|  afterwards.
 | |
| 
 | |
|  Tagging shall be used extensively, and by the time we release new archives we
 | |
|  should tag the sources with a name similar to the released version number.
 | |
| 
 | |
| Portability
 | |
| ===========
 | |
| 
 | |
|  We write curl and libcurl to compile with C89 compilers on machines that are
 | |
|  32-bit or larger. Most of libcurl assumes more or less POSIX compliance, but
 | |
|  that's not a requirement.
 | |
| 
 | |
|  We write libcurl to build and work with lots of third party tools, and we
 | |
|  want it to remain functional and buildable with these and later versions
 | |
|  (older versions may still work, but that is not what we work hard to maintain):
 | |
| 
 | |
|  OpenSSL      0.9.6
 | |
|  GnuTLS       1.2
 | |
|  zlib         1.1.4
 | |
|  libssh2      0.16
 | |
|  c-ares       1.6.0
 | |
|  libidn       0.4.1
 | |
|  *yassl       1.4.0 (http://curl.haxx.se/mail/lib-2008-02/0093.html)
 | |
|  openldap     2.0
 | |
|  MIT krb5 lib 1.2.4
 | |
|  qsossl       V5R2M0
 | |
|  NSS          3.11.x
 | |
|  axTLS        1.2.7
 | |
|  Heimdal      ?
 | |
| 
 | |
|  * = only partly functional, but that's due to bugs in the third party lib, not
 | |
|      because of libcurl code
 | |
| 
 | |
|  On systems where configure runs, we aim at working on them all - if they have
 | |
|  a suitable C compiler. On systems that don't run configure, we strive to keep
 | |
|  curl running fine on:
 | |
| 
 | |
|  Windows      98
 | |
|  AS/400       V5R2M0
 | |
|  Symbian      9.1
 | |
|  Windows CE   ?
 | |
|  TPF          ?
 | |
| 
 | |
|  When writing code (mostly for generating stuff included in release tarballs)
 | |
|  we use a few "build tools" and we make sure that we remain functional with
 | |
|  these versions:
 | |
| 
 | |
|  GNU Libtool  1.4.2
 | |
|  GNU Autoconf 2.57
 | |
|  GNU Automake 1.7 (we currently avoid 1.10 due to Solaris-related bugs)
 | |
|  GNU M4       1.4
 | |
|  perl         4
 | |
|  roffit       0.5
 | |
|  groff        ? (any version that supports "groff -Tps -man [in] [out]")
 | |
|  ps2pdf (gs)  ?
 | |
| 
 | |
| Windows vs Unix
 | |
| ===============
 | |
| 
 | |
|  There are a few differences in how to program curl the unix way compared to
 | |
|  the Windows way. The four perhaps most notable details are:
 | |
| 
 | |
|  1. Different function names for socket operations.
 | |
| 
 | |
|    In curl, this is solved with defines and macros, so that the source looks
 | |
|    the same at all places except for the header file that defines them. The
 | |
|    macros in use are sclose(), sread() and swrite().
 | |
| 
 | |
|  2. Windows requires a couple of init calls for the socket stuff.
 | |
| 
 | |
|    That's taken care of by the curl_global_init() call, but if other libs also
 | |
|    do the same init, applications might have reasons to alter that behaviour.
 | |
| 
 | |
|  3. The file descriptors for network communication and file operations are
 | |
|     not easily interchangeable as in unix.
 | |
| 
 | |
|    We avoid this by not trying any funny tricks on file descriptors.
 | |
| 
 | |
|  4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
 | |
|     destroying binary data, although you do want that conversion if it is
 | |
|     text coming through... (sigh)
 | |
| 
 | |
|    We set stdout to binary under Windows.
 | |
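
As an illustration of point 1, here is a rough sketch of how such macros
could be defined. This is not the actual header from the curl sources, and the
real definitions may well differ. demo_socket_roundtrip() is a hypothetical
helper that only exists to exercise the macros, and the unix branch assumes a
POSIX system that offers socketpair().

```c
#include <string.h>

#ifdef _WIN32
#include <winsock2.h>
/* Windows: sockets have their own close/read/write functions */
#define sclose(s)           closesocket(s)
#define sread(s, buf, len)  recv((s), (char *)(buf), (int)(len), 0)
#define swrite(s, buf, len) send((s), (const char *)(buf), (int)(len), 0)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
/* unix: a socket is an ordinary file descriptor */
#define sclose(s)           close(s)
#define sread(s, buf, len)  read((s), (buf), (len))
#define swrite(s, buf, len) write((s), (buf), (len))

/* hypothetical helper: pass "ping" through a socket pair using only the
   macros above; returns 0 on success and -1 on failure */
static int demo_socket_roundtrip(void)
{
  int sv[2];
  char buf[8];
  int ok;

  if(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
    return -1;
  ok = (swrite(sv[0], "ping", 4) == 4) &&
       (sread(sv[1], buf, sizeof(buf)) == 4) &&
       (memcmp(buf, "ping", 4) == 0);
  sclose(sv[0]);
  sclose(sv[1]);
  return ok ? 0 : -1;
}
#endif
```

The point of the design is that code calling sread() and swrite() never needs
to know which platform it was built for.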
| 
 | |
|  Inside the source code, we make an effort to avoid '#ifdef [Your OS]'. All
 | |
|  conditionals that deal with features *should* instead be in the format
 | |
|  '#ifdef HAVE_THAT_WEIRD_FUNCTION'. Since Windows can't run configure scripts,
 | |
|  we maintain two curl_config-win32.h files (one in lib/ and one in src/) that
 | |
|  are supposed to look exactly as a curl_config.h file would have looked like on
 | |
|  a Windows machine!
 | |
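
To show what a feature-based conditional can look like, here is a small
sketch. HAVE_GETTIMEOFDAY is assumed to be defined by configure (or by hand in
a curl_config-win32.h style file) when the function exists, and demo_now_ms()
is a hypothetical helper, not a function from the curl sources.

```c
#include <time.h>

#ifdef HAVE_GETTIMEOFDAY   /* assumed to come from the config header */
#include <sys/time.h>
#endif

/* hypothetical helper: current wall-clock time in milliseconds */
static double demo_now_ms(void)
{
#ifdef HAVE_GETTIMEOFDAY
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return (double)tv.tv_sec * 1000.0 + (double)tv.tv_usec / 1000.0;
#else
  /* fallback with one-second resolution when the function is missing */
  return (double)time(NULL) * 1000.0;
#endif
}
```

Note that the code asks whether the function is available, not which operating
system it is being built on.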
| 
 | |
|  Generally speaking: always remember that this will be compiled on dozens of
 | |
|  operating systems. Don't walk on the edge.
 | |
| 
 | |
| Library
 | |
| =======
 | |
| 
 | |
|  There are plenty of entry points to the library, namely each publicly defined
 | |
|  function that libcurl offers to applications. All of those functions are
 | |
|  rather small and easy-to-follow. All the ones prefixed with 'curl_easy' are
 | |
|  put in the lib/easy.c file.
 | |
| 
 | |
|  curl_global_init() and curl_global_cleanup() should be called by the
 | |
|  application to initialize and clean up global stuff in the library. As of
 | |
|  today, it can handle the global SSL initing if SSL is enabled and it can init
 | |
|  the socket layer on windows machines. libcurl itself has no "global" scope.
 | |
| 
 | |
|  All printf()-style functions use the supplied clones in lib/mprintf.c. This
 | |
|  makes sure we stay absolutely platform independent.
 | |
| 
 | |
|  curl_easy_init() allocates an internal struct and makes some initializations.
 | |
|  The returned handle does not reveal internals. This is the 'SessionHandle'
 | |
|  struct which works as an "anchor" struct for all curl_easy functions. All
 | |
|  connections performed will get connect-specific data allocated that should be
 | |
|  used for things related to particular connections/requests.
 | |
| 
 | |
|  curl_easy_setopt() takes three arguments: the handle, followed by the option
 | |
|  passed as a pair of parameter-ID and parameter-value. The list of options is
 | |
|  documented in the man page. This function mainly sets things in the
 | |
|  'SessionHandle' struct.
 | |
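
The ID/value pairing can be sketched with a variadic function, where the type
of the last argument depends on the option ID. Everything below (the demo_
names, the two options and the struct) is made up for illustration and is not
how the real 'SessionHandle' struct or option list looks.

```c
#include <stdarg.h>
#include <stddef.h>

/* hypothetical option IDs and return codes */
typedef enum { DEMO_OPT_URL, DEMO_OPT_TIMEOUT } demo_option;
typedef enum { DEMO_SET_OK, DEMO_SET_UNKNOWN_OPTION } demo_setcode;

/* hypothetical, tiny stand-in for the handle struct */
struct demo_handle {
  const char *url;
  long timeout;
};

/* the option ID decides which type to pull from the argument list */
static demo_setcode demo_setopt(struct demo_handle *h, demo_option opt, ...)
{
  va_list ap;
  demo_setcode rc = DEMO_SET_OK;

  va_start(ap, opt);
  switch(opt) {
  case DEMO_OPT_URL:
    h->url = va_arg(ap, const char *);
    break;
  case DEMO_OPT_TIMEOUT:
    h->timeout = va_arg(ap, long);
    break;
  default:
    rc = DEMO_SET_UNKNOWN_OPTION;
    break;
  }
  va_end(ap);
  return rc;
}
```

A caller must pass a value of exactly the type the option expects (note the
30L rather than 30 for a long), since the compiler cannot check it.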
| 
 | |
|  curl_easy_perform() does a whole lot of things:
 | |
| 
 | |
|  It starts off in the lib/easy.c file by calling Curl_perform() and the main
 | |
|  work then continues in lib/url.c. The flow continues with a call to
 | |
|  Curl_connect() to connect to the remote site.
 | |
| 
 | |
|  o Curl_connect()
 | |
| 
 | |
|    ... analyzes the URL, it separates the different components and connects to
 | |
|    the remote host. This may involve using a proxy and/or using SSL. The
 | |
|    Curl_resolv() function in lib/hostip.c is used for looking up host names
 | |
|    (it does then use the proper underlying method, which may vary between
 | |
|    platforms and builds).
 | |
| 
 | |
|    When Curl_connect is done, we are connected to the remote site. Then it is
 | |
|    time to tell the server to get a document/file. Curl_do() arranges this.
 | |
| 
 | |
|    This function makes sure there's an allocated and initiated 'connectdata'
 | |
|    struct that is used for this particular connection only (although there may
 | |
|    be several requests performed on the same connect). A bunch of things are
 | |
|    inited/inherited from the SessionHandle struct.
 | |
| 
 | |
|  o Curl_do()
 | |
| 
 | |
|    Curl_do() makes sure the proper protocol-specific function is called. The
 | |
|    functions are named after the protocols they handle. Curl_ftp(),
 | |
|    Curl_http(), Curl_dict(), etc. They all reside in their respective files
 | |
|    (ftp.c, http.c and dict.c). HTTPS is handled by Curl_http() and FTPS by
 | |
|    Curl_ftp().
 | |
| 
 | |
|    The protocol-specific functions of course deal with protocol-specific
 | |
|    negotiations and setup. They have access to the Curl_sendf() (from
 | |
|    lib/sendf.c) function to send printf-style formatted data to the remote
 | |
|    host and when they're ready to make the actual file transfer they call the
 | |
|    Curl_Transfer() function (in lib/transfer.c) to set up the transfer and
 | |
|    return.
 | |
| 
 | |
|    If this DO function fails and the connection is being re-used, libcurl will
 | |
|    then close this connection, set up a new connection and re-issue the DO
 | |
|    request on that. This is because there is no way to be perfectly sure that
 | |
|    we have discovered a dead connection before the DO function and thus we
 | |
|    might wrongly be re-using a connection that was closed by the remote peer.
 | |
| 
 | |
|    Some time during the DO function, the Curl_setup_transfer() function must
 | |
|    be called with some basic info about the upcoming transfer: what socket(s)
 | |
|    to read/write and the expected file transfer sizes (if known).
 | |
| 
 | |
|  o Transfer()
 | |
| 
 | |
|    Curl_perform() then calls Transfer() in lib/transfer.c that performs the
 | |
|    entire file transfer.
 | |
| 
 | |
|    During transfer, the progress functions in lib/progress.c are called at a
 | |
|    frequent interval (or at the user's choice, a specified callback might get
 | |
|    called). The speedcheck functions in lib/speedcheck.c are also used to
 | |
|    verify that the transfer is as fast as required.
 | |
| 
 | |
|  o Curl_done()
 | |
| 
 | |
|    Called after a transfer is done. This function takes care of everything
 | |
|    that has to be done after a transfer. This function attempts to leave
 | |
|    matters in a state so that Curl_do() should be possible to call again on
 | |
|    the same connection (in a persistent connection case). It might also soon
 | |
|    be closed with Curl_disconnect().
 | |
| 
 | |
|  o Curl_disconnect()
 | |
| 
 | |
|    When doing normal connections and transfers, no one ever tries to close any
 | |
|    connections so this is not normally called when curl_easy_perform() is
 | |
|    used. This function is only used when we are certain that no more transfers
 | |
|    are going to be made on the connection. A connection can also be closed by
 | |
|    force, or this function can be called to make sure that libcurl doesn't keep
 | |
|    too many connections alive at the same time (there's a default amount of 5
 | |
|    but that can be changed with the CURLOPT_MAXCONNECTS option).
 | |
| 
 | |
|    This function cleans up all resources that are associated with a single
 | |
|    connection.
 | |
| 
 | |
|  Curl_perform() is the function that does the main "connect - do - transfer -
 | |
|  done" loop. It loops if there's a Location: to follow.
 | |
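
The shape of that loop can be sketched as below. This is a toy model under
loud assumptions: the demo_ functions are hypothetical stand-ins that only
count calls, and a redirect is modelled as a simple counter rather than a
parsed Location: header.

```c
/* hypothetical request: 'redirects' is how many Location: hops remain */
struct demo_request {
  int redirects;
  int connects;   /* how many times demo_connect() ran */
  int transfers;  /* how many times demo_transfer() ran */
};

static int demo_connect(struct demo_request *r)  { r->connects++; return 0; }
static int demo_do(struct demo_request *r)       { (void)r; return 0; }
static int demo_transfer(struct demo_request *r) { r->transfers++; return 0; }

/* returns nonzero when the response carried a Location: to follow */
static int demo_done(struct demo_request *r)
{
  if(r->redirects > 0) {
    r->redirects--;
    return 1;
  }
  return 0;
}

/* connect - do - transfer - done, repeated while there is a redirect */
static int demo_perform(struct demo_request *r)
{
  int follow;
  do {
    if(demo_connect(r) || demo_do(r) || demo_transfer(r))
      return -1;
    follow = demo_done(r);
  } while(follow);
  return 0;
}
```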
| 
 | |
|  When completed, the curl_easy_cleanup() should be called to free up used
 | |
|  resources. It runs Curl_disconnect() on all open connections.
 | |
| 
 | |
|  A quick roundup on internal function sequences (many of these call
 | |
|  protocol-specific function-pointers):
 | |
| 
 | |
|   Curl_connect - connects to a remote site and does initial connect fluff
 | |
|    This also checks for an existing connection to the requested site and uses
 | |
|    that one if it is possible.
 | |
| 
 | |
|    Curl_do - starts a transfer
 | |
|     Curl_handler::do_it() - transfers data
 | |
|    Curl_done - ends a transfer
 | |
| 
 | |
|   Curl_disconnect - disconnects from a remote site. This is called when the
 | |
|    disconnect is really requested, which doesn't necessarily have to be
 | |
|    exactly after Curl_done in case we want to keep the connection open for
 | |
|    a while.
 | |
| 
 | |
|  HTTP(S)
 | |
| 
 | |
|  HTTP offers a lot and is the protocol in curl that uses the most lines of
 | |
|  code. There is a special file (lib/formdata.c) that offers all the multipart
 | |
|  post functions.
 | |
| 
 | |
|  base64-functions for user+password stuff (and more) are in lib/base64.c and
 | |
|  all functions for parsing and sending cookies are found in lib/cookie.c.
 | |
| 
 | |
|  HTTPS uses the same procedure as HTTP in almost every respect, with only two
 | |
|  exceptions: the connect procedure is different and the function used to read
 | |
|  or write from the socket is different, although the latter fact is hidden in
 | |
|  the source by the use of Curl_read() for reading and Curl_write() for writing
 | |
|  data to the remote server.
 | |
| 
 | |
|  http_chunks.c contains functions that understand HTTP 1.1 chunked transfer
 | |
|  encoding.
 | |
| 
 | |
|  An interesting detail with the HTTP(S) request is the Curl_add_buffer()
 | |
|  series of functions we use. They append data to one single buffer, and when
 | |
|  the building is done the entire request is sent off in one single write.
 | |
|  This is done this way to overcome problems with flawed firewalls and lame
 | |
|  servers.
 | |
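
The append-then-send-once idea could look roughly like this. It is a sketch
only: struct demo_buffer and the demo_buffer_ functions are hypothetical and
are not the Curl_add_buffer() implementation, and the growth strategy
(doubling) is our assumption.

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical growable buffer that collects the whole request */
struct demo_buffer {
  char *data;
  size_t used;
  size_t size;
};

/* append len bytes, growing the allocation when needed; 0 on success */
static int demo_buffer_add(struct demo_buffer *b, const char *src, size_t len)
{
  if(len == 0)
    return 0;
  if(b->used + len > b->size) {
    size_t newsize = (b->used + len) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return -1;
    b->data = p;
    b->size = newsize;
  }
  memcpy(b->data + b->used, src, len);
  b->used += len;
  return 0;
}

/* convenience wrapper for zero-terminated strings */
static int demo_buffer_adds(struct demo_buffer *b, const char *str)
{
  return demo_buffer_add(b, str, strlen(str));
}
```

Once all headers are added, the caller hands b->data and b->used to one single
write call and then frees the buffer.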
| 
 | |
|  FTP
 | |
| 
 | |
|  The Curl_if2ip() function can be used for getting the IP number of a
 | |
|  specified network interface, and it resides in lib/if2ip.c.
 | |
| 
 | |
|  Curl_ftpsendf() is used for sending FTP commands to the remote server. It was
 | |
|  made a separate function to prevent us programmers from forgetting that they
 | |
|  must be CRLF terminated. They must also be sent in one single write() to make
 | |
|  firewalls and similar happy.
 | |
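
A sketch of why a dedicated function helps: it appends the CRLF itself, so the
caller cannot forget it. demo_ftp_format() is a hypothetical stand-in and not
the real Curl_ftpsendf(). It only formats the command; the actual send is left
out. It also uses C99 vsnprintf() for brevity, where libcurl would use its own
printf clones from lib/mprintf.c.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* hypothetical stand-in: format an FTP command into 'out', append CRLF and
   return the total length to pass to one single write(), or -1 if the
   command does not fit in the buffer */
static int demo_ftp_format(char *out, size_t outsize, const char *fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vsnprintf(out, outsize, fmt, ap);
  va_end(ap);

  if(n < 0 || (size_t)n + 3 > outsize)  /* need room for "\r\n" and NUL */
    return -1;
  memcpy(out + n, "\r\n", 3);           /* copies the terminating NUL too */
  return n + 2;
}
```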
| 
 | |
|  Kerberos
 | |
| 
 | |
|  The kerberos support is mainly in lib/krb4.c and lib/security.c.
 | |
| 
 | |
|  TELNET
 | |
| 
 | |
|  Telnet is implemented in lib/telnet.c.
 | |
| 
 | |
|  FILE
 | |
| 
 | |
|  The file:// protocol is dealt with in lib/file.c.
 | |
| 
 | |
|  LDAP
 | |
| 
 | |
|  Everything LDAP is in lib/ldap.c.
 | |
| 
 | |
|  GENERAL
 | |
| 
 | |
|  URL encoding and decoding, called escaping and unescaping in the source code,
 | |
|  is found in lib/escape.c.
 | |
| 
 | |
|  While transferring data in Transfer() a few functions might get used.
 | |
|  curl_getdate() in lib/parsedate.c is for HTTP date comparisons (and more).
 | |
| 
 | |
|  lib/getenv.c offers curl_getenv() which is for reading environment variables
 | |
|  in a neat platform independent way. That's used in the client, but also in
 | |
|  lib/url.c when checking the proxy environment variables. Note that contrary
 | |
|  to the normal unix getenv(), this returns an allocated buffer that must be
 | |
|  free()ed after use.
 | |
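
The ownership difference can be sketched like this. demo_getenv() is a
hypothetical equivalent written for this example, not the code from
lib/getenv.c, and it ignores any platform-specific handling the real function
may do.

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical equivalent: return a malloc()ed copy of the variable's
   value, or NULL if it is unset or memory runs out */
static char *demo_getenv(const char *name)
{
  const char *value = getenv(name);
  char *copy;
  size_t len;

  if(!value)
    return NULL;
  len = strlen(value) + 1;   /* include the terminating NUL */
  copy = malloc(len);
  if(copy)
    memcpy(copy, value, len);
  return copy;
}
```

The caller owns the returned buffer and must free() it, which keeps the value
valid even if the environment is changed later.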
| 
 | |
|  lib/netrc.c holds the .netrc parser.
 | |
| 
 | |
|  lib/timeval.c features replacement functions for systems that don't have
 | |
|  gettimeofday() and a few support functions for timeval conversions.
 | |
| 
 | |
|  A function named curl_version() that returns the full curl version string is
 | |
|  found in lib/version.c.
 | |
| 
 | |
| Persistent Connections
 | |
| ======================
 | |
| 
 | |
|  The persistent connection support in libcurl requires some considerations on
 | |
|  how to do things inside of the library.
 | |
| 
 | |
|  o The 'SessionHandle' struct returned in the curl_easy_init() call must never
 | |
|    hold connection-oriented data. It is meant to hold the root data as well as
 | |
|    all the options etc that the library-user may choose.
 | |
|  o The 'SessionHandle' struct holds the "connection cache" (an array of
 | |
|    pointers to 'connectdata' structs). There's one connectdata struct
 | |
|    allocated for each connection that libcurl knows about. Note that when you
 | |
|    use the multi interface, the multi handle will hold the connection cache
 | |
|    and not the particular easy handle. This is of course to allow all easy
 | |
|    handles in a multi stack to share and re-use connections.
 | |
|  o This enables the 'curl handle' to be reused on subsequent transfers.
 | |
|  o When we are about to perform a transfer with curl_easy_perform(), we first
 | |
|    check for an already existing connection in the cache that we can use,
 | |
|    otherwise we create a new one and add to the cache. If the cache is full
 | |
|    already when we add a new connection, we close one of the present ones. We
 | |
|    select which one to close dependent on the close policy that may have been
 | |
|    previously set.
 | |
|  o When the transfer operation is complete, we try to leave the connection
 | |
|    open. Particular options may tell us not to, and protocols may signal
 | |
|    closure on connections and then we don't keep it open of course.
 | |
|  o When curl_easy_cleanup() is called, we close all still opened connections,
 | |
|    unless of course the multi interface "owns" the connections.
 | |
| 
 | |
|  You do realize that the curl handle must be re-used in order for the
 | |
|  persistent connections to work.
 | |
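
The cache lookup described in the list above can be sketched as a small
fixed-size array. This is a toy model: the structs are hypothetical and far
smaller than the real 'connectdata', connections are matched on host name
only, and the close policy is assumed to be "least recently used".

```c
#include <string.h>

#define DEMO_CACHE_SIZE 5   /* mirrors the default of 5 mentioned earlier */

/* hypothetical, much reduced stand-in for a 'connectdata' struct */
struct demo_conn {
  char host[64];
  int inuse;          /* nonzero if this slot holds a connection */
  unsigned long age;  /* value of the cache clock at last use */
};

struct demo_cache {
  struct demo_conn slot[DEMO_CACHE_SIZE];
  unsigned long clock;
};

/* return the slot index for 'host': re-use a cached connection if there is
   one, otherwise take a free slot or evict the least recently used one */
static int demo_cache_get(struct demo_cache *c, const char *host, int *reused)
{
  int i;
  int victim = 0;

  c->clock++;
  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    if(c->slot[i].inuse && !strcmp(c->slot[i].host, host)) {
      c->slot[i].age = c->clock;
      *reused = 1;
      return i;
    }
  }
  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    if(!c->slot[i].inuse) {
      victim = i;
      break;
    }
    if(c->slot[i].age < c->slot[victim].age)
      victim = i;
  }
  /* a real implementation would disconnect the evicted connection here */
  strncpy(c->slot[victim].host, host, sizeof(c->slot[victim].host) - 1);
  c->slot[victim].host[sizeof(c->slot[victim].host) - 1] = '\0';
  c->slot[victim].inuse = 1;
  c->slot[victim].age = c->clock;
  *reused = 0;
  return victim;
}
```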
| 
 | |
| multi interface/non-blocking
 | |
| ============================
 | |
| 
 | |
|  We make an effort to provide a non-blocking interface to the library, the
 | |
|  multi interface. To make that interface work as well as possible, no
 | |
|  low-level function within libcurl may be written to work in a blocking
 | |
|  manner.
 | |
| 
 | |
|  One of the primary reasons we introduced c-ares support was to allow the name
 | |
|  resolve phase to be perfectly non-blocking as well.
 | |
| 
 | |
|  The ultimate goal is to provide the easy interface simply by wrapping the
 | |
|  multi interface functions and thus treat everything internally as if the
 | |
|  multi interface is the single interface we have.
 | |
| 
 | |
|  The FTP and the SFTP/SCP protocols are thus perfect examples of how we adapt
 | |
|  and adjust the code to allow non-blocking operations even on multi-stage
 | |
|  protocols. They are built around state machines that return when they could
 | |
|  block waiting for data.  The DICT, LDAP and TELNET protocols are crappy
 | |
|  examples and they are subject to rewrite in the future to better fit the
 | |
|  libcurl protocol family.
 | |
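
A state machine that returns instead of blocking might be sketched as follows.
The states and the demo_ names are hypothetical and describe a made-up
two-step login, not the actual FTP or SFTP state machines. Whether a reply has
arrived is modelled with a simple flag that the caller sets.

```c
/* hypothetical states for a tiny multi-stage login exchange */
typedef enum {
  DEMO_SEND_USER,
  DEMO_WAIT_USER_REPLY,
  DEMO_SEND_PASS,
  DEMO_WAIT_PASS_REPLY,
  DEMO_DONE
} demo_state;

struct demo_machine {
  demo_state state;
  int reply_ready;   /* set by the caller when the socket became readable */
};

/* advance as far as possible without blocking; returns 1 when finished and
   0 when the caller should wait for the socket and call again */
static int demo_step(struct demo_machine *m)
{
  for(;;) {
    switch(m->state) {
    case DEMO_SEND_USER:
      m->state = DEMO_WAIT_USER_REPLY;  /* the command would be sent here */
      break;
    case DEMO_SEND_PASS:
      m->state = DEMO_WAIT_PASS_REPLY;
      break;
    case DEMO_WAIT_USER_REPLY:
    case DEMO_WAIT_PASS_REPLY:
      if(!m->reply_ready)
        return 0;                       /* would block: hand back control */
      m->reply_ready = 0;
      m->state = (m->state == DEMO_WAIT_USER_REPLY) ?
                 DEMO_SEND_PASS : DEMO_DONE;
      break;
    case DEMO_DONE:
      return 1;
    }
  }
}
```

Since all progress lives in the struct, the caller is free to serve other
transfers between two calls to demo_step().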
| 
 | |
| SSL libraries
 | |
| =============
 | |
| 
 | |
|  Originally libcurl supported SSLeay for SSL/TLS transports. That support was
 | |
|  then extended to its successor OpenSSL, and has since also been extended to
 | |
|  several other SSL/TLS libraries. We expect and hope to further extend the
 | |
|  support in future libcurl versions.
 | |
| 
 | |
|  To deal with this internally in the best way possible, we have a generic SSL
 | |
|  function API as provided by the sslgen.[ch] system, and they are the only SSL
 | |
|  functions we must use from within libcurl. sslgen is then crafted to use the
 | |
|  appropriate lower-level function calls to whatever SSL library that is in
 | |
|  use.
 | |
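
One possible way to build such a generic layer is a table of function
pointers, sketched below. This is an assumption made for illustration only:
sslgen may just as well pick the backend at compile time, and all demo_ names
and the two fake backends are made up.

```c
#include <string.h>

/* hypothetical backend interface */
struct demo_ssl_backend {
  const char *(*version)(void);
  int (*connect)(int sockfd);
};

/* two fake backends that stand in for real SSL libraries */
static const char *demo_a_version(void) { return "backend-A/1.0"; }
static int demo_a_connect(int sockfd)   { return sockfd >= 0 ? 0 : -1; }
static const char *demo_b_version(void) { return "backend-B/2.0"; }
static int demo_b_connect(int sockfd)   { return sockfd >= 0 ? 0 : -1; }

static const struct demo_ssl_backend demo_backend_a = {
  demo_a_version, demo_a_connect
};
static const struct demo_ssl_backend demo_backend_b = {
  demo_b_version, demo_b_connect
};

/* selected once, for instance when the library is built */
static const struct demo_ssl_backend *demo_ssl = &demo_backend_a;

/* the generic layer: the only SSL functions the rest of the code calls */
static const char *demo_ssl_version(void) { return demo_ssl->version(); }
static int demo_ssl_connect(int sockfd)   { return demo_ssl->connect(sockfd); }
```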
| 
 | |
| Library Symbols
 | |
| ===============
 | |
| 
 | |
|  All symbols used internally in libcurl must use a 'Curl_' prefix if they're
 | |
|  used in more than a single file. Single-file symbols must be made static.
 | |
|  Public ("exported") symbols must use a 'curl_' prefix. (There are exceptions,
 | |
|  but they are to be changed to follow this pattern in future versions.) Public
 | |
|  API functions are marked with CURL_EXTERN in the public header files so that
 | |
|  all others can be hidden on platforms where this is possible.
 | |
| 
 | |
| Return Codes and Informationals
 | |
| ===============================
 | |
| 
 | |
|  I've made things simple. Almost every function in libcurl returns a CURLcode,
 | |
|  that must be CURLE_OK if everything is OK or otherwise a suitable error code
 | |
|  as the curl/curl.h include file defines. The very spot that detects an error
 | |
|  must use the Curl_failf() function to set the human-readable error
 | |
|  description.
 | |
| 
 | |
|  In aiding the user to understand what's happening and to debug curl usage, we
 | |
|  must supply a fair amount of informational messages by using the Curl_infof()
 | |
|  function. Those messages are only displayed when the user explicitly asks for
 | |
|  them. They are best used when revealing information that isn't otherwise
 | |
|  obvious.
 | |
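
The convention of returning a code and setting the message at the very spot
that detects the error can be sketched like this. The demo_ names, the error
codes and the struct are hypothetical, and vsnprintf() is used for brevity
where libcurl would use its own printf clones.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DEMO_ERROR_SIZE 256

/* hypothetical result codes */
typedef enum { DEMO_E_OK, DEMO_E_COULDNT_CONNECT } demo_result;

/* hypothetical session that carries the human-readable error text */
struct demo_session {
  char errorbuf[DEMO_ERROR_SIZE];
};

/* store a printf-style formatted error description in the session */
static void demo_failf(struct demo_session *s, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(s->errorbuf, sizeof(s->errorbuf), fmt, ap);
  va_end(ap);
}

/* the spot that detects the problem sets the message and returns the code */
static demo_result demo_connect_to(struct demo_session *s,
                                   const char *host, int port)
{
  if(strlen(host) == 0) {
    demo_failf(s, "empty host name");
    return DEMO_E_COULDNT_CONNECT;
  }
  if(port <= 0 || port > 65535) {
    demo_failf(s, "bad port %d for host %s", port, host);
    return DEMO_E_COULDNT_CONNECT;
  }
  return DEMO_E_OK;
}
```

Callers further up only need to pass the code along; the message is already
in place.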
| 
 | |
| API/ABI
 | |
| =======
 | |
| 
 | |
|  We make an effort to not export or show internals or how internals work, as
 | |
|  that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
 | |
|  for our promise to users.
 | |
| 
 | |
| Client
 | |
| ======
 | |
| 
 | |
|  main() resides in src/main.c together with most of the client code.
 | |
| 
 | |
|  src/hugehelp.c is automatically generated by the mkhelp.pl perl script to
 | |
|  display the complete "manual" and the src/urlglob.c file holds the functions
 | |
|  used for the URL-"globbing" support. Globbing in the sense that the {} and []
 | |
|  expansion stuff is there.
 | |
| 
 | |
|  The client mostly messes around to set up its 'config' struct properly, then
 | |
|  it calls the curl_easy_*() functions of the library and when it gets back
 | |
|  control after the curl_easy_perform() it cleans up the library, checks status
 | |
|  and exits.
 | |
| 
 | |
|  When the operation is done, the ourWriteOut() function in src/writeout.c may
 | |
|  be called to report about the operation. That function is using the
 | |
|  curl_easy_getinfo() function to extract useful information from the curl
 | |
|  session.
 | |
| 
 | |
|  Recent versions may loop and do all this several times if many URLs were
 | |
|  specified on the command line or config file.
 | |
| 
 | |
| Memory Debugging
 | |
| ================
 | |
| 
 | |
|  The file lib/memdebug.c contains debug-versions of a few functions. Functions
 | |
|  such as malloc, free, fopen, fclose, etc that somehow deal with resources
 | |
|  that might give us problems if we "leak" them. The functions in the memdebug
 | |
|  system do nothing fancy, they do their normal function and then log
 | |
|  information about what they just did. The logged data can then be analyzed
 | |
|  after a complete session.
 | |
| 
 | |
|  memanalyze.pl is the perl script present in tests/ that analyzes a log file
 | |
|  generated by the memory tracking system. It detects if resources are
 | |
|  allocated but never freed and other kinds of errors related to resource
 | |
|  management.
 | |
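
The "do the normal job, then log it" idea can be sketched as below. This is
not the code from lib/memdebug.c: the demo_ names, the log line format and the
counter are made up for this example, and the log goes to stderr instead of a
log file.

```c
#include <stdio.h>
#include <stdlib.h>

static long demo_outstanding;  /* allocations not yet freed */

/* do the normal malloc, then log what happened and where */
static void *demo_dbg_malloc(size_t size, int line, const char *source)
{
  void *mem = malloc(size);
  if(mem)
    demo_outstanding++;
  fprintf(stderr, "MEM %s:%d malloc(%lu) = %p\n",
          source, line, (unsigned long)size, mem);
  return mem;
}

/* log the free together with its source position, then free for real */
static void demo_dbg_free(void *ptr, int line, const char *source)
{
  if(ptr)
    demo_outstanding--;
  fprintf(stderr, "MEM %s:%d free(%p)\n", source, line, ptr);
  free(ptr);
}

/* route plain calls through the logging versions */
#define demo_malloc(size) demo_dbg_malloc((size), __LINE__, __FILE__)
#define demo_free(ptr)    demo_dbg_free((ptr), __LINE__, __FILE__)
```

A script can then pair up the malloc and free lines in the log and report any
allocation that was never freed.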
| 
 | |
|  Internally, the preprocessor symbol DEBUGBUILD guards code which is only
 | |
|  compiled for debug enabled builds, and the symbol CURLDEBUG is used to
 | |
|  differentiate code which is _only_ used for memory tracking/debugging.
 | |
| 
 | |
|  Use -DCURLDEBUG when compiling to enable memory debugging; this is also
 | |
|  switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
 | |
|  when compiling to enable a debug build or run configure with --enable-debug.
 | |
| 
 | |
|  curl --version will list 'Debug' feature for debug enabled builds, and
 | |
|  will list 'TrackMemory' feature for curl debug memory tracking capable
 | |
|  builds. These features are independent and can be controlled when running
 | |
|  the configure script. When --enable-debug is given both features will be
 | |
|  enabled, unless some restriction prevents memory tracking from being used.
 | |
| 
 | |
| Test Suite
 | |
| ==========
 | |
| 
 | |
|  Since November 2000, a test suite has evolved. It is placed in its own
 | |
|  subdirectory directly off the root in the curl archive tree, and it contains
 | |
|  a bunch of scripts and a lot of test case data.
 | |
| 
 | |
|  The main test script is runtests.pl that will invoke test servers like
 | |
|  httpserver.pl and ftpserver.pl before all the test cases are performed. The
 | |
|  test suite currently only runs on unix-like platforms.
 | |
| 
 | |
|  You'll find a description of the test suite in the tests/README file, and the
 | |
|  format of the test case data files in the tests/FILEFORMAT file.
 | |
| 
 | |
|  The test suite automatically detects if curl was built with the memory
 | |
|  debugging enabled, and if it was it will detect memory leaks, too.
 | |
| 
 | |
| Building Releases
 | |
| =================
 | |
| 
 | |
|  There's no magic to this. When you consider everything stable enough to be
 | |
|  released, run the 'maketgz' script (using 'make distcheck' will give you a
 | |
|  pretty good view on the status of the current sources). maketgz prompts for
 | |
|  version number of the client and the library before it creates a release
 | |
|  archive. maketgz uses 'make dist' for the actual archive building, which is
 | |
|  why you need to list in the Makefile.am files all the files that should be
 | |
|  included in the release archives.
 | |
| 
 | |
|  NOTE: you need to have curl checked out from git to be able to do a proper
 | |
|  release build. The release tarballs do not have everything set up in order to
 | |
|  do releases properly.
 | 
