                                       Updated for curl 7.7.2 on April 26, 2001
                                  _   _ ____  _     
                              ___| | | |  _ \| |    
                             / __| | | | |_) | |    
                            | (__| |_| |  _ <| |___ 
                             \___|\___/|_| \_\_____|

INTERNALS

 The project is split in two: the library and the client. The client part uses
 the library, but the library is designed to allow other applications to use
 it.

 The largest amount of code and complexity is in the library part.

CVS
===

 All changes to the sources are committed to the CVS repository as soon as
 they're somewhat verified to work. Changes shall be committed as independently
 as possible so that individual changes are easier to spot and track
 afterwards.

 Tagging shall be used extensively, and by the time we release new archives we
 should tag the sources with a name similar to the released version number.

Windows vs Unix
===============

 There are a few differences in how to program curl the unix way compared to
 the Windows way. The four perhaps most notable details are:

 1. Different function names for socket operations.

   In curl, this is solved with defines and macros, so that the source looks
   the same at all places except for the header file that defines them. The
   macros in use are sclose(), sread() and swrite().

 2. Windows requires a couple of init calls for the socket stuff.

   Those must be made by the application that uses libcurl. In curl, that
   means src/main.c has some code #ifdef'ed to do just that.

 3. The file descriptors for network communication and file operations are
    not as easily interchangeable as in unix.

   We avoid this by not trying any funny tricks on file descriptors.

 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
    destroying binary data, although you do want that conversion if it is
    text coming through... (sigh)

   We set stdout to binary under Windows.

 Inside the source code, we make an effort to avoid '#ifdef [Your OS]'. All
 conditionals that deal with features *should* instead be in the format
 '#ifdef HAVE_THAT_WEIRD_FUNCTION'. Since Windows can't run configure scripts,
 we maintain two config-win32.h files (one in / and one in src/) that are
 supposed to look exactly like the config.h file that configure would have
 generated on a Windows machine!

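 As a minimal sketch of the feature-test style described above: the helper
 name demo_now_seconds() is made up for this example, and HAVE_GETTIMEOFDAY
 is assumed to be the kind of macro a config.h (or a hand-written
 config-win32.h) would define.

```c
#include <time.h>

/* In a real build, HAVE_GETTIMEOFDAY would come from config.h (written by
   configure) or from a hand-maintained config-win32.h. */
#ifdef HAVE_GETTIMEOFDAY
#include <sys/time.h>
#endif

/* Hypothetical helper: current time in whole seconds since the epoch. */
static long demo_now_seconds(void)
{
#ifdef HAVE_GETTIMEOFDAY
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return (long)tv.tv_sec;
#else
  /* Portable fallback used when the feature macro is not defined. */
  return (long)time(NULL);
#endif
}
```

 The code asks whether a function exists, not which operating system it is
 being built on, so a new platform only needs a correct config file.
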
 Generally speaking: always remember that this will be compiled on dozens of
 operating systems. Don't walk on the edge.

Library
=======

 There are plenty of entry points to the library, namely each publicly defined
 function that libcurl offers to applications. All of those functions are
 rather small and easy-to-follow. All the ones prefixed with 'curl_easy' are
 put in the lib/easy.c file.

 All printf()-style functions use the supplied clones in lib/mprintf.c. This
 makes sure we stay absolutely platform independent.

 curl_easy_init() allocates an internal struct and makes some initializations.
 The returned handle does not reveal internals.

 curl_easy_setopt() takes three arguments: the handle, followed by the option
 passed as a pair of parameter-ID and parameter-value. The list of options is
 documented in the man page.

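 The handle-plus-option-pair pattern can be modelled in a few lines. This is a
 stand-alone sketch, not libcurl code: every demo_ name and both option IDs
 are invented for the example, and in a real library the struct definition
 would stay in a private header so the handle reveals no internals.

```c
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical option IDs; the real list is in the man page. */
enum { DEMO_OPT_URL = 1, DEMO_OPT_VERBOSE = 2 };

/* Applications would only ever see a pointer to this struct. */
struct demo_handle {
  const char *url;
  long verbose;
};

struct demo_handle *demo_init(void)
{
  /* calloc zero-fills the struct */
  return calloc(1, sizeof(struct demo_handle));
}

/* Three arguments: the handle, the parameter-ID and the parameter-value.
   Returns 0 on success, 1 for an unknown option. */
int demo_setopt(struct demo_handle *h, int option, ...)
{
  va_list ap;
  int rc = 0;
  va_start(ap, option);
  switch(option) {
  case DEMO_OPT_URL:
    h->url = va_arg(ap, const char *);
    break;
  case DEMO_OPT_VERBOSE:
    h->verbose = va_arg(ap, long);
    break;
  default:
    rc = 1;
    break;
  }
  va_end(ap);
  return rc;
}

void demo_cleanup(struct demo_handle *h)
{
  free(h);
}
```

 Because the value travels through a variable argument list, the caller has
 to pass exactly the type the option expects, for example 1L and not 1 for a
 long option.
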
 curl_easy_perform() does a whole lot of things:

 It starts off in the lib/easy.c file by calling Curl_perform() and the main
 work then continues in lib/url.c. The flow continues with a call to
 Curl_connect() to connect to the remote site.

 o Curl_connect()

   ... analyzes the URL, separates its different components and connects to
   the remote host. This may involve using a proxy and/or using SSL. The
   Curl_gethost() function in lib/hostip.c is used for looking up host names.

   When Curl_connect() is done, we are connected to the remote site. Then it
   is time to tell the server to get a document/file. Curl_do() arranges this.

 o Curl_do()

   Curl_do() makes sure the proper protocol-specific function is called. The
   functions are named after the protocols they handle: Curl_ftp(),
   Curl_http(), Curl_dict(), etc. They all reside in their respective files
   (ftp.c, http.c and dict.c).

   The protocol-specific functions of course deal with protocol-specific
   negotiations and setup. They have access to the Curl_sendf() function
   (from lib/sendf.c) to send printf-style formatted data to the remote
   host, and when they're ready to make the actual file transfer they call the
   Curl_Transfer() function (in lib/transfer.c) to set up the transfer and
   then return.

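 One way to picture the dispatch that Curl_do() performs is a table of
 function pointers keyed on the protocol name. The sketch below is only a
 model under that assumption: the table, demo_dispatch() and the stub
 handlers are invented here and say nothing about how lib/url.c is really
 organized.

```c
#include <stddef.h>
#include <string.h>

typedef int (*demo_proto_fn)(const char *path);

/* Stand-ins for the protocol-specific functions; each returns a distinct
   number so the dispatch can be observed. */
static int demo_ftp(const char *path)  { (void)path; return 21; }
static int demo_http(const char *path) { (void)path; return 80; }
static int demo_dict(const char *path) { (void)path; return 2628; }

static const struct {
  const char *scheme;
  demo_proto_fn handler;
} demo_protocols[] = {
  { "ftp",  demo_ftp },
  { "http", demo_http },
  { "dict", demo_dict },
};

/* Calls the handler registered for 'scheme'; returns -1 if none matches. */
int demo_dispatch(const char *scheme, const char *path)
{
  size_t i;
  for(i = 0; i < sizeof(demo_protocols) / sizeof(demo_protocols[0]); i++) {
    if(strcmp(demo_protocols[i].scheme, scheme) == 0)
      return demo_protocols[i].handler(path);
  }
  return -1;
}
```
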
 o Transfer()

   Curl_perform() then calls Transfer() in lib/transfer.c, which performs
   the entire file transfer.

   During transfer, the progress functions in lib/progress.c are called at a
   frequent interval (or at the user's choice, a specified callback might get
   called). The speedcheck functions in lib/speedcheck.c are also used to
   verify that the transfer is as fast as required.

 o Curl_done()

   Called after a transfer is done. This function takes care of everything
   that has to be done after a transfer. It attempts to leave matters in a
   state so that Curl_do() can be called again on the same connection (in a
   persistent connection case). The connection may also soon be closed with
   Curl_disconnect().

 o Curl_disconnect()

   During normal connection and transfers, no one ever tries to close any
   connection so this is not normally called when curl_easy_perform() is
   used. This function is only used when we are certain that no more transfers
   are going to be made on the connection (it can also be closed by
   force). This function can also be called at times to make sure that libcurl
   doesn't keep too many connections alive at the same time.

   This function cleans up all resources that are associated with a single
   connection.

 Curl_perform() is the function that does the main "connect - do - transfer -
 done" loop. It loops if there's a Location: to follow.

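 The shape of that loop can be sketched with stub functions. Everything named
 demo_ below is hypothetical, including the way a redirect is signalled. The
 stubs only record the order in which they are called, so the control flow is
 the only thing this example claims to show.

```c
#include <string.h>

/* Hypothetical transfer state: 'redirects' is how many more Location:
   responses the pretend server will send; 'trace' records the call order. */
struct demo_transfer {
  int redirects;
  char trace[64];
};

static void demo_note(struct demo_transfer *t, const char *step)
{
  /* append only while it fits, keeping the trace NUL terminated */
  if(strlen(t->trace) + strlen(step) < sizeof(t->trace))
    strcat(t->trace, step);
}

static int demo_connect(struct demo_transfer *t) { demo_note(t, "C"); return 0; }
static int demo_do(struct demo_transfer *t)      { demo_note(t, "D"); return 0; }
static int demo_done(struct demo_transfer *t)    { demo_note(t, "X"); return 0; }

/* Sets *follow to 1 when the pretend response carried a Location: header. */
static int demo_xfer(struct demo_transfer *t, int *follow)
{
  demo_note(t, "T");
  *follow = (t->redirects > 0);
  if(*follow)
    t->redirects--;
  return 0;
}

/* Returns 0 on success or the first non-zero step result. */
int demo_perform(struct demo_transfer *t)
{
  int rc;
  int follow;
  do {
    follow = 0;
    rc = demo_connect(t);
    if(!rc)
      rc = demo_do(t);
    if(!rc)
      rc = demo_xfer(t, &follow);
    if(!rc)
      rc = demo_done(t);
  } while(!rc && follow);
  return rc;
}
```

 With one redirect pending, the four steps run twice before the loop ends.
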
 When completed, curl_easy_cleanup() should be called to free up used
 resources. It runs Curl_disconnect() on all open connections.

 A quick roundup on internal function sequences (many of these call
 protocol-specific function-pointers):

  Curl_connect - connects to a remote site and does initial connect fluff
   This also checks for an existing connection to the requested site and uses
   that one if it is possible.

   Curl_do - starts a transfer
    Transfer() - transfers data
   Curl_done - ends a transfer

  Curl_disconnect - disconnects from a remote site. This is called when the
   disconnect is really requested, which doesn't necessarily have to be
   exactly after Curl_done in case we want to keep the connection open for
   a while.

 HTTP(S)

 HTTP offers a lot and is the protocol in curl that uses the most lines of
 code. There is a special file (lib/formdata.c) that offers all the multipart
 post functions.

 base64-functions for user+password stuff (and more) are in lib/base64.c and
 all functions for parsing and sending cookies are found in lib/cookie.c.

 HTTPS uses almost exactly the same procedure as HTTP, with only two
 exceptions: the connect procedure is different and the function used to read
 from or write to the socket is different, although the latter fact is hidden
 in the source by the use of curl_read() for reading and curl_write() for
 writing data to the remote server.

 http_chunks.c contains functions that understand HTTP 1.1 chunked transfer
 encoding.

 An interesting detail with the HTTP(S) request is the add_buffer() series of
 functions we use. They append data to one single buffer, and when the
 building is done the entire request is sent off in one single write. This is
 done this way to overcome problems with flawed firewalls and lame servers.

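 The build-then-send idea can be sketched with a small growable buffer. This
 is not the add_buffer() code itself: the struct, the demo_buffer_ names and
 the doubling growth strategy are assumptions chosen to keep the example
 short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not the library's real struct. */
struct demo_buffer {
  char *data;
  size_t used;
  size_t size;
};

/* Appends len bytes, growing the allocation as needed.
   Returns 0 on success, -1 if memory runs out. */
int demo_buffer_add(struct demo_buffer *b, const void *src, size_t len)
{
  if(len == 0)
    return 0;
  if(b->used + len > b->size) {
    size_t newsize = (b->used + len) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return -1;
    b->data = p;
    b->size = newsize;
  }
  memcpy(b->data + b->used, src, len);
  b->used += len;
  return 0;
}

/* Convenience wrapper for NUL terminated strings. */
int demo_buffer_adds(struct demo_buffer *b, const char *s)
{
  return demo_buffer_add(b, s, strlen(s));
}
```

 Once the request line and all headers have been appended, the caller hands
 data and used to one single write call and then frees the buffer.
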
 FTP

 The Curl_if2ip() function can be used for getting the IP number of a
 specified network interface, and it resides in lib/if2ip.c.

 Curl_ftpsendf() is used for sending FTP commands to the remote server. It was
 made a separate function to prevent us programmers from forgetting that they
 must be CRLF terminated. They must also be sent in one single write() to make
 firewalls and similar happy.

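 A minimal sketch of why a dedicated function helps: the formatter below
 always appends CRLF, so a caller cannot forget it, and it leaves the whole
 command in one buffer ready for a single write. demo_ftp_format() is a
 made-up name, and it uses the C99 vsnprintf() where the library would use
 its own printf clones.

```c
#include <stdarg.h>
#include <stdio.h>

/* Formats one FTP command into 'out' and appends the mandatory CRLF.
   Returns the number of bytes to send, or -1 if the command does not fit. */
int demo_ftp_format(char *out, size_t outlen, const char *fmt, ...)
{
  va_list ap;
  int n;
  va_start(ap, fmt);
  n = vsnprintf(out, outlen, fmt, ap);
  va_end(ap);
  /* need room for the text, "\r\n" and the terminating NUL */
  if(n < 0 || (size_t)n + 3 > outlen)
    return -1;
  out[n] = '\r';
  out[n + 1] = '\n';
  out[n + 2] = '\0';
  return n + 2;
}
```
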
 Kerberos

 The kerberos support is mainly in lib/krb4.c and lib/security.c.

 TELNET

 Telnet is implemented in lib/telnet.c.

 FILE

 The file:// protocol is dealt with in lib/file.c.

 LDAP

 Everything LDAP is in lib/ldap.c.

 GENERAL

 URL encoding and decoding, called escaping and unescaping in the source code,
 is found in lib/escape.c.

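 For readers unfamiliar with the term, escaping replaces bytes that are not
 safe in a URL with %XX hex sequences. The function below is an independent
 sketch, not the code in lib/escape.c. The name demo_escape() and the choice
 of which characters to leave untouched (letters, digits and "-._~") are
 assumptions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, escaped copy of 'in' (free() it after use),
   or NULL if memory runs out. */
char *demo_escape(const char *in)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t len = strlen(in);
  char *out = malloc(len * 3 + 1); /* worst case: every byte becomes %XX */
  char *p = out;
  size_t i;
  if(!out)
    return NULL;
  for(i = 0; i < len; i++) {
    unsigned char c = (unsigned char)in[i];
    if(isalnum(c) || strchr("-._~", c)) {
      *p++ = (char)c; /* safe character, copied as-is */
    }
    else {
      *p++ = '%';
      *p++ = hex[c >> 4];
      *p++ = hex[c & 15];
    }
  }
  *p = '\0';
  return out;
}
```

 Unescaping is the reverse walk: copy bytes through and turn each %XX back
 into the single byte it stands for.
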
 While transferring data in Transfer() a few functions might get
 used. curl_getdate() in lib/getdate.c is for HTTP date comparisons (and
 more).

 lib/getenv.c offers curl_getenv() which is for reading environment variables
 in a neat platform independent way. That's used in the client, but also in
 lib/url.c when checking the proxy environment variables. Note that contrary
 to the normal unix getenv(), this returns an allocated buffer that must be
 free()ed after use.

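 The ownership rule is the part that is easy to get wrong, so here is a
 sketch of a getenv() wrapper with the same contract. demo_getenv() is a
 hypothetical stand-in and only covers the plain getenv() case, not whatever
 platform-specific handling the real function has.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()ed copy of the variable's value, or NULL if the
   variable is unset or memory runs out. The caller owns the copy. */
char *demo_getenv(const char *name)
{
  const char *value = getenv(name);
  char *copy;
  size_t len;
  if(!value)
    return NULL;
  len = strlen(value) + 1; /* include the terminating NUL */
  copy = malloc(len);
  if(copy)
    memcpy(copy, value, len);
  return copy;
}
```

 A caller checking a proxy variable would therefore read the value, use it,
 and then pass the returned pointer to free().
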
 lib/netrc.c holds the .netrc parser.

 lib/timeval.c features replacement functions for systems that don't have
 gettimeofday() and a few support functions for timeval conversions.

 A function named curl_version() that returns the full curl version string is
 found in lib/version.c.

 If authentication is requested but no password is given, a getpass_r() clone
 exists in lib/getpass.c. libcurl offers a custom callback that can be used
 instead of this, but it doesn't change much for us.

Persistent Connections
======================

 With curl 7.7, we added persistent connection support to libcurl which has
 introduced a somewhat different treatment of things inside of libcurl.

 o The 'UrlData' struct returned in the curl_easy_init() call must never
   hold connection-oriented data. It is meant to hold the root data as well
   as all the options etc that the library-user may choose.
 o The 'UrlData' struct holds the cache array of pointers to 'connectdata'
   structs. There's one connectdata struct for each connection that libcurl
   knows about.
 o This also enables the 'curl handle' to be reused on subsequent transfers,
   something that was illegal in pre-7.7 versions.
 o When we are about to perform a transfer with curl_easy_perform(), we first
   check for an already existing connection in the cache that we can use,
   otherwise we create a new one and add it to the cache. If the cache is
   already full when we add a new connection, we close one of the present
   ones. We select which one to close depending on the close policy that may
   have been previously set.
 o When the transfer operation is complete, we try to leave the connection
   open. Particular options may tell us not to, and protocols may signal
   closure on connections and then we don't keep it open of course.
 o When curl_easy_cleanup() is called, we close all still opened connections.

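 The lookup-or-create step with eviction can be sketched as follows. The
 structs here are simplified stand-ins and not the real 'UrlData' or
 'connectdata'. The cache size of 2 and the least-recently-used choice of
 which connection to close are assumptions picked to make the behaviour easy
 to follow.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_CACHE_SIZE 2 /* tiny on purpose, to make eviction easy to see */

struct demo_conn {
  char host[64];
  int port;
  unsigned long last_used; /* value of the cache clock at last use */
};

struct demo_cache {
  struct demo_conn *slot[DEMO_CACHE_SIZE]; /* NULL marks a free slot */
  unsigned long clock;
  int closed; /* how many connections have been evicted so far */
};

/* Returns a cached connection to host:port if there is one, else creates
   a new one, evicting the least recently used entry when the cache is
   full. Returns NULL if memory runs out. */
struct demo_conn *demo_cache_get(struct demo_cache *c,
                                 const char *host, int port)
{
  int i;
  int victim = 0;
  struct demo_conn *conn;

  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    conn = c->slot[i];
    if(conn && conn->port == port && strcmp(conn->host, host) == 0) {
      conn->last_used = ++c->clock; /* reuse the existing connection */
      return conn;
    }
  }
  /* pick a free slot if any, else the least recently used one */
  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    if(!c->slot[i]) {
      victim = i;
      break;
    }
    if(c->slot[i]->last_used < c->slot[victim]->last_used)
      victim = i;
  }
  conn = calloc(1, sizeof(*conn));
  if(!conn)
    return NULL;
  if(c->slot[victim]) {
    free(c->slot[victim]); /* a real library would disconnect here */
    c->closed++;
  }
  strncpy(conn->host, host, sizeof(conn->host) - 1);
  conn->port = port;
  conn->last_used = ++c->clock;
  c->slot[victim] = conn;
  return conn;
}
```

 Asking twice for the same host and port returns the same pointer, which is
 the whole point of keeping the cache in the handle.
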
 You do realize that the curl handle must be re-used in order for the
 persistent connections to work.

Library Symbols
===============

 All symbols used internally in libcurl must use a 'Curl_' prefix if they're
 used in more than a single file. Single-file symbols must be made
 static. Public (exported) symbols must use a 'curl_' prefix. (There are
 exceptions, but they are destined to be changed to follow this pattern in the
 future.)

Return Codes and Informationals
===============================

 I've made things simple. Almost every function in libcurl returns a CURLcode,
 which must be CURLE_OK if everything is OK or otherwise a suitable error code
 as the curl/curl.h include file defines. The very spot that detects an error
 must use the Curl_failf() function to set the human-readable error
 description.

 To help the user understand what's happening and to debug curl usage, we
 must supply a fair amount of informational messages by using the Curl_infof()
 function. Those messages are only displayed when the user explicitly asks for
 them. They are best used when revealing information that isn't otherwise
 obvious.

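 The division of labour between the return code and the error text can be
 shown with a small model. None of the names below are libcurl's: demo_code
 stands in for CURLcode, demo_failf() for Curl_failf(), and the port check is
 just a convenient error to detect.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for the return code type and the handle. */
typedef enum { DEMO_OK = 0, DEMO_BAD_PORT = 1 } demo_code;

struct demo_session {
  char errorbuf[128]; /* human-readable description of the last error */
};

/* printf-style: stores the formatted message in the session. */
static void demo_failf(struct demo_session *s, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(s->errorbuf, sizeof(s->errorbuf), fmt, ap);
  va_end(ap);
}

/* The spot that detects the error sets the message and returns a code;
   callers only pass the code upwards. */
demo_code demo_open(struct demo_session *s, const char *host, int port)
{
  if(port <= 0 || port > 65535) {
    demo_failf(s, "Bad port %d for host %s", port, host);
    return DEMO_BAD_PORT;
  }
  return DEMO_OK;
}
```

 Functions higher up the call chain can then return the code unchanged,
 knowing the description was already recorded where the details were known.
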
Client
======

 main() resides in src/main.c together with most of the client code.

 src/hugehelp.c is automatically generated by the mkhelp.pl perl script to
 display the complete "manual" and the src/urlglob.c file holds the functions
 used for the URL-"globbing" support. Globbing in the sense that the {} and []
 expansion stuff is there.

 The client mostly messes around to set up its 'config' struct properly, then
 it calls the curl_easy_*() functions of the library and when it gets back
 control after the curl_easy_perform() it cleans up the library, checks status
 and exits.

 When the operation is done, the ourWriteOut() function in src/writeout.c may
 be called to report about the operation. That function uses the
 curl_easy_getinfo() function to extract useful information from the curl
 session.

 Recent versions may loop and do all this several times if many URLs were
 specified on the command line or config file.

Memory Debugging
================

 The file lib/memdebug.c contains debug-versions of a few functions: functions
 such as malloc, free, fopen, fclose, etc that somehow deal with resources
 that might give us problems if we "leak" them. The functions in the memdebug
 system do nothing fancy, they do their normal function and then log
 information about what they just did. The logged data can then be analyzed
 after a complete session.

 memanalyze.pl is the perl script, present only in CVS (not part of the
 release archives), that analyzes a log file generated by the memdebug
 system. It detects if resources are allocated but never freed and other kinds
 of errors related to resource management.

 Use -DMALLOCDEBUG when compiling to enable memory debugging. This is also
 switched on by running configure with --enable-debug.

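 A wrapper in that spirit might look like the sketch below. It is not taken
 from lib/memdebug.c: the demo_ names, the log line format, the outstanding
 counter and the DEMO_MALLOCDEBUG switch are all invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

static FILE *demo_memlog;      /* log destination; NULL disables logging */
static long demo_outstanding;  /* allocations not yet freed */

/* Does the normal allocation, then logs what it just did. */
void *demo_dbg_malloc(size_t size, int line, const char *file)
{
  void *p = malloc(size);
  if(p)
    demo_outstanding++;
  if(demo_memlog)
    fprintf(demo_memlog, "MEM %s:%d malloc(%lu) = %p\n",
            file, line, (unsigned long)size, p);
  return p;
}

/* Logs the pointer while it is still valid, then releases it. */
void demo_dbg_free(void *p, int line, const char *file)
{
  if(p)
    demo_outstanding--;
  if(demo_memlog)
    fprintf(demo_memlog, "MEM %s:%d free(%p)\n", file, line, p);
  free(p);
}

/* With the debug switch on, plain calls are redirected to the wrappers. */
#ifdef DEMO_MALLOCDEBUG
#define malloc(size) demo_dbg_malloc(size, __LINE__, __FILE__)
#define free(ptr)    demo_dbg_free(ptr, __LINE__, __FILE__)
#endif
```

 A script reading such a log only has to pair up the malloc and free lines.
 Any address that was allocated but never freed is a leak, and the file and
 line tell where it came from.
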
Test Suite
==========

 Since November 2000, a test suite has evolved. It is placed in its own
 subdirectory directly off the root in the curl archive tree, and it contains
 a bunch of scripts and a lot of test case data.

 The main test script is runtests.pl, which invokes the two servers
 httpserver.pl and ftpserver.pl before all the test cases are performed. The
 test suite currently only runs on unix-like platforms.

 You'll find a complete description of the test case data files in the
 tests/README file.

 The test suite automatically detects if curl was built with the memory
 debugging enabled, and if it was it will detect memory leaks too.

Building Releases
=================

 There's no magic to this. When you consider everything stable enough to be
 released, run the 'maketgz' script (using 'make distcheck' will give you a
 pretty good view on the status of the current sources). maketgz prompts for
 the version number of the client and the library before it creates a release
 archive. maketgz uses 'make dist' for the actual archive building, which is
 why you need to fill in the Makefile.am files properly with the files that
 should be included in the release archives.