                                  _   _ ____  _     
                              ___| | | |  _ \| |    
                             / __| | | | |_) | |    
                            | (__| |_| |  _ <| |___ 
                             \___|\___/|_| \_\_____|

INTERNALS

 The project is split in two: the library and the client. The client part
 uses the library, but the library is designed so that other applications can
 use it as well.

 Thus, the largest amount of code and complexity is in the library part.

SYMBOLS
=======

 All symbols used internally must use a 'Curl_' prefix if they're used in
 more than a single file. Single file symbols must be made static. Public
 symbols must use a 'curl_' prefix. (There are exceptions, but they are
 destined to change to this pattern in the future.)

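 The three naming rules above could look like this in practice. This is only
 a sketch: all three function names are made up for illustration and do not
 exist in curl.

```c
#include <stddef.h>

/* Used in this file only: made static, no prefix required. */
static size_t count_slashes(const char *s)
{
  size_t n = 0;
  for(; *s; s++)
    if(*s == '/')
      n++;
  return n;
}

/* Shared between library source files: 'Curl_' prefix (hypothetical name). */
size_t Curl_example_path_depth(const char *path)
{
  return count_slashes(path);
}

/* Offered to applications: 'curl_' prefix (hypothetical name). */
size_t curl_example_depth(const char *path)
{
  return Curl_example_path_depth(path);
}
```
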
CVS
===

 All changes to the sources are committed to the CVS repository as soon as
 they're somewhat verified to work. Changes shall be committed as
 independently as possible so that individual changes can be more easily
 spotted and tracked afterwards.

 Tagging shall be used extensively, and by the time we release new archives we
 should tag the sources with a name similar to the released version number.

Windows vs Unix
===============

 There are a few differences in how to program curl the unix way compared to
 the Windows way. The four most notable details are:

 1. Different function names for close(), read(), write()
 2. Windows requires a couple of init calls for the socket stuff
 3. The file descriptors for network communication and file operations are
    not easily interchangeable as in unix
 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
    destroying binary data, although you do want that conversion if it is
    text coming through... (sigh)

 In curl, (1) is handled with defines and macros, so that the source looks
 the same everywhere except in the header file that defines them.

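 A header in the spirit of (1) might look roughly like the sketch below. The
 macro names sclose, sread and swrite, the example_socket_t typedef and the
 helper function are assumptions made for this illustration, not a copy of
 curl's header.

```c
#include <stddef.h>

#ifdef WIN32
#include <winsock2.h>
typedef SOCKET example_socket_t;
/* Winsock has its own calls for closing, reading and writing sockets. */
#define sclose(s)       closesocket(s)
#define sread(s, b, n)  recv((s), (char *)(b), (int)(n), 0)
#define swrite(s, b, n) send((s), (const char *)(b), (int)(n), 0)
#else
#include <unistd.h>
typedef int example_socket_t;
/* On unix a socket is a plain file descriptor. */
#define sclose(s)       close(s)
#define sread(s, b, n)  read((s), (b), (n))
#define swrite(s, b, n) write((s), (b), (n))
#endif

/* Platform-neutral code: write a buffer, then close the socket.
   Returns the number of bytes written, or -1 on error. */
long example_send_and_close(example_socket_t sock, const void *data,
                            size_t len)
{
  long written = (long)swrite(sock, data, len);
  sclose(sock);
  return written;
}
```

 Only the block at the top differs per platform; example_send_and_close()
 reads the same everywhere.
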
 (2) must be done by the application that uses libcurl. In curl, that means
 src/main.c has some code #ifdef'ed to do just that.

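 The init calls in (2) are the Winsock WSAStartup() and WSACleanup()
 functions. A minimal sketch of what an application could do follows; the
 helper names and the requested Winsock version (2.2) are assumptions, and
 this is not the code found in src/main.c.

```c
#ifdef WIN32
#include <winsock2.h>
#endif

/* Hypothetical helper. Returns 0 on success, non-zero on failure. */
int example_socket_init(void)
{
#ifdef WIN32
  WSADATA data;
  /* Ask for Winsock 2.2; WSAStartup() returns 0 on success. */
  return WSAStartup(MAKEWORD(2, 2), &data);
#else
  return 0; /* nothing to initialize on unix-like systems */
#endif
}

/* Hypothetical helper. Call once when the application is done. */
void example_socket_cleanup(void)
{
#ifdef WIN32
  WSACleanup();
#endif
}
```
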
 (3) is simply avoided by not trying any funny tricks on file descriptors.

 (4) is handled by setting stdout to binary mode under Windows.

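 With the Microsoft C runtime, switching stdout to binary mode as in (4) is
 usually done with _setmode(). The sketch below shows one way to wrap that
 call; the helper name is hypothetical and the exact call used in curl may
 differ.

```c
#include <stdio.h>
#ifdef WIN32
#include <io.h>
#include <fcntl.h>
#endif

/* Hypothetical helper. Returns 0 on success, -1 on failure. */
int example_set_stdout_binary(void)
{
#ifdef WIN32
  /* _setmode() returns the previous mode, or -1 on error. */
  return (_setmode(_fileno(stdout), _O_BINARY) == -1) ? -1 : 0;
#else
  return 0; /* unix makes no text/binary distinction */
#endif
}
```
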
 Inside the source code, I do make an effort to avoid '#ifdef WIN32'. All
 conditionals that deal with features *should* instead be in the format
 '#ifdef HAVE_THAT_WEIRD_FUNCTION'. Since Windows can't run configure scripts,
 I maintain two config-win32.h files (one in / and one in src/) that are
 supposed to look exactly like a config.h file would have looked on a
 Windows machine!

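 As an illustration of a feature conditional, the sketch below tests for
 gettimeofday() instead of testing for a platform. HAVE_GETTIMEOFDAY is
 assumed here to be the macro that config.h (or config-win32.h) would define,
 and the helper name is made up.

```c
#include <time.h>
#ifdef HAVE_GETTIMEOFDAY
#include <sys/time.h>
#endif

/* Hypothetical helper: the current time in whole milliseconds. */
long long example_now_ms(void)
{
#ifdef HAVE_GETTIMEOFDAY
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
#else
  /* Fallback with one-second resolution when the function is missing. */
  return (long long)time(NULL) * 1000;
#endif
}
```

 The code never asks which operating system it runs on, only whether the
 function is available.
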
Library
=======

 As described elsewhere, libcurl is meant to get two different "layers" of
 interfaces. At present, only the high-level interface, the "easy" one, has
 been fully implemented and documented. We assume the easy interface in this
 description; the low-level interface will be documented when fully
 implemented.

 There are plenty of entry points to the library, namely each publicly defined
 function that libcurl offers to applications. All of those functions are
 rather small and easy to follow. All the ones prefixed with 'curl_easy' are
 put in the lib/easy.c file.

 curl_easy_init() allocates an internal struct and makes some initializations.
 The returned handle does not reveal internals.

 curl_easy_setopt() takes three arguments: the handle, followed by the option
 itself, which must be passed as a pair of parameter-ID and parameter-value.
 The list of options is documented in the man page.

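 The combination of an opaque handle and ID/value option pairs can be
 sketched as below. Everything named example_* or EXAMPLE_* is hypothetical;
 this shows the general pattern, not the contents of lib/easy.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical option IDs; the real list is in the man page. */
enum {
  EXAMPLE_OPT_URL,     /* value is a const char * */
  EXAMPLE_OPT_TIMEOUT  /* value is a long */
};

/* Internal struct; applications only ever see it as 'void *'. */
struct example_handle {
  const char *url;
  long timeout;
};

void *example_easy_init(void)
{
  /* calloc() zero-fills, so every field starts with a known default */
  return calloc(1, sizeof(struct example_handle));
}

/* Returns 0 on success, 1 for an unknown option ID. */
int example_easy_setopt(void *handle, int id, ...)
{
  struct example_handle *h = handle;
  va_list ap;
  int rc = 0;

  assert(h != NULL);
  va_start(ap, id);
  switch(id) {
  case EXAMPLE_OPT_URL:
    h->url = va_arg(ap, const char *);
    break;
  case EXAMPLE_OPT_TIMEOUT:
    h->timeout = va_arg(ap, long);
    break;
  default:
    rc = 1;
    break;
  }
  va_end(ap);
  return rc;
}

void example_easy_cleanup(void *handle)
{
  free(handle);
}
```

 Because the value is read with va_arg(), the caller must pass exactly the
 type that the option ID implies, for instance 30L rather than 30 for a long.
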
 curl_easy_perform() does a whole lot of things:

 The function analyzes the URL, gets the different components and connects to
 the remote host. This may involve using a proxy and/or using SSL. The
 GetHost() function in lib/hostip.c is used for looking up host names.

 When connected, the proper function is called. The functions are named after
 the protocols they handle: ftp(), http(), dict(), etc. They all reside in
 their respective files (ftp.c, http.c and dict.c).

 The protocol-specific functions deal with protocol-specific negotiations and
 setup. They have access to the sendf() function (from lib/sendf.c) to send
 printf-style formatted data to the remote host, and when they're ready to
 make the actual file transfer they call the Transfer() function (in
 lib/download.c) to do the transfer. All printf()-style functions use the
 supplied clones in lib/mprintf.c.

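 The idea behind a sendf()-style function can be sketched as follows. This is
 not the code in lib/sendf.c: the helper name and the fixed buffer size are
 assumptions, it uses the standard vsnprintf() rather than the clones in
 lib/mprintf.c, and it writes to a POSIX file descriptor.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format like printf() and write the result to 'fd'.
   Returns the number of bytes written, or -1 on error or truncation. */
long example_sendf(int fd, const char *fmt, ...)
{
  char buf[1024];
  va_list ap;
  int len;

  va_start(ap, fmt);
  len = vsnprintf(buf, sizeof(buf), fmt, ap);
  va_end(ap);

  if(len < 0 || (size_t)len >= sizeof(buf))
    return -1; /* formatting failed or the output did not fit */

  return (long)write(fd, buf, (size_t)len);
}
```

 A protocol function could then issue a command with a call such as
 example_sendf(sock, "USER %s\r\n", name).
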
 While transferring, the progress functions in lib/progress.c are called at a
 frequent interval (or at the user's choice, a specified callback might get
 called). The speedcheck functions in lib/speedcheck.c are also used to verify
 that the transfer is as fast as required.

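 One plausible way to verify that a transfer is fast enough is to abort when
 the speed has stayed below a limit for a given number of seconds. The sketch
 below assumes exactly that policy; the struct, its fields and the function
 name are hypothetical and are not taken from lib/speedcheck.c.

```c
/* Hypothetical state for a low-speed check. */
struct example_speedcheck {
  long low_speed_limit; /* bytes per second considered too slow */
  long low_speed_time;  /* seconds the transfer may stay that slow */
  long slow_since;      /* time the speed first dropped, or -1 if not slow */
};

/* Returns 1 if the transfer should be aborted, 0 if it may continue. */
int example_speedcheck(struct example_speedcheck *sc, long now,
                       long bytes_per_sec)
{
  if(bytes_per_sec >= sc->low_speed_limit) {
    sc->slow_since = -1; /* fast enough again: reset the timer */
    return 0;
  }
  if(sc->slow_since < 0)
    sc->slow_since = now; /* the speed just dropped below the limit */

  return (now - sc->slow_since) >= sc->low_speed_time;
}
```

 The function is meant to be called regularly during the transfer with the
 current time in seconds and the most recently measured speed.
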
 When completed, curl_easy_cleanup() should be called to free up used
 resources.

 HTTP(S)

 HTTP offers a lot and is the protocol in curl that uses the most lines of
 code. There is a special file (lib/formdata.c) that offers all the multipart
 post functions.

 The base64 functions for user+password stuff are in lib/base64.c and all
 functions for parsing and sending cookies are found in lib/cookie.c.

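 HTTP Basic authentication sends the string "user:password" in base64 form.
 The sketch below shows a plain base64 encoder for that purpose; both
 function names are hypothetical and the code is not taken from
 lib/base64.c.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed, NUL-terminated base64 encoding
   of the 'len' bytes at 'src', or NULL if out of memory. */
char *example_base64_encode(const unsigned char *src, size_t len)
{
  static const char table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  char *out = malloc(((len + 2) / 3) * 4 + 1);
  char *p = out;
  size_t i;

  if(!out)
    return NULL;

  for(i = 0; i < len; i += 3) {
    /* pack up to three input bytes into one 24-bit group */
    unsigned long v = (unsigned long)src[i] << 16;
    if(i + 1 < len)
      v |= (unsigned long)src[i + 1] << 8;
    if(i + 2 < len)
      v |= src[i + 2];

    /* emit four 6-bit symbols, padding with '=' past the end of input */
    *p++ = table[(v >> 18) & 0x3F];
    *p++ = table[(v >> 12) & 0x3F];
    *p++ = (i + 1 < len) ? table[(v >> 6) & 0x3F] : '=';
    *p++ = (i + 2 < len) ? table[v & 0x3F] : '=';
  }
  *p = '\0';
  return out;
}

/* Encode a "user:password" string; the caller frees the result. */
char *example_basic_credentials(const char *userpwd)
{
  return example_base64_encode((const unsigned char *)userpwd,
                               strlen(userpwd));
}
```
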
 HTTPS uses almost exactly the same procedure as HTTP, with only two
 exceptions: the connect procedure is different and the function used to read
 or write from the socket is different, although the latter fact is hidden in
 the source by the use of curl_read() for reading and curl_write() for writing
 data to the remote server.

 FTP

 The if2ip() function can be used for getting the IP number of a specified
 network interface, and it resides in lib/if2ip.c. It is only used for the FTP
 PORT command.

 TELNET

 Telnet is implemented in lib/telnet.c.

 FILE

 The file:// protocol is dealt with in lib/file.c.

 LDAP

 Everything LDAP is in lib/ldap.c.

 GENERAL

 URL encoding and decoding, called escaping and unescaping in the source code,
 is found in lib/escape.c.

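 URL escaping replaces each unsafe byte with '%' and two hex digits. A
 minimal sketch follows. The set of characters left untouched is the
 "unreserved" set of RFC 3986, which is an assumption and not necessarily
 what lib/escape.c does; the function name is hypothetical too.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of 'in' where every byte
   other than letters, digits, '-', '.', '_' and '~' becomes %XX. */
char *example_escape(const char *in)
{
  static const char hex[] = "0123456789ABCDEF";
  char *out = malloc(strlen(in) * 3 + 1); /* worst case: all escaped */
  char *p = out;

  if(!out)
    return NULL;

  for(; *in; in++) {
    unsigned char c = (unsigned char)*in;
    if(isalnum(c) || strchr("-._~", c))
      *p++ = (char)c;
    else {
      *p++ = '%';
      *p++ = hex[c >> 4];
      *p++ = hex[c & 0x0F];
    }
  }
  *p = '\0';
  return out;
}
```

 The caller is responsible for calling free() on the returned string.
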
 While transferring data in Transfer(), a few functions might get used.
 curl_getdate() in lib/getdate.c is for HTTP date comparisons (and more).

 lib/getenv.c offers curl_getenv() which is for reading environment variables
 in a neat platform independent way. That's used in the client, but also in
 lib/url.c when checking the proxy environment variables.

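 Checking a proxy environment variable could be sketched like this. The
 helper names are hypothetical, the code relies on the standard getenv()
 only, and it assumes the common "<scheme>_proxy" naming such as http_proxy;
 it is not the code in lib/getenv.c or lib/url.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of the variable's value,
   or NULL if it is unset or empty. The caller frees the copy. */
char *example_getenv(const char *name)
{
  const char *value = getenv(name);
  char *copy;

  if(!value || !*value)
    return NULL;

  copy = malloc(strlen(value) + 1);
  if(copy)
    strcpy(copy, value);
  return copy;
}

/* Look up the proxy for a scheme, e.g. "http" reads "http_proxy". */
char *example_proxy_for(const char *scheme)
{
  char name[64];

  if(strlen(scheme) + sizeof("_proxy") > sizeof(name))
    return NULL; /* scheme name too long for the buffer */

  strcpy(name, scheme);
  strcat(name, "_proxy");
  return example_getenv(name);
}
```

 Returning a private copy means the caller does not depend on how long the
 pointer from getenv() stays valid on a given platform.
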
 lib/netrc.c holds the .netrc parser.

 lib/timeval.c features replacement functions for systems that don't have
 gettimeofday().

 A function named curl_version() that returns the full curl version string is
 found in lib/version.c.

Client
======

 main() resides in src/main.c together with most of the client code.
 src/hugehelp.c is automatically generated by the mkhelp.pl perl script to
 display the complete "manual" and the src/urlglob.c file holds the functions
 used for the multiple-URL support.

 The client mostly works to set up its config struct properly, then it calls
 the curl_easy_*() functions of the library, and when it gets back control
 after the curl_easy_perform() it cleans up the library, checks status and
 exits.

 When the operation is done, the ourWriteOut() function in src/writeout.c may
 be called to report about the operation. That function uses the
 curl_easy_getinfo() function to extract useful information from the curl
 session.

Test Suite
==========

 A test suite evolved during November 2000. It is placed in its own
 subdirectory directly off the root in the curl archive tree, and it contains
 a bunch of scripts and a lot of test case data.

 The main test script is runtests.pl, which invokes the two servers
 httpserver.pl and ftpserver.pl before all the test cases are performed. The
 test suite currently only runs on unix-like platforms.

 You'll find a complete description of the test case data files in the README
 file in the test directory.