\input texinfo @c -*-texinfo-*- @c %**start of header @setfilename festival.info @settitle Festival Speech Synthesis System @finalout @setchapternewpage odd @c %**end of header @c This document was modelled on the numerous examples of texinfo @c documentation available with GNU software, primarily the hello @c world example, but many others too. I happily acknowledge their @c aid in producing this document -- awb @set EDITION 1.4 @set VERSION 1.4.3 @set UPDATED 27th December 2002 @ifinfo This file documents the @code{Festival} Speech Synthesis System a general text to speech system for making your computer talk and developing new synthesis techniques. Copyright (C) 1996-2004 University of Edinburgh Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. @ignore Permission is granted to process this file through TeX, or otherwise and print the results, provided the printed document carries copying permission notice identical to this one except for the removal of this paragraph (this paragraph not being relevant to the printed manual). @end ignore Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the authors. @end ifinfo @titlepage @title The Festival Speech Synthesis System @subtitle System documentation @subtitle Edition @value{EDITION}, for Festival Version @value{VERSION} @subtitle @value{UPDATED} @author by Alan W Black, Paul Taylor and Richard Caley. 
@page @vskip 0pt plus 1filll Copyright @copyright{} 1996-2004 University of Edinburgh, all rights reserved. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the University of Edinburgh @end titlepage @node Top, , , (dir) @ifinfo This file documents the @emph{Festival Speech Synthesis System} @value{VERSION}. This document contains many gaps and is still in the process of being written. @end ifinfo @menu * Abstract:: initial comments * Copying:: How you can copy and share the code * Acknowledgements:: List of contributors * What is new:: Enhancements since last public release * Overview:: Generalities and Philosophy * Installation:: Compilation and Installation * Quick start:: Just tell me what to type * Scheme:: A quick introduction to Festival's scripting language Text methods for interfacing to Festival * TTS:: Text to speech modes * XML/SGML mark-up:: XML/SGML mark-up Language * Emacs interface:: Using Festival within Emacs Internal functions * Phonesets:: Defining and using phonesets * Lexicons:: Building and compiling Lexicons * Utterances:: Existing and defining new utterance types Modules * Text analysis:: Tokenizing text * POS tagging:: Part of speech tagging * Phrase breaks:: Finding phrase breaks * Intonation:: Intonations modules * Duration:: Duration modules * UniSyn synthesizer:: The UniSyn waveform synthesizer * Diphone synthesizer:: Building and using diphone synthesizers * 
Other synthesis methods::     other waveform synthesis methods
* Audio output::                Getting sound from Festival
* Voices::                      Adding new voices (and languages)
* Tools::                       CART, Ngrams etc
* Building models from databases::

Adding new modules and writing C++ code

* Programming::                 Programming in Festival (Lisp/C/C++)
* API::                         Using Festival in other programs

* Examples::                    Some simple (and not so simple) examples
* Problems::                    Reporting bugs.
* References::                  Other sources of information
* Feature functions::           List of builtin feature functions.
* Variable list::               Short descriptions of all variables
* Function list::               Short descriptions of all functions
* Index::                       Index of concepts.
@end menu

@node Abstract, Copying, , Top
@chapter Abstract

This document provides a user manual for the Festival Speech Synthesis System, version @value{VERSION}.

Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, and an Emacs interface. Festival is multi-lingual; we have developed voices in many languages including English (UK and US), Spanish and Welsh, though English is the most advanced.

The system is written in C++ and uses the Edinburgh Speech Tools for low level architecture and has a Scheme (SIOD) based command interpreter for control. Documentation is given in the FSF texinfo format which can generate a printed manual, info files and HTML.

The latest details and a full software distribution of the Festival Speech Synthesis System are available through its home page which may be found at
@example
@url{http://www.cstr.ed.ac.uk/projects/festival.html}
@end example

@node Copying, Acknowledgements, Abstract, Top
@chapter Copying

@cindex restrictions
@cindex redistribution
As we feel the core system has reached an acceptable level of maturity, from 1.4.0 the basic system is released under a free licence, without the commercial restrictions we imposed on early versions. The basic system has been placed under an X11 type licence which as free licences go is pretty free. No GPL code is included in festival or the speech tools themselves (though some auxiliary files are GPL'd e.g. the Emacs mode for Festival). We have deliberately chosen a licence that should be compatible with our commercial partners and our free software users.

However although the code is free, we still offer no warranties and no maintenance. We will continue to endeavor to fix bugs and answer queries when we can, but are not in a position to guarantee it. We will consider maintenance contracts and consultancy if desired; please contact us for details.

Also note that not all the voices and lexicons we distribute with festival are free. Particularly the British English lexicon derived from Oxford Advanced Learners' Dictionary is free only for non-commercial use (we will release an alternative soon). Also the Spanish diphone voice we release is only free for non-commercial use.

If you are using Festival or the speech tools in a commercial environment, even though no licence is required, we would be grateful if you let us know as it helps justify ourselves to our various sponsors.

The current copyright on the core system is
@example
The Festival Speech Synthesis System: version 1.4.3
Centre for Speech Technology Research
University of Edinburgh, UK
Copyright (c) 1996-2004
All Rights Reserved.

Permission is hereby granted, free of charge, to use and distribute this software and its documentation without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of this work, and to permit persons to whom this work is furnished to do so, subject to the following conditions: 1. The code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Any modifications must be clearly marked as such. 3. Original authors' names are not deleted. 4. The authors' names are not used to endorse or promote products derived from this software without specific prior written permission. THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. @end example @node Acknowledgements, What is new, Copying, Top @chapter Acknowledgements @cindex acknowledgements @cindex thanks The code in this system was primarily written by Alan W Black, Paul Taylor and Richard Caley. Festival sits on top of the Edinburgh Speech Tools Library, and uses much of its functionality. Amy Isard wrote a synthesizer for her MSc project in 1995, which first used the Edinburgh Speech Tools Library. Although Festival doesn't contain any code from that system, her system was used as a basic model. 
Much of the design and philosophy of Festival has been built on the experience both Paul and Alan gained from the development of various previous synthesizers and software systems, especially CSTR's Osprey and Polyglot systems @cite{taylor91} and ATR's CHATR system @cite{black94}. However, it should be stated that Festival is fully developed at CSTR and contains neither proprietary code nor ideas.

Festival contains a number of subsystems integrated from other sources and we acknowledge those systems here.

@section SIOD

@cindex SIOD
@cindex Scheme
@cindex Paradigm Associates
The Scheme interpreter (SIOD -- Scheme In One Defun 3.0) was written by George Carrett (gjc@@mitech.com, gjc@@paradigm.com) and offers a basic small Scheme (Lisp) interpreter suitable for embedding in applications such as Festival as a scripting language. A number of changes and improvements have been added in our development but it still remains that basic system. We are grateful to George and Paradigm Associates Incorporated for providing such a useful and well-written sub-system.
@example
Scheme In One Defun (SIOD)
COPYRIGHT (c) 1988-1994 BY
PARADIGM ASSOCIATES INCORPORATED, CAMBRIDGE, MASSACHUSETTS.
ALL RIGHTS RESERVED

Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all copies
and that both that copyright notice and this permission notice appear
in supporting documentation, and that the name of Paradigm Associates
Inc not be used in advertising or publicity pertaining to distribution
of the software without specific, written prior permission.

PARADIGM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
EVENT SHALL PARADIGM BE LIABLE FOR ANY SPECIAL, INDIRECT OR
CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
@end example

@section editline

Because of conflicts with the copyright for GNU readline, for which an optional interface was included in earlier versions, we have replaced the interface with a complete command line editing system based on @file{editline}. @file{Editline} was posted to the USENET newsgroup @file{comp.sources.misc} in 1992. A number of modifications have been made to make it more useful to us but the original code (contained within the standard speech tools distribution) and our modifications fall under the following licence.
@example
Copyright 1992 Simmule Turner and Rich Salz.  All rights reserved.

This software is not subject to any license of the American Telephone
and Telegraph Company or of the Regents of the University of California.

Permission is granted to anyone to use this software for any purpose on
any computer system, and to alter it and redistribute it freely, subject
to the following restrictions:
1. The authors are not responsible for the consequences of use of this
   software, no matter how awful, even if they arise from flaws in it.
2. The origin of this software must not be misrepresented, either by
   explicit claim or by omission.  Since few users ever read sources,
   credits must appear in the documentation.
3. Altered versions must be plainly marked as such, and must not be
   misrepresented as being the original software.  Since few users
   ever read sources, credits must appear in the documentation.
4. This notice may not be removed or altered.
@end example @section Edinburgh Speech Tools Library @cindex Edinburgh Speech Tools Library The Edinburgh Speech Tools lies at the core of Festival. Although developed separately, much of the development of certain parts of the Edinburgh Speech Tools has been directed by Festival's needs. In turn those who have contributed to the Speech Tools make Festival a more usable system. @xref{Acknowledgements, , Acknowledgements, speechtools, Edinburgh Speech Tools Library Manual}. Online information about the Edinburgh Speech Tools library is available through @example @url{http://www.cstr.ed.ac.uk/projects/speech_tools.html} @end example @section Others Many others have provided actual code and support for Festival, for which we are grateful. Specifically, @itemize @bullet @item Alistair Conkie: various low level code points and some design work, Spanish synthesis, the old diphone synthesis code. @item Steve Isard: directorship and LPC diphone code, design of diphone schema. @item EPSRC: who fund Alan Black and Paul Taylor. @item Sun Microsystems Laboratories: for supporting the project and funding Richard. @item AT&T Labs - Research: for supporting the project. @item Paradigm Associates and George Carrett: for Scheme in one defun. @item Mike Macon: Improving the quality of the diphone synthesizer and LPC analysis. @item Kurt Dusterhoff: Tilt intonation training and modelling. @item Amy Isard: for her SSML project and related synthesizer. @item Richard Tobin: for answering all those difficult questions, the socket code, and the XML parser. @item Simmule Turner and Rich Salz: command line editor (editline) @item Borja Etxebarria: Help with the Spanish synthesis @item Briony Williams: Welsh synthesis @item Jacques H. de Villiers: @file{jacques@@cse.ogi.edu} from CSLU at OGI, for the TCL interface, and other usability issues @item Kevin Lenzo: @file{lenzo@@cs.cmu.edu} from CMU for the PERL interface. @item Rob Clarke: for support under Linux. 
@item Samuel Audet @file{guardia@@cam.org}: OS/2 support
@item Mari Ostendorf: For providing access to the BU FM Radio corpus from which some modules were trained.
@item Melvin Hunt: on whose work we based our residual LPC synthesis model
@item Oxford Text Archive: For the computer users version of Oxford Advanced Learners' Dictionary (redistributed with permission).
@item Reading University: for access to MARSEC from which the phrase break model was trained.
@item LDC & Penn Tree Bank: from which the POS tagger was trained, redistribution of the models is with permission from the LDC.
@item Roger Burroughes and Kurt Dusterhoff: For letting us capture their voices.
@item ATR and Nick Campbell: for first getting Paul and Alan to work together and for the experience we gained.
@item FSF: for G++, make, ....
@item Center for Spoken Language Understanding: CSLU at OGI, particularly Ron Cole and Mike Macon, have acted as significant users for the system giving significant feedback and allowing us to teach courses on Festival offering valuable real-use feedback.
@item Our beta testers: Thanks to all the people who put up with previous versions of the system and reported bugs, both big and small. These comments are very important to the constant improvements in the system. And thanks for your quick responses when I had specific requests.
@item And our users ...
Many people have downloaded earlier versions of the system. Many have found problems with installation and use and have reported them to us. Many of you have put up with multiple compilations trying to fix bugs remotely. We thank you for putting up with us and are pleased you've taken the time to help us improve our system. Many of you have come up with uses we hadn't thought of, which is always rewarding.

Even if you haven't actively responded, the fact that you use the system at all makes it worthwhile.
@end itemize

@node What is new, Overview, Acknowledgements, Top
@chapter What is new

Compared to the previous major release (1.3.0 release Aug 1998) 1.4.0 is not functionally so different from its previous versions. This release is primarily a consolidation release fixing and tidying up some of the lower level aspects of the system to allow better modularity for some of our future planned modules.

@itemize @bullet
@item Copyright change:
The system is now free and has no commercial restriction. Note that currently only the US voices (ked and kal) are also now unrestricted. The UK English voices depend on the Oxford Advanced Learners' Dictionary of Current English which cannot be used for commercial use without permission from Oxford University Press.
@item Architecture tidy up:
the interfaces to lower level parts of the system have been tidied up deleting some of the older code that was supported for compatibility reasons. There is a much higher dependence on features and easier (and safer) ways to register new objects as feature values and Scheme objects. Scheme has been tidied up. It is no longer "in one defun" but "in one directory".
@item New documentation system for speech tools:
A new docbook based documentation system has been added to the speech tools. Festival's documentation will move over to this sometime soon too.
@item initial JSAPI support:
both JSAPI and JSML (somewhat similar to Sable) now have initial implementations. They of course depend on Java support which so far we have only (successfully) investigated under Solaris and Linux.
@item Generalization of statistical models:
CART, ngrams, and WFSTs are now fully supported from Lisp and can be used with a generalized viterbi function. This makes adding quite complex statistical models easy without adding new C++.
@item Tilt Intonation modelling:
Full support is now included for the Tilt intonation models, both training and use.
@item Documentation on Building New Voices in Festival:
documentation, scripts etc. for building new voices and languages in the system, see
@example
@url{http://www.cstr.ed.ac.uk/projects/festival/docs/festvox/}
@end example
@end itemize

@node Overview, Installation, What is new, Top
@chapter Overview

Festival is designed as a speech synthesis system for at least three levels of user. First, those who simply want high quality speech from arbitrary text with the minimum of effort. Second, those who are developing language systems and wish to include synthesis output. In this case, a certain amount of customization is desired, such as different voices, specific phrasing, dialog types etc. The third level is in developing and testing new synthesis methods.

This manual is not designed as a tutorial on converting text to speech but for documenting the processes and use of our system. We do not discuss the detailed algorithms involved in converting text to speech or the relative merits of multiple methods, though we will often give references to relevant papers when describing the use of each module.

For more general information about text to speech we recommend Dutoit's @file{An introduction to Text-to-Speech Synthesis} @cite{dutoit97}. For more detailed research issues in TTS see @cite{sproat98} or @cite{vansanten96}.

@menu
* Philosophy::                  Why we did it like it is
* Future::                      How much better it's going to get
@end menu

@node Philosophy, Future, , Overview
@section Philosophy

One of the biggest problems in the development of speech synthesis, and other areas of speech and language processing systems, is that there are a lot of simple well-known techniques lying around which can help you realise your goal. But in order to improve some part of the whole system it is necessary to have a whole system in which you can test and improve your part. Festival is intended as that whole system in which you may simply work on your small part to improve the whole.
Without a system like Festival, before you could even start to test your new module you would need to spend significant effort to build a whole system, or adapt an existing one, before you could start working on your improvements.

Festival is specifically designed to allow the addition of new modules, easily and efficiently, so that development need not get bogged down in re-implementing the wheel.

But there is another aspect of Festival which makes it more useful than simply an environment for researching into new synthesis techniques. It is a fully usable text-to-speech system suitable for embedding in other projects that require speech output. The provision of a fully working easy-to-use speech synthesizer in addition to just a testing environment is good for two specific reasons. First, it offers a conduit for our research, in that our experiments can quickly and directly benefit users of our synthesis system. And secondly, in ensuring we have a fully working usable system we can immediately see what problems exist and where our research should be directed rather than where our whims take us.

These concepts are not unique to Festival. ATR's CHATR system (@cite{black94}) follows very much the same philosophy and Festival benefits from the experiences gained in the development of that system. Festival benefits from various pieces of previous work. As well as CHATR, CSTR's previous synthesizers, Osprey and the Polyglot projects influenced many design decisions. Also we are influenced by more general programs in considering software engineering issues, especially GNU Octave and Emacs on which the basic script model was based.

Unlike in some other speech and language systems, software engineering is considered very important to the development of Festival. Too often research systems consist of random collections of hacky little scripts and code.
No one person can confidently describe the algorithms it performs, as parameters are scattered throughout the system, with tricks and hacks making it impossible to really evaluate why the system is good (or bad). Such systems do not help the advancement of speech technology, except perhaps in pointing at ideas that should be further investigated. If the algorithms and techniques cannot be described externally from the program @emph{such that} they can be reimplemented by others, what is the point of doing the work?

Festival offers a common framework where multiple techniques may be implemented (by the same or different researchers) so that they may be tested more fairly in the same environment.

As a final word, we'd like to make two short statements which both achieve the same end but unfortunately perhaps not for the same reasons:
@quotation
Good software engineering makes good research easier
@end quotation
But the following seems to be true also
@quotation
If you spend enough effort on something it can be shown to be better than its competitors.
@end quotation

@node Future, , Philosophy, Overview
@section Future

Festival is still very much in development. Hopefully this state will continue for a long time. It is never possible to complete software; there are always new things that can make it better. However as time goes on Festival's core architecture will stabilise and few or no changes will be made. Other aspects of the system will gain greater attention such as waveform synthesis modules, intonation techniques, text type dependent analysers etc.

Festival will improve, so don't expect it to be the same six months from now.

A number of new modules and enhancements are already under consideration at various stages of implementation. The following is a non-exhaustive list of what we may (or may not) add to Festival over the next six months or so.
@itemize @bullet
@item Selection-based synthesis:
Moving away from diphone technology to more generalized selection of units from speech databases.
@item New structure for linguistic content of utterances:
Using techniques from Metrical Phonology we are building more structured representations of utterances reflecting their linguistic significance better. This will allow improvements in prosody and unit selection.
@item Non-prosodic prosodic control:
For language generation systems and custom tasks where the speech to be synthesized is being generated by some program, more information about text structure will probably exist, such as phrasing, contrast, key items etc. We are investigating the relationship of high-level tags to prosodic information through the Sole project @url{http://www.cstr.ed.ac.uk/projects/sole.html}
@item Dialect independent lexicons:
Currently each new dialect needs a new lexicon; we are investigating a form of lexical specification that is dialect independent and allows the core form to be mapped to different dialects. This will make the generation of voices in different dialects much easier.
@end itemize

@node Installation, Quick start, Overview, Top
@chapter Installation

This section describes how to install Festival from source in a new location and customize that installation.

@menu
* Requirements::                Software/Hardware requirements for Festival
* Configuration::               Setting up compilation
* Site initialization::         Settings for your particular site
* Checking an installation::    But does it work ...
@end menu

@node Requirements, Configuration, , Installation
@section Requirements

@cindex requirements
In order to compile Festival you first need the following source packages

@table @code
@item festival-1.4.3-release.tar.gz
Festival Speech Synthesis System source
@item speech_tools-1.2.3-release.tar.gz
The Edinburgh Speech Tools Library
@item festlex_NAME.tar.gz
@cindex lexicon
The lexicon distribution, where possible, includes the lexicon input file as well as the compiled form, for your convenience. The lexicons have varying distribution policies, but are all free except OALD, which is only free for non-commercial use (we are working on a free replacement). In some cases only a pointer to an ftp'able file plus a program to convert that file to the Festival format is included.
@item festvox_NAME.tar.gz
You'll need a speech database. A number are available (with varying distribution policies). Each voice may have other dependencies such as requiring particular lexicons.
@item festdoc_1.4.3.tar.gz
Full postscript, info and html documentation for Festival and the Speech Tools. The source of the documentation is available in the standard distributions but for your convenience it has been pre-generated.
@end table

In addition to Festival specific sources you will also need

@table @emph
@item A UNIX machine
Currently we have compiled and tested the system under Solaris (2.5(.1), 2.6, 2.7 and 2.8), SunOS (4.1.3), FreeBSD (3.x, 4.x), Linux (Redhat 4.1, 5.0, 5.1, 5.2, 6.[012], 7.[01], 8.0 and other Linux distributions), and it should work under OSF (Dec Alphas), SGI (Irix), HPs (HPUX). But any standard UNIX machine should be acceptable. We have now successfully ported this version to Windows NT and Windows 95 (using the Cygnus GNU win32 environment). This is still a young port but seems to work.
@item A C++ compiler
@cindex GNU g++
@cindex g++
@cindex C++
Note that C++ is not very portable even between different versions of the compiler from the same vendor.
Although we've tried very hard to make the system portable, we know it is very unlikely to compile without change except with compilers that have already been tested. The currently tested systems are
@itemize @bullet
@item Sun Sparc Solaris 2.5, 2.5.1, 2.6, 2.7, 2.9:
GCC 2.95.1, GCC 3.2
@item FreeBSD for Intel 3.x and 4.x:
GCC 2.95.1, GCC 3.0
@item Linux for Intel (RedHat 4.1/5.0/5.1/5.2/6.0/7.x/8.0):
GCC 2.7.2, GCC 2.7.2/egcs-1.0.2, egcs 1.1.1, egcs-1.1.2, GCC 2.95.[123], GCC "2.96", GCC 3.0, GCC 3.0.1, GCC 3.2, GCC 3.2.1
@item Windows NT 4.0:
GCC 2.7.2 plus egcs (from Cygnus GNU win32 b19), Visual C++ PRO v5.0, Visual C++ v6.0
@end itemize
Note that if GCC works on one version of Unix it usually works on others.

@cindex Windows NT/95
We have compiled both the speech tools and Festival under Windows NT 4.0 and Windows 95 using the GNU tools available from Cygnus.
@example
@url{ftp://ftp.cygnus.com/pub/gnu-win32/}.
@end example
@item GNU make
Due to there being too many different @code{make} programs out there we have tested the system using GNU make on all systems we use. Others may work but we know GNU make does.
@item Audio hardware
@cindex audio hardware
You can use Festival without audio output hardware but it doesn't sound very good (though admittedly you can hear fewer problems with it). A number of audio systems are supported (directly inherited from the audio support in the Edinburgh Speech Tools Library): NCD's NAS (formerly called netaudio) a network transparent audio system (which can be found at @url{ftp://ftp.x.org/contrib/audio/nas/}); @file{/dev/audio} (at 8k ulaw and 8/16bit linear), found on Suns, Linux machines and FreeBSD; and a method allowing arbitrary UNIX commands. @xref{Audio output}.
@end table

@cindex readline
@cindex editline
@cindex GNU readline
Earlier versions of Festival mistakenly offered a command line editor interface to the GNU package readline, but due to conflicts between the GNU Public Licence and Festival's licence this interface was removed in version 1.3.1. Even Festival's new free licence would cause problems as readline support would restrict Festival linking with non-free code. A new command line interface based on editline was provided that offers similar functionality. Editline remains a compilation option as it is probably not yet as portable as we would like it to be.

@cindex @file{texi2html}
In addition to the above, in order to process the documentation you will need @file{TeX}, @file{dvips} (or similar), GNU's @file{makeinfo} (part of the texinfo package) and @file{texi2html} which is available from @url{http://wwwcn.cern.ch/dci/texi2html/}.

@cindex documentation
However the document files are also available pre-processed into postscript, DVI, info and html as part of the distribution in @file{festdoc-1.4.X.tar.gz}.

Ensure you have a fully installed and working version of your C++ compiler. Most of the problems people have had in installing Festival have been due to incomplete or bad compiler installation. It might be worth checking if the following program works if you don't know if anyone has used your C++ installation before.
@example
/* Checks that the compiler, its headers and its runtime library work */
#include <iostream>

int main (int argc, char **argv)
@{
   std::cout << "Hello world\n";
   return 0;
@}
@end example

Unpack all the source files in a new directory. The directory will then contain two subdirectories
@example
speech_tools/
festival/
@end example

@node Configuration, Site initialization, Requirements, Installation
@section Configuration

First ensure you have a compiled version of the Edinburgh Speech Tools Library. See @file{speech_tools/INSTALL} for instructions.

@cindex configuration
The system now supports the standard GNU @file{configure} method for set up.
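
Before building, it can be worth confirming that the two source trees really do sit side by side as described above. The following is only a minimal sketch, assuming a POSIX shell; the script and its variable names (@code{top}, @code{missing}) are illustrative and are not part of the Festival distribution.

```shell
# Hypothetical pre-build check: confirm that speech_tools/ and festival/
# sit side by side under one directory. The names "top" and "missing"
# are illustrative only, not part of the Festival distribution.
top="$1"
[ -n "$top" ] || top=.   # default to the current directory

missing=""
for d in speech_tools festival; do
  # Record every expected subdirectory that is absent.
  [ -d "$top/$d" ] || missing="$missing $d"
done

if [ -z "$missing" ]; then
  echo "source layout looks complete in $top"
else
  echo "not found in $top:$missing"
fi
```

Run it from the directory the sources were unpacked into, or pass that directory as the first argument. A check like this only catches a misplaced directory; the build itself is still driven by the @file{configure} set-up.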
In most cases this will automatically configure festival for your particular system. In most cases you need only type @example gmake @end example and the system will configure itself and compile, (note you need to have compiled the Edinburgh Speech Tools @file{speech_tools-1.2.2} first. @cindex @file{config/config} In some case hand configuration is required. All of the configuration choices are kept in the file @file{config/config}. @cindex OTHER_DIRS For the most part Festival configuration inherits the configuration from your speech tools config file (@file{../speech_tools/config/config}). Additional optional modules may be added by adding them to the end of your config file e.g. @example ALSO_INCLUDE += clunits @end example Adding and new module here will treat is as a new directory in the @file{src/modules/} and compile it into the system in the same way the @code{OTHER_DIRS} feature was used in previous versions. @cindex NFS @cindex automounter If the compilation directory being accessed by NFS or if you use an automounter (e.g. amd) it is recommend to explicitly set the variable @code{FESTIVAL_HOME} in @file{config/config}. The command @code{pwd} is not reliable when a directory may have multiple names. There is a simple test suite with Festival but it requires the three basic voices and their respective lexicons installed before it will work. Thus you need to install @example festlex_CMU.tar.gz festlex_OALD.tar.gz festlex_POSLEX.tar.gz festvox_don.tar.gz festvox_kedlpc16k.tar.gz festvox_rablpc16k.tar.gz @end example If these are installed you can test the installation with @example gmake test @end example To simply make it run with a male US English voice it is sufficient to install just @example festlex_CMU.tar.gz festlex_POSLEX.tar.gz festvox_kallpc16k.tar.gz @end example Note that the single most common reason for problems in compilation and linking found amongst the beta testers was a bad installation of GNU C++. 
If you get many strange errors in G++ library header files or link
errors it is worth checking that your system has the compiler, header
files and runtime libraries properly installed. This may be checked by
compiling a simple program under C++ and also finding out if anyone at
your site has ever used the installation. Most of these installation
problems are caused by upgrading to a newer version of libg++ without
removing the older version, so that a mix of versions of the @file{.h}
files exists.

Although we have tried very hard to ensure that Festival compiles with
no warnings this is not possible under some systems.

@cindex SunOS
Under SunOS the system include files do not declare a number of system
provided functions. This is a bug in Sun's include files. This will
cause warnings like "implicit definition of fprintf". These are
harmless.

@cindex Linux
Under Linux a warning at link time about reducing the size of some
symbols is often produced. This is harmless. There are also occasional
warnings about some socket system function having an incorrect argument
type; these are also harmless.

@cindex Visual C++
The speech tools and festival compile under Windows95 or Windows NT with
Visual C++ v5.0 using the Microsoft @file{nmake} make program. We've
only done this with the Professional edition, but have no reason to
believe that it relies on anything not in the standard edition.

In accordance with VC++ conventions, object files are created with
extension .obj, executables with extension .exe and libraries with
extension .lib. This may mean that both unix and Win32 versions can be
built in the same directory tree, but I wouldn't rely on it.

To do this you require nmake Makefiles for the system. These can be
generated from the gnumake Makefiles, using the command
@example
gnumake VCMakefile
@end example
in the speech_tools and festival directories. I have only done this
under unix, it's possible it would work under the cygnus gnuwin32
system.

If @file{make.depend} files exist (i.e. if you have done
@file{gnumake depend} in unix) equivalent @file{vc_make.depend} files
will be created, if not the VCMakefiles will not contain dependency
information for the @file{.cc} files. The result will be that you can
compile the system once, but changes will not cause the correct things
to be rebuilt.

In order to compile from the DOS command line using Visual C++ you need
to have a collection of environment variables set. In Windows NT there
is an installation option for Visual C++ which sets these globally.
Under Windows95 or if you don't ask for them to be set globally under NT
you need to run
@example
vcvars32.bat
@end example
See the VC++ documentation for more details.

Once you have the source trees with VCMakefiles somewhere visible from
Windows, you need to copy @file{speech_tools\config\vc_config-dist} to
@file{speech_tools\config\vc_config} and edit it to suit your local
situation. Then do the same with @file{festival\config\vc_config-dist}.

The thing most likely to need changing is the definition of
@code{FESTIVAL_HOME} in @file{festival\config\vc_config_make_rules}
which needs to point to where you have put festival.

Now you can compile. cd to the speech_tools directory and do
@example
nmake /nologo /fVCMakefile
@end example
@exdent and the library, the programs in main and the test programs should be compiled.

The tests can't be run automatically under Windows. A simple test to
check that things are probably OK is:
@example
main\na_play testsuite\data\ch_wave.wav
@end example
@exdent which reads and plays a waveform.

Next go into the festival directory and do
@example
nmake /nologo /fVCMakefile
@end example
@exdent to build festival. When it's finished, and assuming you have the
voices and lexicons unpacked in the right place, festival should run
just as under unix.

We should remind you that the NT/95 ports are still young and there may
be problems that we've not found yet.
We only recommend the use of the speech tools and Festival under Windows
if you have significant experience in C++ under those platforms.

@cindex smaller system
@cindex minimal system
Most of the modules in @file{src/modules} are actually optional and the
system could be compiled without them. The basic set could be reduced
further if certain facilities are not desired. Particularly:
@file{donovan} which is only required if the donovan voice is used;
@file{rxp} if no XML parsing is required (e.g. Sable); and @file{parser}
if no stochastic parsing is required (this parser isn't used for any of
our currently released voices). Actually even @file{UniSyn} and
@file{UniSyn_diphone} could be removed if some external waveform
synthesizer is being used (e.g. MBROLA) or some alternative one like
@file{OGIresLPC}. Removing unused modules will make the festival binary
smaller and (potentially) start up faster but don't expect too much. You
can delete these by changing the @code{BASE_DIRS} variable in
@file{src/modules/Makefile}.

@node Site initialization, Checking an installation, Configuration, Installation
@section Site initialization

@cindex run-time configuration
@cindex initialization
@cindex installation initialization
@cindex @file{init.scm}
@cindex @file{siteinit.scm}
Once compiled Festival may be further customized for particular sites.
At start up time Festival loads the file @file{init.scm} from its
library directory. This file further loads other necessary files such as
phoneset descriptions, duration parameters, intonation parameters,
definitions of voices etc. It will also load the files
@file{sitevars.scm} and @file{siteinit.scm} if they exist.
@file{sitevars.scm} is loaded after the basic Scheme library functions
are loaded but before any of the festival related functions are loaded.
This file is intended to set various path names before various
subsystems are loaded.
Typically variables such as @code{lexdir} (the directory where the
lexicons are held), and @code{voices_dir} (pointing to voice
directories) should be reset here if necessary.

@cindex change libdir at run-time
@cindex run-time configuration
@cindex @code{load-path}
The default installation will try to find its lexicons and voices
automatically based on the value of @code{load-path} (this is derived
from @code{FESTIVAL_HOME} at compilation time or by using the
@code{--libdir} option at run-time). If the voices and lexicons have
been unpacked into subdirectories of the library directory (the default)
then no site specific initialization of the above pathnames will be
necessary.

The second site specific file is @file{siteinit.scm}. Typical examples
of local initialization are as follows. The default audio output method
is NCD's NAS system if that is supported as that's what we use normally
in CSTR. If it is not supported, any hardware specific mode is the
default (e.g. sun16audio, freebsd16audio, linux16audio or mplayeraudio).
But that default is just a setting in @file{init.scm}. If, for example,
in your environment you wish the default audio output method to be 8k
mulaw through @file{/dev/audio} you should add the following line to
your @file{siteinit.scm} file
@lisp
(Parameter.set 'Audio_Method 'sunaudio)
@end lisp
Note the use of @code{Parameter.set} rather than @code{Parameter.def};
the second function will not reset the value if it is already set.
Remember that you may use the audio methods @code{sun16audio},
@code{linux16audio} or @code{freebsd16audio} only if @code{NATIVE_AUDIO}
was selected in @file{speech_tools/config/config} and you are on such
machines. The Festival variable @code{*modules*} contains a list of all
supported functions/modules in a particular installation including audio
support. Check the value of that variable if things aren't what you
expect.
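
One way to act on the contents of @code{*modules*} is sketched below.
The first form can be typed at the @code{festival>} prompt; the second
could go in @file{siteinit.scm}. The sketch assumes that compiled-in
audio support appears in the list as a symbol with the same name as the
audio method (here @code{linux16audio}), so check the printed value on
your own installation before relying on it.
@lisp
;; Show what this installation was built with
(print *modules*)

;; Assumption: native Linux audio is listed as the symbol linux16audio
(if (member 'linux16audio *modules*)
    (Parameter.set 'Audio_Method 'linux16audio)
    (format t "linux16audio is not listed in *modules*\n"))
@end lisp
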
If you are installing on a machine whose audio is not directly supported
by the speech tools library, an external command may be executed to play
a waveform. The following example is for an imaginary machine that can
play audio files through a program called @file{adplay} with arguments
for sample rate and file type. When playing waveforms, Festival, by
default, outputs an unheadered waveform in native byte order. In this
example you would set up the default audio playing mechanism in
@file{siteinit.scm} as follows
@lisp
(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Command "adplay -raw -r $SR $FILE")
@end lisp

@cindex output sample rate
@cindex output file type
@cindex audio command output
@cindex audio output rate
@cindex audio output filetype
For the @code{Audio_Command} method of playing waveforms Festival
supports two additional audio parameters. @code{Audio_Required_Rate}
allows you to use Festival's internal sample rate conversion function to
convert to any desired rate. Note this may not be as good as playing the
waveform at the sample rate it is originally created in, but as some
hardware devices are restrictive in what sample rates they support, or
have naive resample functions, this could be optimal. The second
additional audio parameter is @code{Audio_Required_Format} which can be
used to specify the desired output format of the file. The default is
unheadered raw, but this may be any of the values supported by the
speech tools (including nist, esps, snd, riff, aiff, audlab, raw and, if
you really want it, ascii).

For example suppose you run Festival on a remote machine and are not
running any network audio system and want Festival to copy files back to
your local machine and simply cat them to @file{/dev/audio}. The
following would do that (assuming permissions for rsh are allowed).
@lisp
(Parameter.set 'Audio_Method 'Audio_Command)
;; Make output file ulaw 8k (format ulaw implies 8k)
(Parameter.set 'Audio_Required_Format 'ulaw)
(Parameter.set 'Audio_Command
 "userhost=`echo $DISPLAY | sed 's/:.*$//'`; rcp $FILE $userhost:$FILE; \
  rsh $userhost \"cat $FILE >/dev/audio\" ; rsh $userhost \"rm $FILE\"")
@end lisp
Note there are limits on how complex a command you want to put in the
@code{Audio_Command} string directly. It can get very confusing with
respect to quoting. It is therefore recommended that once you get past a
certain complexity you consider writing a simple shell script and
calling it from the @code{Audio_Command} string.

@cindex default voice
A second typical customization is setting the default speaker. Speakers
depend on many things but due to various licence (and resource)
restrictions you may only have some diphone/nphone databases available
in your installation. The function name that is the value of
@code{voice_default} is called immediately after @file{siteinit.scm} is
loaded offering the opportunity for you to change it. In the standard
distribution no change should be required. If you download all the
distributed voices @code{voice_rab_diphone} is the default voice. You
may change this for a site by adding the following to
@file{siteinit.scm} or per person by changing your @file{.festivalrc}.
For example if you wish to change the default voice to the American one
@code{voice_ked_diphone}
@lisp
(set! voice_default 'voice_ked_diphone)
@end lisp
Note the single quote, and note that unlike in early versions
@code{voice_default} is not a function you can call directly.

@cindex @file{.festivalrc}
@cindex user initialization
A second level of customization is on a per user basis. After loading
@file{init.scm}, which includes @file{sitevars.scm} and
@file{siteinit.scm} for local installation, Festival loads the file
@file{.festivalrc} from the user's home directory (if it exists). This
file may contain arbitrary Festival commands.
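
As an illustration, a personal @file{.festivalrc} might collect a few of
the settings discussed in this manual. The particular choices below are
examples only; use the voice and audio method that your installation
actually provides.
@lisp
;; ~/.festivalrc: example only
;; Use the American voice by default (note the quote)
(set! voice_default 'voice_ked_diphone)
;; Play through the native Linux audio device
(Parameter.set 'Audio_Method 'linux16audio)
;; Show a backtrace whenever a Scheme error occurs
(set_backtrace t)
@end lisp
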
@node Checking an installation, , Site initialization, Installation
@section Checking an installation

Once compiled and site initialization is set up you should test to see
if Festival can speak or not.

Start the system
@example
$ bin/festival
Festival Speech Synthesis System 1.4.3:release Jan 2003
Copyright (C) University of Edinburgh, 1996-2003. All rights reserved.
For details type `(festival_warranty)'
festival> ^D
@end example
If errors occur at this stage they are most likely to do with pathname
problems. If any error messages are printed about non-existent files
check that those pathnames point to where you intended them to be. Most
of the (default) pathnames are dependent on the basic library path.
Ensure that is correct. To find out what it has been set to, start the
system without loading the init files.
@example
$ bin/festival -q
Festival Speech Synthesis System 1.4.3:release Jan 2003
Copyright (C) University of Edinburgh, 1996-2003. All rights reserved.
For details type `(festival_warranty)'
festival> libdir
"/projects/festival/lib/"
festival> ^D
@end example
This should show the pathname you set in your @file{config/config}.

If the system starts with no errors try to synthesize something
@example
festival> (SayText "hello world")
@end example
Some files are only accessed at synthesis time so this may show up other
problem pathnames. If it talks, you're in business, if it doesn't, here
are some possible problems.

@cindex audio problems
If you get the error message
@example
Can't access NAS server
@end example
You have selected NAS as the audio output but have no server running on
that machine or your @code{DISPLAY} or @code{AUDIOSERVER} environment
variable is not set properly for your output device. Either set these
properly or change the audio output device in @file{lib/siteinit.scm} as
described above.

Ensure your audio device actually works the way you think it does.
On Suns, the audio output device can be switched into a number of
different output modes, speaker, jack, headphones. If this is set to the
wrong one you may not hear the output. Use one of Sun's tools to change
this (try @file{/usr/demo/SOUND/bin/soundtool}). Try to find an audio
file independent of Festival and get it to play on your audio device.
Once you have done that ensure that the audio output method set in
Festival matches that.

Once you have got it talking, test the audio spooling device.
@example
festival> (intro)
@end example
This plays a short introduction of two sentences, spooling the audio
output.

Finally exit from Festival (by end of file or @code{(quit)}) and test
the script mode with
@example
$ examples/saytime
@end example

A test suite is included with Festival but it makes certain assumptions
about which voices are installed. It assumes that
@code{voice_rab_diphone} (@file{festvox_rabxxxx.tar.gz}) is the default
voice and that @code{voice_ked_diphone} and @code{voice_don_diphone}
(@file{festvox_kedxxxx.tar.gz} and @file{festvox_don.tar.gz}) are
installed. Also local settings in your
@file{festival/lib/siteinit.scm} may affect these tests. However, after
installation it may be worth trying
@example
gnumake test
@end example
from the @file{festival/} directory. This will do various tests
including basic utterance tests and tokenization tests. It also checks
that voices are installed and that they don't interfere with each other.
These tests are primarily regression tests for the developers of
Festival, to ensure new enhancements don't mess up existing supported
features. They are not designed to test that an installation is
successful, though if they run correctly it is most probable the
installation has worked.

@node Quick start, Scheme, Installation, Top
@chapter Quick start

This section is for those who just want to know the absolute basics to
run the system.
@cindex command mode
@cindex text-to-speech mode
@cindex tts mode
Festival works in two fundamental modes, @emph{command mode} and
@emph{text-to-speech mode} (tts-mode). In command mode, information (in
files or through standard input) is treated as commands and is
interpreted by a Scheme interpreter. In tts-mode, information (in files
or through standard input) is treated as text to be rendered as speech.
The default mode is command mode, though this may change in later
versions.

@menu
* Basic command line options::
* Simple command driven session::
* Getting some help::
@end menu

@node Basic command line options, Simple command driven session, , Quick start
@section Basic command line options

@cindex command line options
Festival's basic calling method is as
@lisp
festival [options] file1 file2 ...
@end lisp

Options may be any of the following
@table @code
@item -q
start Festival without loading @file{init.scm} or user's
@file{.festivalrc}
@item -b
@itemx --batch
@cindex batch mode
After processing any file arguments do not become interactive
@item -i
@itemx --interactive
@cindex interactive mode
After processing file arguments become interactive. This option
overrides any batch argument.
@item --tts
@cindex tts mode
Treat file arguments in text-to-speech mode, causing them to be rendered
as speech rather than interpreted as commands. When selected in
interactive mode the command line edit functions are not available
@item --command
@cindex command mode
Treat file arguments in command mode. This is the default.
@item --language LANG
@cindex language specification
Set the default language to @var{LANG}. Currently @var{LANG} may be one
of @code{english}, @code{spanish} or @code{welsh} (depending on what
voices are actually available in your installation).
@item --server
After loading any specified files go into server mode. This is a mode
where Festival waits for clients on a known port (the value of
@code{server_port}, default is 1314).
Connected clients may send commands (or text) to the server and expect
waveforms back. @xref{Server/client API}. Note server mode may be unsafe
and allow unauthorised access to your machine; be sure to read the
security recommendations in @ref{Server/client API}.
@item --script scriptfile
@cindex script files
@cindex Festival script files
Run scriptfile as a Festival script file. This is similar to
@code{--batch} but it encapsulates the command line arguments into the
Scheme variables @code{argv} and @code{argc}, so that Festival scripts
may process their command line arguments just like any other program. It
also does not load the basic initialisation files as sometimes you may
not want to do this. If you wish them, you should copy the loading
sequence from an example Festival script like
@file{festival/examples/saytext}.
@item --heap NUMBER
@cindex heap size
@cindex Scheme heap size
The Scheme heap (basic number of Lisp cells) is of a fixed size and
cannot be dynamically increased at run time (this would complicate
garbage collection). The default size is 210000 which seems to be more
than adequate for most work. In some of our training experiments where
very large list structures are required it is necessary to increase
this. Note there is a trade off between size of the heap and time it
takes to garbage collect so making this unnecessarily big is not a good
idea. If you don't understand the above explanation you almost certainly
don't need to use the option.
@end table
In command mode, if the file name starts with a left parenthesis, the
name itself is read and evaluated as a Lisp command. This is often
convenient when running in batch mode and a simple command is necessary
to start the whole thing off after loading in some other specific files.

@node Simple command driven session, Getting some help, Basic command line options, Quick start
@section Sample command driven session

Here is a short session using Festival's command interpreter.
Start Festival with no arguments
@lisp
$ festival
Festival Speech Synthesis System 1.4.3:release Dec 2002
Copyright (C) University of Edinburgh, 1996-2002. All rights reserved.
For details type `(festival_warranty)'
festival>
@end lisp

Festival uses a command line editor based on editline for terminal input
so command line editing may be done with Emacs commands. Festival also
supports history as well as function, variable name, and file name
completion via the @key{TAB} key.

Typing @code{help} will give you more information, that is @code{help}
without any parentheses. (It is actually a variable name whose value is
a string containing help.)

@cindex Scheme
@cindex read-eval-print loop
Festival offers what is called a read-eval-print loop, because it reads
an s-expression (atom or list), evaluates it and prints the result. As
Festival includes the SIOD Scheme interpreter most standard Scheme
commands work
@lisp
festival> (car '(a d))
a
festival> (+ 34 52)
86
@end lisp
In addition to standard Scheme commands a number of commands specific to
speech synthesis are included. Although, as we will see, there are
simpler methods for getting Festival to speak, here are the basic
underlying explicit functions used in synthesizing an utterance.

@cindex utterance
@cindex hello world
Utterances can consist of various types (@pxref{Utterance types}), but
the simplest form is plain text. We can create an utterance and save it
in a variable
@lisp
festival> (set! utt1 (Utterance Text "Hello world"))
#
festival>
@end lisp
The (hex) number in the return value may be different for your
installation. That is the print form for utterances. Their internal
structure can be very large so only a token form is printed.

@cindex synthesizing an utterance
Although this creates an utterance it doesn't do anything else. To get a
waveform you must synthesize it.
@lisp
festival> (utt.synth utt1)
#
festival>
@end lisp

@cindex playing an utterance
This calls various modules, including tokenizing, duration, intonation
etc. Which modules are called is defined with respect to the type of the
utterance, in this case @code{Text}. It is possible to individually call
the modules by hand but you just wanted it to talk didn't you. So
@lisp
festival> (utt.play utt1)
#
festival>
@end lisp
@exdent will send the synthesized waveform to your audio device. You
should hear "Hello world" from your machine.

@cindex @code{SayText}
To make this all easier a small function doing these three steps exists.
@code{SayText} simply takes a string of text, synthesizes it and sends
it to the audio device.
@lisp
festival> (SayText "Good morning, welcome to Festival")
#
festival>
@end lisp
Of course as history and command line editing are supported @key{c-p} or
up-arrow will allow you to edit the above to whatever you wish.

Festival may also synthesize from files rather than simply text.
@lisp
festival> (tts "myfile" nil)
nil
festival>
@end lisp

@cindex exiting Festival
@cindex @code{quit}
The end of file character @key{c-d} will exit from Festival and return
you to the shell, alternatively the command @code{quit} may be called
(don't forget the parentheses).

@cindex TTS
@cindex text to speech
Rather than starting the command interpreter, Festival may synthesize
files specified on the command line
@lisp
unix$ festival --tts myfile
unix$
@end lisp

@cindex text to wave
@cindex offline TTS
Sometimes a simple waveform is required from text that is to be kept and
played at some later time. The simplest way to do this with festival is
by using the @file{text2wave} program. This is a festival script that
will take a file (or text from standard input) and produce a single
waveform.
@cindex text2wave
An example use is
@example
text2wave myfile.txt -o myfile.wav
@end example
Options exist to specify the waveform file type, for example if Sun
audio format is required
@example
text2wave myfile.txt -otype snd -o myfile.wav
@end example
Use @file{-h} on @file{text2wave} to see all options.

@node Getting some help, , Simple command driven session, Quick start
@section Getting some help

@cindex help
If no audio is generated then you must check to see if audio is properly
initialized on your machine. @xref{Audio output}.

In the command interpreter @key{m-h} (meta-h) will give you help on the
current symbol before the cursor. This will be a short description of
the function or variable, how to use it and what its arguments are. A
listing of all such help strings appears at the end of this document.
@key{m-s} will synthesize and say the same information, but this extra
function is really just for show.

@cindex @code{manual}
The lisp function @code{manual} will send the appropriate command to an
already running Netscape browser process. If @code{nil} is given as an
argument the browser will be directed to the table of contents of the
manual. If a non-nil value is given it is assumed to be a section title
and that section is searched for and, if found, displayed. For example
@example
festival> (manual "Accessing an utterance")
@end example
Another related function is @code{manual-sym} which given a symbol will
check its documentation string for a cross reference to a manual section
and request Netscape to display it. This function is bound to @key{m-m}
and will display the appropriate section for the given symbol.

Note also that the @key{TAB} key can be used to find out the name of
commands available as can the function @code{Help} (remember the
parentheses).
For more up to date information on Festival regularly check the Festival
Home Page at
@example
@url{http://www.cstr.ed.ac.uk/projects/festival.html}
@end example

Further help is available by mailing questions to
@example
festival-help@@cstr.ed.ac.uk
@end example
Although we cannot guarantee the time required to answer you, we will do
our best to offer help.

@cindex bug reports
Bug reports should be submitted to
@example
festival-bug@@cstr.ed.ac.uk
@end example

If there is enough user traffic a general mailing list will be created
so all users may share comments and receive announcements. In the mean
time watch the Festival Home Page for news.

@node Scheme, TTS, Quick start, Top
@chapter Scheme

@cindex Scheme introduction
Many people seem daunted by the fact that Festival uses Scheme as its
scripting language and feel they can't use Festival because they don't
know Scheme. However most of those same people use Emacs every day which
also has (a much more complex) Lisp system underneath. The number of
Scheme commands you actually need to know in Festival is really very
small and you can easily just find out as you go along. Also people use
the Unix shell often but only know a small fraction of actual commands
available in the shell (or in fact that there even is a distinction
between shell builtin commands and user definable ones). So take it
easy, you'll learn the commands you need fairly quickly.

@menu
* Scheme references::           Places to learn more about Scheme
* Scheme fundamentals::         Syntax and semantics
* Scheme Festival specifics::
* Scheme I/O::
@end menu

@node Scheme references, Scheme fundamentals, , Scheme
@section Scheme references

If you wish to learn about Scheme in more detail I recommend the book
@cite{abelson85}.

The Emacs Lisp documentation is reasonable as it is comprehensive and
many of the underlying uses of Scheme in Festival were influenced by
Emacs. Emacs Lisp however is not Scheme so there are some differences.
@cindex Scheme references
Other Scheme tutorials and resources available on the Web are
@itemize @bullet
@item
The Revised Revised Revised Revised Scheme Report, the document defining
the language is available from
@example
@url{http://tinuviel.cs.wcu.edu/res/ldp/r4rs-html/r4rs_toc.html}
@end example
@item
Scheme tutorials from the net:
@itemize @bullet
@item
@url{http://www.cs.uoregon.edu/classes/cis425/schemeTutorial.html}
@end itemize
@item
the Scheme FAQ
@itemize @bullet
@item
@url{http://www.landfield.com/faqs/scheme-faq/part1/}
@end itemize
@end itemize

@node Scheme fundamentals, Scheme Festival specifics, Scheme references, Scheme
@section Scheme fundamentals

But you want more now, don't you, not just be referred to some other
book. OK here goes.

@emph{Syntax}: an expression is an @emph{atom} or a @emph{list}. A list
consists of a left paren, a number of expressions and right paren. Atoms
can be symbols, numbers, strings or other special types like functions,
hash tables, arrays, etc.

@emph{Semantics}: All expressions can be evaluated. Lists are evaluated
as function calls. When evaluating a list all the members of the list
are evaluated first then the first item (a function) is called with the
remaining items in the list as arguments. Atoms are evaluated depending
on their type: symbols are evaluated as variables returning their
values. Numbers, strings, functions, etc. evaluate to themselves.

Comments are started by a semicolon and run until end of line.

And that's it. There is nothing more to the language than that. But just
in case you can't follow the consequences of that, here are some key
examples.

@lisp
festival> (+ 2 3)
5
festival> (set! a 4)
4
festival> (* 3 a)
12
festival> (define (add a b) (+ a b))
#
festival> (add 3 4)
7
festival> (set! alist '(apples pears bananas))
(apples pears bananas)
festival> (car alist)
apples
festival> (cdr alist)
(pears bananas)
festival> (set! blist (cons 'oranges alist))
(oranges apples pears bananas)
festival> (append alist blist)
(apples pears bananas oranges apples pears bananas)
festival> (cons alist blist)
((apples pears bananas) oranges apples pears bananas)
festival> (length alist)
3
festival> (length (append alist blist))
7
@end lisp

@node Scheme Festival specifics, Scheme I/O, Scheme fundamentals, Scheme
@section Scheme Festival specifics

There are a number of additions to SIOD that are Festival specific
though still part of the Lisp system rather than the synthesis functions
per se.

By convention if the first statement of a function is a string, it is
treated as a documentation string. The string will be printed when help
is requested for that function symbol.

@cindex debugging Scheme errors
@cindex debugging scripts
@cindex backtrace
In interactive mode if the function @code{:backtrace} is called (within
parentheses) the previous stack trace is displayed. Calling
@code{:backtrace} with a numeric argument will display that particular
stack frame in full. Note that any command other than @code{:backtrace}
will reset the trace. You may optionally call
@lisp
(set_backtrace t)
@end lisp
which will cause a backtrace to be displayed whenever a Scheme error
occurs. This can be put in your @file{.festivalrc} if you wish. This is
especially useful when running Festival in non-interactive mode (batch
or script mode) so that more information is printed when an error
occurs.

@cindex hooks
A @emph{hook} in Lisp terms is a position within some piece of code
where a user may specify their own customization. The notion is used
heavily in Emacs. In Festival there are a number of places where hooks
are used. A hook variable contains either a function or list of
functions that are to be applied at some point in the processing. For
example the @code{after_synth_hooks} are applied after synthesis has
been applied to allow specific customization such as resampling or
modification of the gain of the synthesized waveform.
The Scheme function @code{apply_hooks} takes a hook variable as argument
and an object and applies the function/list of functions in turn to the
object.

@cindex catching errors in Scheme
@cindex @code{unwind-protect}
@cindex errors in Scheme
When an error occurs in either Scheme or within the C++ part of Festival
by default the system jumps to the top level, resets itself and
continues. Note that errors are usually serious things, pointing to bugs
in parameters or code. Every effort has been made to ensure that the
processing of text never causes errors in Festival. However when using
Festival as a development system errors often occur in code.

Sometimes in writing Scheme code you know there is a potential for an
error but you wish to ignore that and continue on to the next thing
without exiting or stopping and returning to the top level. For example
you are processing a number of utterances from a database and some files
containing the descriptions have errors in them but you want your
processing to continue through every utterance that can be processed
rather than stopping 5 minutes after you have gone home after setting a
big batch job for overnight.

@cindex @code{unwind-protect}
@cindex catching errors
Festival's Scheme provides the function @code{unwind-protect} which
allows the catching of errors and then continuing normally. For example
suppose you have the function @code{process_utt} which takes a filename
and does things which you know might cause an error. You can write the
following to ensure you continue processing even if an error occurs.
@lisp
(unwind-protect
 (process_utt filename)
 (begin
  (format t "Error found in processing %s\n" filename)
  (format t "continuing\n")))
@end lisp
The @code{unwind-protect} function takes two arguments. The first is
evaluated and if no error occurs the value returned from that expression
is returned. If an error does occur while evaluating the first
expression, the second expression is evaluated.
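
For the overnight batch case described above, a loop along the following
lines may be enough. It is only a sketch: @code{process_all} is a name
invented for illustration, @code{process_utt} is the same hypothetical
function as before, and @code{mapcar} is assumed to be available as in
standard SIOD.
@lisp
(define (process_all filenames)
  "(process_all FILENAMES)
Hypothetical example: apply process_utt to each file name in turn,
reporting failures instead of returning to the top level."
  (mapcar
   (lambda (filename)
     (unwind-protect
      (process_utt filename)
      (format t "Error found in processing %s, continuing\n" filename)))
   filenames))
@end lisp
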
@code{unwind-protect} may be used recursively.  Note that all files
opened while evaluating the first expression are closed if an error
occurs.  All global variables outside the scope of the
@code{unwind-protect} will be left as they were set up until the error.
Care should be taken in using this function but its power is necessary
to be able to write robust Scheme code.

@node Scheme I/O, , Scheme Festival specifics, Scheme
@section Scheme I/O

@cindex file i/o in Scheme
@cindex i/o in Scheme
Different Schemes may have quite different implementations of file i/o
functions, so in this section we will describe the basic functions in
Festival SIOD regarding i/o.

Simple printing to the screen may be achieved with the function
@code{print} which prints the given s-expression to the screen.  The
printed form is preceded by a new line.  This is often useful for
debugging but isn't really powerful enough for much else.

@cindex @code{fopen}
@cindex @code{fclose}
Files may be opened and closed and referred to by file descriptors in a
direct analogy to C's stdio library.  The SIOD functions @code{fopen}
and @code{fclose} work in exactly the same way as their equivalently
named partners in C.

@cindex @code{format}
@cindex formatted output
The @code{format} command follows the command of the same name in Emacs
and a number of other Lisps.  C programmers can think of it as
@code{fprintf}.  @code{format} takes a file descriptor, format string
and arguments to print.  The file descriptor may be one returned by the
Scheme function @code{fopen}; it may also be @code{t}, which means the
output will be directed to standard out (cf. @code{printf}).  A third
possibility is @code{nil}, which will cause the output to be printed to
a string which is returned (cf. @code{sprintf}).  The format string
closely follows the format strings in ANSI C, but it is not the same.
Specifically the directives currently supported are
@code{%%}, @code{%d}, @code{%x}, @code{%s}, @code{%f}, @code{%g} and
@code{%c}.  All modifiers for these are also supported.  In addition
@code{%l} is provided for printing of Scheme objects as objects.

For example
@lisp
(format t "%03d %3.4f %s %l %l %l\n" 23 23 "abc" "abc" '(a b d) utt1)
@end lisp
will produce
@lisp
023 23.0000 abc "abc" (a b d) #
@end lisp
on standard output.

@cindex pretty printing
When large lisp expressions are printed they are difficult to read
because of the parentheses.  The function @code{pprintf} prints an
expression to a file descriptor (or @code{t} for standard out).  It
prints so the s-expression is nicely lined up and indented.  This is
often called pretty printing in Lisps.

@cindex reading from files
@cindex loading data from files
For reading input from terminal or file, there is currently no
equivalent to @code{scanf}.  Items may only be read as Scheme
expressions.  The command
@lisp
(load FILENAME t)
@end lisp
@exdent will load all s-expressions in @code{FILENAME} and return them,
unevaluated, as a list.  Without the second argument the @code{load}
function will load and evaluate each s-expression in the file.

To read individual s-expressions use @code{readfp}.  For example
@lisp
(let ((fd (fopen trainfile "r"))
      (entry)
      (count 0))
  (while (not (equal? (set! entry (readfp fd)) (eof-val)))
    (if (string-equal (car entry) "home")
        (set! count (+ 1 count))))
  (fclose fd))
@end lisp

@cindex @code{parse-number}
@cindex @code{atof}
@cindex string to number
@cindex convert string to number
To convert a symbol whose print name is a number to a number use
@code{parse-number}.  This is the equivalent to @code{atof} in C.

Note that all i/o from Scheme input files is assumed to be basically
some form of Scheme data (though it can be just numbers or tokens).  For
more elaborate analysis of incoming data it is possible to use the text
tokenization functions which offer a fully programmable method of
reading data.
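
The functions described in this section can be combined to write a small
report file.  The following is a minimal sketch only: the function name
@code{save_counts}, the file name @file{counts.out} and the data are
hypothetical and chosen purely for illustration.
@lisp
;; Illustrative sketch; save_counts is not part of Festival.
(define (save_counts filename counts)
  "Write each (NAME COUNT) pair in COUNTS to FILENAME, one per line."
  (let ((fd (fopen filename "w")))   ;; mode string as in C's fopen
    (mapcar
     (lambda (entry)
       ;; %s prints the name string, %d the integer count
       (format fd "%s %d\n" (car entry) (car (cdr entry))))
     counts)
    (fclose fd)))

;; Hypothetical data and output file
(save_counts "counts.out" '(("home" 3) ("work" 5)))
@end lisp
Passing @code{t} in place of @code{fd} should send the same lines to
standard output, which is a convenient way to check a format string
before writing to a file.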
@node TTS, XML/SGML mark-up, Scheme, Top
@chapter TTS

Festival supports text to speech for raw text files.  If you are not
interested in using Festival in any other way except as a black box for
rendering text as speech, the following method is probably what you
want.
@example
festival --tts myfile
@end example
This will say the contents of @file{myfile}.  Alternatively text may
be submitted on standard input
@example
echo hello world | festival --tts
cat myfile | festival --tts
@end example

@cindex text modes
Festival supports the notion of @emph{text modes} where the text file
type may be identified, allowing Festival to process the file in an
appropriate way.  Currently only two types are considered stable:
@code{STML} and @code{raw}, but other types such as @code{email},
@code{HTML}, @code{Latex}, etc. are being developed and are discussed
below.  This follows the idea of buffer modes in Emacs where a file's
type can be utilized to best display the text.  Text mode may also be
selected based on a filename's extension.

Within the command interpreter the function @code{tts} is used to render
files as speech; it takes a filename and the text mode as arguments.

@menu
* Utterance chunking::  From text to utterances
* Text modes::          Mode specific text analysis
* Example text mode::   An example mode for reading email
@end menu

@node Utterance chunking, Text modes, , TTS
@section Utterance chunking

@cindex utterance chunking
@cindex @code{eou_tree}
Text to speech works by first tokenizing the file and chunking the
tokens into utterances.  The definition of utterance breaks is
determined by the utterance tree in variable @code{eou_tree}.  A default
version is given in @file{lib/tts.scm}.  This uses a decision tree to
determine what signifies an utterance break.  Blank lines are probably
the most reliable indicator, followed by certain punctuation.  The
confusion of the use of periods for both sentence breaks and
abbreviations requires some more heuristics to best guess their
different use.
The following tree is currently used which works better than
simply using punctuation.
@lisp
(defvar eou_tree
'((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
  ((1))
  ((punc in ("?" ":" "!"))
   ((1))
   ((punc is ".")
    ;; This is to distinguish abbreviations vs periods
    ;; These are heuristics
    ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)")
     ((n.whitespace is " ")
      ((0))                  ;; if abbrev single space isn't enough for break
      ((n.name matches "[A-Z].*")
       ((1))
       ((0))))
     ((n.whitespace is " ")  ;; if it doesn't look like an abbreviation
      ((n.name matches "[A-Z].*")  ;; single space and non-cap is no break
       ((1))
       ((0)))
      ((1))))
    ((0))))))
@end lisp
The token items this is applied to will always (except in the
end of file case) include one following token, so look ahead is
possible.  The "n." and "p." and "p.p." prefixes allow access to the
surrounding token context.  The features @code{name}, @code{whitespace}
and @code{punc} allow access to the contents of the token itself.  At
present there is no way to access the lexicon from this tree, which
is unfortunate as it might be useful if certain abbreviations were
identified as such there.

Note these are heuristics, written by hand and not trained from data,
though problems have been fixed as they have been observed in data.  The
above rules may make mistakes where abbreviations appear at the end of
lines, and when improper spacing and capitalization is used.  This is
probably worth changing for modes where more casual text appears, such
as email messages and USENET news messages.  A possible improvement
could be made by analysing a text to find out its basic threshold of
utterance break (i.e. if no full stop, two spaces, followed by a
capitalized word sequences appear and the text is of a reasonable length
then look for other criteria for utterance breaks).
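
As a rough illustration of the tree format, the following sketch
replaces the default with a much cruder tree that breaks on two or more
newlines or after any token whose punctuation is @code{?}, @code{!} or
@code{.}, ignoring the abbreviation heuristics.  It is not part of the
distribution and will split wrongly after abbreviations; it is intended
only to show how a question and its two branches nest.
@lisp
;; A deliberately simple utterance break tree (illustrative only).
;; Each node is (QUESTION YES-BRANCH NO-BRANCH); a leaf is ((1)) for
;; "break here" or ((0)) for "no break".
(set! eou_tree
 '((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*")  ;; 2 or more newlines
   ((1))
   ((punc in ("?" "!" "."))
    ((1))
    ((0)))))
@end lisp
Setting the variable after @file{lib/tts.scm} has been loaded, for
example from @file{.festivalrc}, is assumed here to be enough for the
new tree to take effect.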
Ultimately what we are trying to do is to chunk the text into utterances
that can be synthesized quickly and start to play them quickly to
minimise the time someone has to wait for the first sound when starting
synthesis.  Thus it would be better if this chunking were done on
@emph{prosodic phrases} rather than chunks more similar to linguistic
sentences.  Prosodic phrases are bounded in size, while sentences are
not.

@node Text modes, Example text mode, Utterance chunking, TTS
@section Text modes

@cindex text modes
We do not believe that all texts are of the same type.  Often
information about the general contents of a file will aid synthesis
greatly.  For example in Latex files we do not want to hear "left brace,
backslash e m" before each emphasized word, nor do we necessarily want
to hear formatting commands.  Festival offers a basic method for
specifying customization rules depending on the @emph{mode} of the text.
In using the term mode we are following the notion of modes in Emacs and
eventually will allow customization at a similar level.

Modes are specified as the second argument to the function @code{tts}.
When using the Emacs interface to Festival the buffer mode is
automatically passed as the text mode.  If the mode is not supported a
warning message is printed and the raw text mode is used.

Our initial text mode implementation allows configuration both in C++
and in Scheme.  Obviously in C++ almost anything can be done but it is
not as easy to reconfigure without recompilation.  Here we will discuss
those modes which can be fully configured at run time.

A text mode may contain the following
@table @emph
@item filter
A Unix shell program filter that processes the text file in some
appropriate way.  For example for email it might remove uninteresting
headers and just output the subject, from line and the message body.  If
not specified, an identity filter is used.
@item init_function
This (Scheme) function will be called before any processing is done.
It allows further set up of tokenization rules and voices etc.
@item exit_function
This (Scheme) function will be called at the end of any processing,
allowing resetting of tokenization rules etc.
@item analysis_mode
If analysis mode is @code{xml} the file is read through the built in XML
parser @code{rxp}.  Alternatively if analysis mode is @code{xxml} the
filter should be an SGML normalising parser and the output is processed
in a way suitable for it.  Any other value is ignored.
@end table
These mode specific parameters are specified in the a-list held in
@code{tts_text_modes}.

When using Festival in Emacs the emacs buffer mode is passed to Festival
as the text mode.

Note that the above mechanism is not really designed to be re-entrant;
this should be addressed in later versions.

@cindex @code{auto-text-mode-alist}
@cindex automatic selection of text mode
Following the use of auto-selection of mode in Emacs, Festival can
auto-select the text mode based on the filename given when no explicit
mode is given.  The Lisp variable @code{auto-text-mode-alist} is a list
of dotted pairs of regular expression and mode name.  For example to
specify that the @code{email} mode is to be used for files ending in
@file{.email} we would add to the current @code{auto-text-mode-alist} as
follows
@lisp
(set! auto-text-mode-alist
      (cons (cons "\\.email$" 'email)
            auto-text-mode-alist))
@end lisp
If the function @code{tts} is called with a mode other than @code{nil}
that mode overrides any specified by the @code{auto-text-mode-alist}.
The mode @code{fundamental} is the explicit "null" mode; it is used when
no mode is specified in the function @code{tts} and no match is found in
@code{auto-text-mode-alist}, or when the specified mode is not found.

By convention if a requested text mode is not found in
@code{tts_text_modes} the file @file{MODENAME-mode} will be
@code{required}.  Therefore if you have the file
@file{MODENAME-mode.scm} in your library then it will be automatically
loaded on reference.
Modes may be quite large and it is not necessary to have Festival load
them all at start up time.

Because of the @code{auto-text-mode-alist} and the auto loading of
currently undefined text modes you can use Festival like
@example
festival --tts example.email
@end example
Festival will automatically synthesize @file{example.email} in text
mode @code{email}.

@cindex personal text modes
If you add your own personal text modes you should do the following.
Suppose you've written an HTML mode.  You have named it
@file{html-mode.scm} and put it in @file{/home/awb/lib/festival/}.  In
your @file{.festivalrc} first identify your personal Festival library
directory by adding it to @code{lib-path}.
@example
(set! lib-path (cons "/home/awb/lib/festival/" lib-path))
@end example
Then add the definition to the @code{auto-text-mode-alist} that file
names ending @file{.html} or @file{.htm} should be read in HTML mode.
@example
(set! auto-text-mode-alist
      (cons (cons "\\.html?$" 'html)
            auto-text-mode-alist))
@end example
Then you may synthesize an HTML file either from Scheme
@example
(tts "example.html" nil)
@end example
@exdent Or from the shell command line
@example
festival --tts example.html
@end example
Anyone familiar with modes in Emacs should recognise that the process of
adding a new text mode to Festival is very similar to adding a new
buffer mode to Emacs.

@node Example text mode, , Text modes, TTS
@section Example text mode

@cindex email mode
Here is a short example of a tts mode for reading email messages.  It is
by no means complete but is a start at showing how you can customize tts
modes without writing new C++ code.

The first task is to define a filter that will take a saved mail message
and remove extraneous headers and just leave the from line, subject and
body of the message.  The filter program is given a file name as its
first argument and should output the result on standard out.  For our
purposes we will do this as a shell script.
@example
#!/bin/sh
#  Email filter for Festival tts mode
#  usage: email_filter mail_message >tidied_mail_message
grep "^From: " "$1"
echo
grep "^Subject: " "$1"
echo
# delete up to first blank line (i.e. the header)
sed '1,/^$/ d' "$1"
@end example
Next we define the email init function, which will be called when we
start this mode.  What we will do is save the current token to words
function and slot in our own new one.  We can then restore the previous
one when we exit.
@lisp
(define (email_init_func)
 "Called on starting email text mode."
 (set! email_previous_t2w_func token_to_words)
 (set! english_token_to_words email_token_to_words)
 (set! token_to_words email_token_to_words))
@end lisp
Note that @emph{both} @code{english_token_to_words} and
@code{token_to_words} should be set to ensure that our new token to word
function is still used when we change voices.

The corresponding end function puts the token to words function back.
@lisp
(define (email_exit_func)
 "Called on exit email text mode."
 (set! english_token_to_words email_previous_t2w_func)
 (set! token_to_words email_previous_t2w_func))
@end lisp
Now we can define the email specific token to words function.  In this
example we deal with two specific cases.  First we deal with the common
form of email addresses so that the angle brackets are not pronounced.
The second is to recognise quoted text and immediately change the
speaker to the alternative speaker.
@lisp
(define (email_token_to_words token name)
  "Email specific token to word rules."
  (cond
@end lisp
This first condition identifies the token as a bracketed email address,
removes the brackets and splits the token into name and IP address.
Note that we recursively call the function
@code{email_previous_t2w_func} on the email name and IP address so that
they will be pronounced properly.  Note that because that function
returns a @emph{list} of words we need to append them together.
@lisp
  ((string-matches name "<.*@@.*>")
    (append
     (email_previous_t2w_func token
      (string-after (string-before name "@@") "<"))
     (cons
      "at"
      (email_previous_t2w_func token
       (string-before (string-after name "@@") ">")))))
@end lisp
Our next condition deals with identifying a greater than sign being used
as a quote marker.  When we detect this we select the alternative
speaker, even though it may already be selected.  We then return no
words so the quote marker is not spoken.  The following condition finds
greater than signs which are the first token on a line.
@lisp
  ((and (string-matches name ">")
        (string-matches (item.feat token "whitespace")
                        "[ \t\n]*\n *"))
   (voice_don_diphone)
   nil ;; return nothing to say
   )
@end lisp
If it doesn't match any of these we can go ahead and use the builtin
token to words function.  Actually, we call the function that was set
before we entered this mode to ensure any other specific rules still
remain.  But before that we need to check if we've had a newline which
doesn't start with a greater than sign.  In that case we switch back to
the primary speaker.
@lisp
  (t  ;; for all other cases
     (if (string-matches (item.feat token "whitespace")
                         ".*\n[ \t\n]*")
         (voice_rab_diphone))
     (email_previous_t2w_func token name))))
@end lisp

@cindex declaring text modes
In addition to these we have to actually declare the text mode.  This we
do by adding to any existing modes as follows.
@lisp
(set! tts_text_modes
   (cons
    (list
      'email   ;; mode name
      (list         ;; email mode params
       (list 'init_func email_init_func)
       (list 'exit_func email_exit_func)
       '(filter "email_filter")))
    tts_text_modes))
@end lisp
This will now allow simple email messages to be dealt with in a mode
specific way.

An example mail message is included in @file{examples/ex1.email}.  To
hear the result of the above text mode start Festival, load in the email
mode descriptions, and call TTS on the example file.
@example
(tts ".../examples/ex1.email" 'email)
@end example

The above falls well short of a real email mode but does illustrate how
one might go about building one.  It should be reiterated that text
modes are new in Festival and their most effective form has not been
discovered yet.  This will improve with time and experience.

@node XML/SGML mark-up, Emacs interface, TTS, Top
@chapter XML/SGML mark-up

@cindex STML
@cindex SGML
@cindex SSML
@cindex Sable
@cindex XML
@cindex Spoken Text Mark-up Language
The idea of a general, synthesizer-independent mark-up language for
labelling text has been under discussion for some time.  Festival has
supported an SGML based markup language through multiple versions, most
recently STML (@cite{sproat97}).  This is based on the earlier SSML
(Speech Synthesis Markup Language) which was supported by previous
versions of Festival (@cite{taylor96}).  With this version of Festival
we support @emph{Sable}, a similar mark-up language devised by a
consortium from Bell Labs, Sun Microsystems, AT&T and Edinburgh
(@cite{sable98}).  Unlike the previous versions, which were SGML based,
the implementation of Sable in Festival is now XML based.  To the user
the difference is negligible but using XML makes processing of files
easier and more standardized.  Also Festival now includes an XML parser,
thus reducing the dependencies in processing Sable text.

Raw text has the problem that it cannot always easily be rendered as
speech in the way the author wishes.  Sable offers a well-defined way of
marking up text so that the synthesizer may render it appropriately.

@cindex CSS
@cindex Cascading style sheets
@cindex DSSSL
The definition of Sable is by no means settled and is still in
development.  In this release Festival offers people working on Sable
and other XML (and SGML) based markup languages a chance to quickly
experiment with prototypes by providing a DTD (document type
definition) and the mapping of the elements in the DTD to Festival
functions.
Although we have not yet (personally) investigated facilities
like cascading style sheets and generalized SGML specification languages
like DSSSL we believe the facilities offered by Festival allow rapid
prototyping of speech output markup languages.

Primarily we see Sable markup text as a language that will be generated
by other programs, e.g. text generation systems, dialog managers etc.
Therefore a standard, easy to parse format is required, even if it seems
overly verbose for human writers.

For more information on Sable and access to the mailing list see
@example
@url{http://www.cstr.ed.ac.uk/projects/sable.html}
@end example

@menu
* Sable example::         an example of Sable with descriptions
* Supported Sable tags::  Currently supported Sable tags
* Adding Sable tags::     Adding new Sable tags
* XML/SGML requirements:: Software environment requirements for use
* Using Sable::           Rendering Sable files as speech
@end menu

@node Sable example, Supported Sable tags, , XML/SGML mark-up
@section Sable example

Here is a simple example of Sable marked up text

@example
The boy saw the girl in the park with the telescope.
The boy saw the girl in the park with the telescope.

Good morning My name is Stuart, which is spelled
stuart
though some people pronounce it
stuart.  My telephone number
is 2787.

I used to work in Buccleuch Place,
but no one can pronounce that.

By the way, my telephone number is actually
@end example

@cindex SABLE DTD
@cindex @file{Sable.v0_2.dtd}
After the initial definition of the SABLE tags, through the file
@file{Sable.v0_2.dtd}, which is distributed as part of Festival, the
body is given.  There are tags for identifying the language and the
voice.  Explicit boundary markers may be given in text.  Also duration
and intonation control can be explicitly specified, as can new
pronunciations of words.  The last sentence specifies some external
filenames to play at that point.
@node Supported Sable tags, Adding Sable tags, Sable example, XML/SGML mark-up
@section Supported Sable tags

@cindex Sable tags
There is not yet a definitive set of tags but hopefully such a list
will form over the next few months.  As adding support for new tags is
often trivial the problem lies much more in defining what tags there
should be than in actually implementing them.  The following are based
on version 0.2 of Sable as described in
@url{http://www.cstr.ed.ac.uk/projects/sable_spec2.html}, though
some aspects are not currently supported in this implementation.
Further updates will be announced through the Sable mailing list.

@table @code
@item LANGUAGE
Allows the specification of the language through the @code{ID}
attribute.  Valid values in Festival are @code{english},
@code{en1}, @code{spanish}, @code{en}, and others depending on your
particular installation.
For example
@example
...
@end example
If the language isn't supported by the particular installation of
Festival "Some text in .." is said instead and the section is omitted.
@item SPEAKER
Select a voice.  Accepts a parameter @code{NAME} which takes values
@code{male1}, @code{male2}, @code{female1}, etc.  There is currently no
definition about what happens when a voice is selected which the
synthesizer doesn't support.  An example is
@example
...
@end example
@item AUDIO
This allows the specification of an external waveform that is to be
included.  There are attributes for specifying volume and whether the
waveform is to be played in the background of the following text or not.
Festival as yet only supports insertion.
@example
My telephone number is