Jump to Content
Françoise Beaufays

Françoise Beaufays

Françoise Beaufays is a Research Scientist at Google, where she leads a team working on on-device machine learning for Speech and Mobile Keyboard models. Her area of scientific expertise covers deep learning, sequence-to-sequence modeling, language modeling and other technologies related to natural language processing, with a recent focus on privacy-preserving modeling techniques. Françoise studied Mechanical and Electrical Engineering in Brussels, Belgium. She holds a PhD in Electrical Engineering and a PhD minor in Italian Literature, both from Stanford University.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    A Method to Reveal Speaker Identity in Distributed ASR Training,and How to Counter It
    Trung Dang
    Peter Chin
    IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, {IEEE}, pp. 4338-4342
    Preview abstract End-to-end Automatic Speech Recognition (ASR) models are commonly trained over spoken utterances using optimization methods like Stochastic Gradient Descent (SGD). In distributed settings like Federated Learning, model training requires transmission of gradients over a network. In this work, we design the first method for revealing the identity of the speaker of a training utterance with access only to a gradient. We propose Hessian-Free Gradients Matching, an input reconstruction technique that operates without second derivatives of the loss function (required in prior works), which can be expensive to compute. We show the effectiveness of our method using the DeepSpeech model architecture, demonstrating that it is possible to reveal the speaker’s identity with 34% top-1 accuracy (51% top-5 accuracy) on the LibriSpeech dataset. Further, we study the effect of Dropout on the success of our method. We show that a dropout rate of 0.2 can reduce the speaker identity accuracy to 0% top-1 (0.5% top-5). View details
    Preview abstract Recent work has designed methods to demonstrate that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate information leakage about training data from trained ASR models. We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data from trained ASR models. We demonstrate the success of Noise Masking by using it in four settings for extracting names from the LibriSpeech dataset used for training a state-of-the-art Conformer model. In particular, we show that we are able to extract the correct names from masked training utterances with 11.8% accuracy, while the model outputs some name from the train set 55.2% of the time. Further, we show that even in a setting that uses synthetic audio and partial transcripts from the test set, our method achieves 2.5% correct name accuracy (47.7% any name success rate). Lastly, we design Word Dropout, a data augmentation method that we show when used in training along with Multistyle TRaining (MTR), provides comparable utility as the baseline, along with significantly mitigating extraction via Noise Masking across the four evaluated settings. View details
    Preview abstract Automatic Speech Recognition models require large amount of speech data for training, and the collection of such data often leads to privacy concerns. Federated learning has been widely used and is considered to be an effective decentralized technique by collaboratively learning a shared prediction model while keeping the data local on different clients devices. However, the limited computation and communication resources on clients devices present practical difficulties for large models. To overcome such challenges, we propose Federated Pruning to train a reduced model under the federated setting, while maintaining similar performance compared to the full model. Moreover, the vast amount of clients data can also be leveraged to improve the pruning results compared to centralized training. We explore different pruning schemes and provide empirical evidence of the effectiveness of our methods. View details
    Preview abstract This paper addresses the challenges of training large neural network models under federated learning settings: high on-device memory usage and communication cost. The proposed Online Model Compression (OMC) provides a framework that stores model parameters in a compressed format and decompresses them only when needed. We use quantization as the compression method in this paper and propose three methods, (1) using per-variable transformation, (2) weight matrices only quantization, and (3) partial parameter quantization, to minimize the impact on model accuracy. According to our experiments on two recent neural networks for speech recognition and two different datasets, OMC can reduce memory usage and communication cost of model parameters by up to 59% while attaining comparable accuracy and training speed when compared with full-precision training. View details
    Preview abstract This paper proposes a framework to improve the typing experience of mobile users in morphologically rich languages. Smartphone keyboards typically support features such as input decoding, corrections and predictions that all rely on language models. For latency reasons, these operations happen on device, so the models are of limited size and cannot easily cover all the words needed by users for their daily tasks, especially in morphologically rich languages. In particular, the compounding nature of Germanic languages makes their vocabulary virtually infinite. Similarly, heavily inflecting and agglutinative languages (e.g. Slavic, Turkic or Finno-Ugric languages) tend to have much larger vocabularies than morphologically simpler languages, such as English or Mandarin. We propose to model such languages with automatically selected subword units annotated with what we call binding types, allowing the decoder to know when to bind subword units into words. We show that this method brings around 20% word error rate reduction in a variety of compounding languages. This is more than twice the improvement we previously obtained with a more basic approach, also described in the paper. View details
    Preview abstract Self- and Semi-supervised learning methods have been actively investigated to reduce labeled training data or enhance the model performance. However, the approach mostly focus on in-domain performance for public datasets. In this study, we utilize the combination of self- and semi-supervised learning methods to solve unseen domain adaptation problem in a large-scale production setting for online ASR model. This approach demonstrates that using the source domain data with a small fraction of the target domain data (3%) can recover the performance gap compared to a full data baseline: relative 13.5% WER improvement for target domain data. View details
    Revealing and Protecting Labels in Distributed Training
    Trung Dang
    Peter Chin
    Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pp. 1727-1738
    Preview abstract Distributed learning paradigms such as federated learning often involve transmission of model updates, or gradients, over a network, thereby avoiding transmission of private data. However, it is possible for sensitive information about the training data to be revealed from such gradients. Prior works have demonstrated that labels can be revealed analytically from the last layer of certain models (e.g., ResNet), or they can be reconstructed jointly with model inputs by using Gradients Matching [Zhu et al.] with additional knowledge about the current state of the model. In this work, we propose a method to discover the set of labels of training samples from only the gradient of the last layer and the id to label mapping. Our method is applicable to a wide variety of model architectures across multiple domains. We demonstrate the effectiveness of our method for model training in two domains - image classification, and automatic speech recognition. Furthermore, we show that existing reconstruction techniques improve their efficacy when used in conjunction with our method. Conversely, we demonstrate that gradient quantization and sparsification can significantly reduce the success of the attack. View details
    Preview abstract This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL) while leveraging the Differentially Private Federated Averaging (DP-FedAvg) technique. There has been prior work on building practical FL infrastructure, including work demonstrating the feasibility of training language models on mobile devices using such infrastructure. It has also been shown (in simulations on a public corpus) that it is possible to train NWP models with user-level differential privacy using the DP-FedAvg algorithm. Nevertheless, training production-quality NWP models with DP-FedAvg in a real-world production environment on a heterogeneous fleet of mobile phones requires addressing numerous challenges. For instance, the coordinating central server has to keep track of the devices available at the start of each round and sample devices uniformly at random from them, while ensuring \emph{secrecy of the sample}, etc. Unlike all prior privacy-focused FL work of which we are aware, for the first time we demonstrate the deployment of a differentially private mechanism for the training of a production neural network in FL, as well as the instrumentation of the production training infrastructure to perform an end-to-end empirical measurement of unintended memorization. View details
    Understanding Unintended Memorization in Federated Learning
    Third Workshop on Privacy in Natural Language Processing (PrivateNLP 2021) at 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021) (2020)
    Preview abstract Recent works have shown that generative sequence models (e.g., language models) have a tendency to memorize rare or unique sequences in the training data. Since useful models are often trained on sensitive data, to ensure the privacy of the training data it is critical to identify and mitigate such unintended memorization. Federated Learning (FL) has emerged as a novel framework for large-scale distributed learning tasks. However, it differs in many aspects from the well-studied central learning setting where all the data is stored at the central server. In this paper, we initiate a formal study to understand the effect of different components of canonical FL on unintended memorization in trained models, comparing with the central learning setting. Our results show that several differing components of FL play an important role in reducing unintended memorization. Specifically, we observe that the clustering of data according to users---which happens by design in FL---has a significant effect in reducing such memorization, and using the method of Federated Averaging for training causes a further reduction. We also show that training with a strong user-level differential privacy guarantee results in models that exhibit the least amount of unintended memorization. View details
    Preview abstract We demonstrate that a character-level LSTM neural network is able to learn out-of-vocabulary (OOV) words for the purpose of expanding the vocabulary of a virtual keyboard for smartphones. We train such a model using a distributed, on-device learning framework called federated learning. High-frequency words can then be sampled from the generative model by drawing from the joint posterior directly. We study the feasibility of the approach in three different settings: (1) using stochastic gradient descent, on an anonymized dataset of snippets of user content; (2) using simulated federated learning, on a publicly available non-IID per-user dataset from a popular social networking website; (3) using federated learning, on data hosted on user mobile devices. The model is shown to achieve good recall and precision when compared to ground-truth OOV words in settings (1) and (2). With (3) we demonstrate the practicality of this approach by showing that we can learn meaningful OOV words without exporting sensitive user data to servers. View details
    Preview abstract Federated learning is a distributed, on-device computation framework that enables training global models without exporting sensitive user data to servers. In this work, we describe methods to extend the federation framework to evaluate strategies for personalization of global models. We present tools to analyze the effects of personalization and evaluate conditions under which personalization yields desirable models. We report on our experiments personalizing a language model for a virtual keyboard for smartphones with a population of tens of millions of users. We show that a significant fraction of users benefit from personalization. View details
    Preview abstract We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a machine learning technique to train global models to be used on portable devices such as smart phones, without the users' data ever leaving their devices. This is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models, mostly for latency reasons. We propose to train a recurrent neural network language model using the decentralized "FederatedAveraging" algorithm directly on training and to approximating this federated model server-side with an n-gram model that can be deployed to devices for fast inference. Our technical contributions include novel ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite state transducer algorithms to convert word language models to word-piece language models and vice versa. The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard. Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving the device. View details
    Preview abstract This technical report describes our deep internationalization program for Gboard, the Google Keyboard. Today, Gboard supports 900+ language varieties across 70+ writing systems, and this report describes how and why we added support for these language varieties from around the globe. Many languages of the world are increasingly used in writing on an everyday basis, and we describe the trends we see. We cover technological and logistical challenges in scaling up a language technology product like Gboard to hundreds of language varieties, and describe how we built systems and processes to operate at scale. Finally, we summarize the key take-aways from user studies we ran with speakers of hundreds of languages from around the world. View details
    Preview abstract We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the Federated Averaging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices. View details
    Preview abstract We show that a word-level recurrent neural network can predict emoji from text typed on a mobile keyboard. We demonstrate the usefulness of transfer learning for predicting emoji by pretraining the model using a language modeling task. We also propose mechanisms to trigger emoji and tune the diversity of candidates. The model is trained using a distributed on-device learning framework called federated learning. The federated model is shown to achieve better performance than a server-trained model. This work demonstrates the feasibility of using federated learning to train production-quality models for natural language understanding tasks while keeping users' data on their devices. View details
    Preview abstract Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and works well for many unseen speakers. However, it still does not generalize well to users with very different speech characteristics. This issue can be addressed by building a personalized system that works well for each specific user. In this paper, we investigate into securely training personalized end-to-end speech recognition models on mobile devices so that user data and models are kept on mobile devices without communicating with a server. We study how the mobile training environment impacts the performance by simulating on-device data consumption. We conduct experiments using data collected from speech impaired users for personalization. Our results show that personalization achieved 63.7% relative word error rate reduction when trained in a server environment and 58.1% in a mobile environment. Moving to on-device personalization resulted in 18.7% performance degradation, in exchange for improved scalability and data privacy. To train the model on device, we split the gradient computation into two and achieved 45% memory reduction at the expense of 42% increase in training time. View details
    Preview abstract Federated learning is a distributed form of machine learning where both the training data and model training are decentralized. In this paper, we use federated learning in a commercial, global-scale setting to train, evaluate and deploy a model to improve virtual keyboard search suggestion quality without direct access to the underlying user data. We describe our observations in federated training, compare metrics to live deployments, and present resulting quality increases. In whole, we demonstrate how federated learning can be applied end-to-end to both improve user experiences and enhance user privacy. View details
    Preview abstract We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the FederatedAveraging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices. View details
    Transliterated mobile keyboard input via weighted finite-state transducers
    Lars Hellsten
    Prasoon Goyal
    Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP) (2017)
    Preview abstract We present an extension to a mobile keyboard input decoder based on finite-state transducers that provides general transliteration support, and demonstrate its use for input of South Asian languages using a QWERTY keyboard. On-device keyboard decoders must operate under strict latency and memory constraints, and we present several transducer optimizations that allow for high accuracy decoding under such constraints. Our methods yield substantial accuracy improvements and latency reductions over an existing baseline transliteration keyboard approach. The resulting system was launched for 22 languages in Google Gboard in the first half of 2017. View details
    Preview abstract Automatic speech recognition relies on pronunciation dictionaries for accurate results and previous work used pronunciation learning algorithms to build them. Efficient algorithms must balance having the ability to learn varied pronunciations while being constrained enough to be robust. Our approach extends one of such algorithms \cite{Kou2015} by replacing a finite state transducer (FST) built from a limited-size candidate list with a general and flexible FST building mechanism. This architecture can accommodate a wide variety of pronunciation predictions and can also learn pronunciations without having the written form. It can also use an FST built from a recursive neural network (RNN) and tune the importance given to the written form. The new approach reduces the number of incorrect pronunciations learned by up to 25% (relative) on a random sampling of Google voice traffic View details
    Preview abstract We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time. View details
    Preview abstract Person name is intrinsically difficult to pronounce due to its variety of origin and non-regularity. This poses significant challenges to contact dialing by voice. In order to mitigate the pronunciation problem for contact names, we propose using personalized pronunciation learning: people can use their own pronunciations for their contact names. We achieve this by implicitly learning from users' corrections in real time, providing a seamless user experience improvement. We show that personalized pronunciation significantly reduces word error for difficult contact names by 15% relatively. View details
    Garbage Modeling for On-device Speech Recognition
    Christophe Van Gysel
    Interspeech 2015, International Speech Communications Association (to appear)
    Preview
    Preview abstract Recently, Google launched YouTube Kids, a mobile application for children, that uses a speech recognizer built specifically for recognizing children’s speech. In this paper we present techniques we explored to build such a system. We describe the use of a neural network classifier to identify matched acoustic training data, filtering data for language modeling to reduce the chance of producing offensive results. We also compare long short-term memory (LSTM) recurrent networks to convolutional, LSTM, deep neural networks (CLDNN). We found that a CLDNN acoustic model outperforms an LSTM across a variety of different conditions, but does not specifically model child speech relatively better than adult. Overall, these findings allow us to build a successful, state-of-the-art large vocabulary speech recognizer for both children and adults. View details
    Composition-based on-the-fly rescoring for salient n-gram biasing
    Keith Hall
    Eunjoon Cho
    Noah Coccaro
    Kaisuke Nakajima
    Linda Zhang
    Interspeech 2015, International Speech Communications Association
    Preview
    Long-Short Term Memory Neural Network for Keyboard Gesture Recognition
    Thomas Breuel
    Johan Schalkwyk
    International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
    Preview
    Preview abstract Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we explore LSTM RNN architectures for large scale acoustic modeling in speech recognition. We recently showed that LSTM RNNs are more effective than DNNs and conventional RNNs for acoustic modeling, considering moderately-sized models trained on a single machine. Here, we introduce the first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines. We show that a two-layer deep LSTM RNN where each LSTM layer has a linear recurrent projection layer can exceed state-of-the-art speech recognition performance. This architecture makes more effective use of model parameters than the others considered, converges quickly, and outperforms a deep feed forward neural network having an order of magnitude more parameters. View details
    Language Modeling Capitalization
    Brian Strope
    Proc ICASSP, IEEE (2013) (to appear)
    Preview
    Preview abstract We evaluate different architectures to recognize multilingual speech for real-time mobile applications. In particular, we show that combining the results of several recognizers greatly outperforms other solutions such as training a single large multilingual system or using an explicit language identification system to select the appropriate recognizer. Experiments are conducted on a trilingual English-French-Mandarin mobile speech task. The data set includes Google searches, Maps queries, as well as more general inputs such as email and short message dictation. Without pre-specifying the input language, the combined system achieves comparable accu- racy to that of the monolingual systems when the input language is known. The combined system is also roughly 5% absolute better than an explicit language identification approach, and 10% better than a single large multilingual system. View details
    Preview abstract One of the difficult problems of acoustic modeling for Automatic Speech Recognition (ASR) is how to adequately model the wide variety of acoustic conditions which may be present in the data. The problem is especially acute for tasks such as Google Search by Voice, where the amount of speech available per transaction is small, and adaptation techniques start showing their limitations. As training data from a very large user population is available however, it is possible to identify and jointly model subsets of the data with similar acoustic qualities. We describe a technique which allows us to perform this modeling at scale on large amounts of data by learning a treestructured partition of the acoustic space, and we demonstrate that we can significantly improve recognition accuracy in various conditions through unsupervised Maximum Mutual Information (MMI) training. Being fully unsupervised, this technique scales easily to increasing numbers of conditions. View details
    Google Search by Voice: A Case Study
    Johan Schalkwyk
    Doug Beeferman
    Mike Cohen
    Brian Strope
    Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, Springer (2010), pp. 61-90
    Preview
    Preview abstract Letter units, or graphemes, have been reported in the literature as a surprisingly effective substitute to the more traditional phoneme units, at least in languages that enjoy a strong correspondence between pronunciation and orthography. For English however, where letter symbols have less acoustic consistency, previously reported results fell short of systems using highly-tuned pronunciation lexicons. Grapheme units simplify system design, but since graphemes map to a wider set of acoustic realizations than phonemes, we should expect grapheme-based acoustic models to require more training data to capture these variations. In this paper, we compare the rate of improvement of grapheme and phoneme systems trained with datasets ranging from 450 to 1200 hours of speech. We consider various grapheme unit configurations, including using letter-specific, onset, and coda units. We show that the grapheme systems improve faster and, depending on the lexicon, reach or surpass the phoneme baselines with the largest training set. Index Terms— Acoustic modeling, graphemes, directory assistance, speech recognition. View details
    Speech Recognition Technology
    Herve Bourlard
    Horacio Franco
    The Handbook of Brain Theory and Neural Networks, MIT Press (2003)
    Using Speech/Non-Speech Detection to Bias Recognition Search on Noisy Data
    Daniel Boies
    Mitchel Weintraub
    Qifeng Zhu
    Proc. ICASSP (2003)
    Learning Linguistically Valid Pronunciations from Acoustic Data
    Ananth Sankar
    Shaun Williams
    Mitchel Weintraub
    Proc. Eurospeech (2003)
    Learning Name Pronunciations in Automatic Speech Recognition Systems
    Ananth Sankar
    Shaun Williams
    Mitchel Weintraub
    Proc. ICTAI (2003)
    Speech Recognition Technology
    Herve Bourlard
    Horacio Franco
    Nelson Morgan
    Handbook of Brain Theory and Neural Networks, MIT Press (2002)
    Porting Channel Robustness Across Languages
    Daniel Boies
    Mitchel Weintraub
    Proc. ICSLP (2002)
    Robustness of Noisy Speech Features Using Neural Networks
    Mitchel Weintraub
    Proc. Workshop on Robust Methods for Speech Recognition in Adverse Conditions (1999)
    Robust Text-Independent Speaker Identification over Telephone Channels
    Hema Murthy
    Larry Heck
    IEEE Trans. on Speech and Audio Processing, vol. 7 (1999)
    Discriminative Mixture Weight Estimation for Large Gaussian Mixture Models
    Mitchel Weintraub
    Yochai Konig
    Proc. ICASSP (1999)
    Diagrammatic Methods for Deriving and Relating Temporal Neural Network Algorithms
    Lecture notes from the Caianiello Summer School on Adaptive Processing of Sequences, Springer-Verlag (1998)
    DYNAMO: An Algorithm for Dynamic Acoustic Modeling
    Mitchel Weintraub
    Yochai Konig
    Proc. 1998 DARPA Broadcast News Transcription and Understanding Workshop
    Feature Extraction for Speaker Identification
    Hema Murthy
    Larry Heck
    Mitchel Weintraub
    Proc. Signal Processing, Communications and Networking (1997)
    Neural - Network Based Measures of Confidence for Word Recognition
    Mitch Weintraub
    Ze'ev Rivlin
    Yochai Konig
    Andreas Stolcke
    Proc. ICASSP (1997)
    Transformation for Robust Speaker Recognition from Telephone Data
    Mitchel Weintraub
    Proc. ICASSP (1997)
    Diagrammatic Derivation of Gradient Algorithms for Neural Networks
    Eric Wan
    Neural Computation (1996)
    Feature Extraction and Model Training for Robust Speech Recognition
    Ananth Sankar
    Andreas Stolcke
    Tom Chung
    Leo Neumeyer
    Mitchel Weintraub
    Horacio Franco
    Proc. ARPA Speech Recognition Workshop (1996)
    Transform Domain Adaptive Filters: An Analytical Approach
    IEEE Trans. on Signal Proc. (1995)
    Two-Layer Linear Structures for Fast Adaptive Filtering
    Ph.D. Thesis, Information Systems Laboratory, Department of Electrical Engineering, Stanford University (1995)
    Training Data Clustering for Improved Speech Recognition
    Ananth Sankar
    Vassilios Digalakis
    Proc. Eurospeech (1995)
    On the Advantages of the LMS Spectrum Analyzer over Non-Adaptive Implementations of the Sliding-DFT
    Bernard Widrow
    IEEE Trans. on Circuits and Systems (1995)
    Orthogonalizing Adaptive Algorithms: RLS, DFT/LMS, and DCT/LMS
    Adaptive Inverse Control, B. Widrow and E. Walach (1995)
    Simple Approach to Derive Gradient Algorithms for Arbitrary Neural Network Structures
    Eric Wan
    Proc. WCNN (1994)
    A Unified Approach to Derive Gradient Algorithms for Arbitrary Neural Network Structures
    Eric Wan
    Proc. ICANN (1994)
    Relating Real-Time Backpropagation and Backpropagation Through-Time: An Application of Flow Graph Interreciprocity
    Eric Wan
    Neural Computation (1994)
    Two-Layer Linear Structures for Fast Adaptive Filtering
    Bernard Widrowe
    Proc. WCNN (1994)
    An Efficient First-Order Stochastic Algorithm for Lattice Filters
    Eric Wan
    Proc. ICANN (1994)
    A Simple Approach to Derive Gradient Algorithms for Arbitrary Neural Network Structures
    Eric Wan
    Proc. WCNN (1994)
    Neural Networks to Load-Frequency Control in Power Systems
    Yousef Abdel-Magid
    Bernard Widrow
    Neural Networks, vol. 7 (1994), pp. 183-194
    Simple Algorithms for Fast Adaptive Filtering
    Bernard Widrow
    Proc. of the Fifth Workshop on Neural Networks, WNN93/FNN93 (1993)