Several computer algorithms for discovering patterns in groups of protein sequences are based on fitting the parameters of a statistical model to a group of related sequences. These include hidden Markov model (HMM) algorithms for multiple sequence alignment, and the MEME and Gibbs sampler algorithms for discovering motifs. These algorithms are sometimes prone to producing models that are incorrect because two patterns have been combined. The statistical model produced in this situation is a convex combination (weighted average) of two different models. This paper presents a solution to the problem of convex combinations in the form of a heuristic based on using extremely low variance Dirichlet mixture priors as part of the statistical model. This heuristic, which we call the megaprior heuristic, increases the strength (i.e., decreases the variance) of the prior in proportion to the size of the sequence dataset. This causes each column in the final model to strongly resemble the mean of a single component of the prior, regardless of the size of the dataset. We describe the cause of the convex combination problem, analyze it mathematically, motivate and describe the implementation of the megaprior heuristic, and show how it can effectively eliminate the problem of convex combinations in protein sequence pattern discovery.
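A minimal sketch of why scaling the prior with N pins each column to one component mean, under simplifying assumptions of our own: a four-letter alphabet instead of twenty amino acids, a hypothetical two-component Dirichlet mixture, and one shared total pseudocount `strength` for every component. `column_estimate` is the standard posterior-mean estimate under a Dirichlet mixture; the megaprior behaviour comes only from setting `strength` proportional to the dataset size. All names and numbers are illustrative, not taken from the paper.

```python
import math
import numpy as np

def log_dirichlet_multinomial(counts, alpha):
    """log P(counts | alpha), dropping the multinomial coefficient (identical for all components)."""
    a0, n0 = alpha.sum(), counts.sum()
    total = math.lgamma(a0) - math.lgamma(n0 + a0)
    for n_i, a_i in zip(counts, alpha):
        total += math.lgamma(n_i + a_i) - math.lgamma(a_i)
    return total

def column_estimate(counts, weights, means, strength):
    """Posterior-mean residue frequencies for one column under a Dirichlet mixture prior.

    means[j] is the mean of component j (sums to 1); every component gets the
    same total pseudocount `strength`, so alpha_j = strength * means[j].
    """
    counts = np.asarray(counts, dtype=float)
    alphas = [strength * np.asarray(m, dtype=float) for m in means]
    # Posterior probability of each mixture component given this column's counts.
    log_post = np.array([math.log(w) + log_dirichlet_multinomial(counts, a)
                         for w, a in zip(weights, alphas)])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Mix the per-component posterior means.
    return sum(p * (counts + a) / (counts.sum() + a.sum()) for p, a in zip(post, alphas))

# Hypothetical two-component prior over a 4-letter alphabet.
means = [[0.85, 0.05, 0.05, 0.05], [0.05, 0.85, 0.05, 0.05]]
weights = [0.5, 0.5]
observed_freqs = np.array([0.6, 0.3, 0.05, 0.05])

results = {}
for n_seqs in (20, 2000):
    counts = n_seqs * observed_freqs
    conventional = column_estimate(counts, weights, means, strength=5.0)
    megaprior = column_estimate(counts, weights, means, strength=10.0 * n_seqs)  # strength grows with N
    results[n_seqs] = (conventional, megaprior)
    print(n_seqs, conventional.round(3), megaprior.round(3))
```

With a fixed strength the column simply tracks the observed frequencies, whereas with `strength = 10 * N` it stays close to the first component's mean for both dataset sizes.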
This paper describes preliminary work that aims to apply learning strategies to a medical follow-up study. An investigation has been made of the application of three machine learning algorithms (1R, FOIL and InductH) to identify risk factors that govern the colposuspension cure rate. The goal of the study is to induce a generalised description or explanation of the classification attribute, colposuspension cure rate (completely cured, improved, unchanged and worse), from the examples in the questionnaires. We looked for a set of rules that described the risk factors which result in differences in the cure rate. The results are encouraging, and indicate that machine learning can play a useful role in large-scale medical problem solving.
In cellular telephone systems, an important problem is to dynamically allocate the communication resource (channels) so as to maximize service in a stochastic caller environment. This problem is naturally formulated as a dynamic programming problem, and we use a reinforcement learning (RL) method to find dynamic channel allocation policies that are better than previous heuristic solutions. The policies obtained perform well for a broad variety of call traffic patterns. We present results on a large cellular system with approximately … states. In cellular communication systems, an important problem is to allocate the communication resource (bandwidth) so as to maximize the service provided to a set of mobile callers whose demand for service changes stochastically. A given geographical area is divided into mutually disjoint cells, and each cell serves the calls that are within its boundaries (see Figure). The total system bandwidth is divided into channels, with each channel centered around a frequency. Each channel can be used simultaneously at different cells, provided these cells are sufficiently separated spatially that there is no interference between them. The minimum separation distance for simultaneous reuse of the same channel is called the channel reuse constraint. When a call requests service in a given cell, either a free channel (one that does not violate the channel reuse constraint) may be assigned to the call, or else the call is blocked from the system; this will happen if no free channel can be found. Also, when a mobile caller crosses from one cell to another, the call is "handed off" to the cell of entry; that is, a new free channel is provided to the call in the new cell. If no such channel is available, the call must be dropped (disconnected from the system). One objective of a channel allocation policy is to allocate the available channels to calls so that the number of blocked calls is minimized. An additional objective is to minimize the number of calls that are dropped when they are handed off to a busy cell. These two objectives must be weighted appropriately to reflect their relative importance, since dropping existing calls is generally more undesirable than blocking new calls.
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). We then outline a novel algorithm for solving POMDPs off line and show how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to POMDPs, and some possibilities for finding approximate solutions.
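Any POMDP policy, exact or approximate, acts on a belief state that is updated by Bayes' rule after each action and observation. The sketch below shows only that update, not the paper's solution algorithm; the dictionary-of-matrices layout and the two-state "listen" example are our own illustrative assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter update of a POMDP belief state.

    b: current belief over states, shape (S,)
    T[a][s, s2]: probability of moving from s to s2 under action a
    O[a][s2, o]: probability of observing o after action a lands in s2
    """
    predicted = b @ T[a]                  # P(s2 | b, a)
    unnormalized = predicted * O[a][:, o]
    total = unnormalized.sum()            # P(o | b, a)
    if total == 0.0:
        raise ValueError("observation o has zero probability under belief b and action a")
    return unnormalized / total

# Toy two-state example: a "listen" action leaves the state unchanged
# and reports the true state with probability 0.85.
T = {"listen": np.eye(2)}
O = {"listen": np.array([[0.85, 0.15],
                         [0.15, 0.85]])}
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, "listen", 0, T, O)
b2 = belief_update(b1, "listen", 0, T, O)
```

Each consistent observation sharpens the belief; a POMDP solution maps such beliefs to actions.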
Graphical models enhance the representational power of probability models through qualitative characterization of their properties. This also leads to greater efficiency in terms of the computational algorithms that empower such representations. The increasing complexity of these models, however, quickly renders exact probabilistic calculations infeasible. We propose a principled framework for approximating graphical models based on variational methods. We develop variational techniques from a perspective that unifies and expands their applicability to graphical models. These methods allow the recursive computation of upper and lower bounds on the quantities of interest. Such bounds yield considerably more information than mere approximations and provide an inherent error metric for tailoring the approximations individually to the cases considered. These desirable properties, concomitant with the variational methods, are unlikely to arise as a result of other deterministic or stochastic approximations.
Real-time Decision algorithms are a class of incremental resource-bounded (Horvitz) or anytime (Dean) algorithms for evaluating influence diagrams. We present a test domain for real-time decision algorithms, and the results of experiments with several Real-time Decision Algorithms in this domain. The results demonstrate high performance for two algorithms: a decision-evaluation variant of Incremental Probabilistic Inference (D'Ambrosio) and a variant of an algorithm suggested by Goldszmidt (Goldszmidt), PK-reduced. We discuss the implications of these experimental results and explore the broader applicability of these algorithms.
Speedup learning seeks to improve the computational efficiency of problem solving with experience. In this paper, we develop a formal framework for learning efficient problem solving from random problems and their solutions. We apply this framework to two different representations of learned knowledge, namely control rules and macro-operators, and prove theorems that identify sufficient conditions for learning in each representation. Our proofs are constructive in that they are accompanied by learning algorithms. Our framework captures both empirical and explanation-based speedup learning in a unified fashion. We illustrate our framework with implementations in two domains: symbolic integration and the Eight Puzzle. This work integrates many strands of experimental and theoretical work in machine learning, including empirical learning of control rules and macro-operator learning.
In a previous paper [SM], we showed how finite automata could be used to define objective functions for assessing the quality of an alignment of two sequences. In this paper, we show some results of using such cost functions. We also show how to extend Hirschberg's linear space algorithm [Hir] to this setting, thus generalizing a result of Myers and Miller [MMb].
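The core of Hirschberg's linear space method is that the cost of an optimal alignment can be computed while keeping only two rows of the dynamic programming table, and that a forward pass on the first half of one sequence plus a reverse pass on the second half locates where an optimal alignment crosses the middle row. The sketch below assumes plain unit-cost edit distance in place of the automaton-defined cost functions of the paper, and omits the recursion on the two halves; the function names are hypothetical.

```python
def last_row(a, b, mismatch=1, gap=1):
    """Costs of aligning all of `a` against every prefix of `b`, using two rows (O(len(b)) space)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * gap] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j - 1] + (0 if ca == cb else mismatch),  # match / substitute
                         prev[j] + gap,                                # gap in b
                         cur[j - 1] + gap)                             # gap in a
        prev = cur
    return prev

def hirschberg_split(a, b, mismatch=1, gap=1):
    """Column of b where an optimal alignment crosses the middle row of a, and its total cost."""
    mid = len(a) // 2
    forward = last_row(a[:mid], b, mismatch, gap)               # a[:mid] vs every prefix of b
    backward = last_row(a[mid:][::-1], b[::-1], mismatch, gap)  # a[mid:] vs every suffix of b
    n = len(b)
    costs = [forward[j] + backward[n - j] for j in range(n + 1)]
    k = min(range(n + 1), key=costs.__getitem__)
    return k, costs[k]

split, cost = hirschberg_split("kitten", "sitting")
```

A full implementation would recurse on `(a[:mid], b[:split])` and `(a[mid:], b[split:])`, keeping the overall space linear in the sequence lengths.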
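We present an off-line variant of the mistake-bound model of learning. Just like in the well-studied on-line model, a learner in the off-line model has to learn an unknown concept from a sequence of elements of the instance space on which it makes "guess and test" trials. In both models, the aim of the learner is to make as few mistakes as possible. The difference between the models is that, while in the on-line model only the set of possible elements is known, in the off-line model the sequence of elements (i.e., the identity of the elements as well as the order in which they are to be presented) is known to the learner in advance. We give a combinatorial characterization of the number of mistakes in the off-line model. We apply this characterization to solve several natural questions that arise for the new model. First, we compare the mistake bounds of an off-line learner to those of a learner learning the same concept classes in the on-line scenario. We show that the number of mistakes in on-line learning is at most a log n factor more than in off-line learning, where n is the length of the sequence. In addition, we show that if there is an off-line algorithm that makes no more than a constant number of mistakes on each sequence, then there is an on-line algorithm that also makes no more than a constant number of mistakes. The second issue we address is the effect of the ordering of the elements on the number of mistakes of an off-line learner. It turns out that there are sequences on which an off-line learner can guarantee at most one mistake, yet a permutation of the same sequence forces it to err on many elements. We prove, however, that the gap between the off-line mistake bounds on permutations of the same sequence of n elements cannot be larger than a multiplicative factor of log n, and we present examples that obtain such a gap.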
Wahba, Wang, Gu, Klein and Klein introduced the Smoothing Spline ANalysis of VAriance (SS ANOVA) method for data from exponential families. Based on RKPACK, which fits SS ANOVA models to Gaussian data, we introduce GRKPACK: a collection of Fortran subroutines for binary, binomial, Poisson and Gamma data. We also show how to calculate Bayesian confidence intervals for SS ANOVA estimates.
This paper presents an evolutionary approach and an incremental approach to finding learning rules for several supervised learning tasks. In the evolutionary approach, potential solutions are represented as variable-length mathematical LISP S-expressions. Thus, it is similar to Genetic Programming (GP), but it employs a fixed set of non-problem-specific functions to solve a variety of problems. The model is tested on the three Monks' problems and on parity problems. The results indicate the usefulness of the encoding schema in discovering learning rules for simple supervised learning problems. However, hard learning problems require special attention in terms of the need for larger codings of the potential solutions and the ability to generalise on the testing set. In order to find better solutions to these issues, a hill-climbing strategy with an incremental coding of potential solutions is used to discover learning rules for the hard problems. It is found that, with this strategy, larger solutions can be easily coded with less computational effort. Although better performance is achieved in training on the hard learning problems, the ability to generalise on the testing cases is observed to be poor.
The key quantity needed for Bayesian hypothesis testing and model selection is the marginal likelihood for a model, also known as the integrated likelihood, or the marginal probability of the data. In this paper we describe a way to use posterior simulation output to estimate marginal likelihoods. We describe the basic Laplace-Metropolis estimator for models without random effects. For models with random effects, the compound Laplace-Metropolis estimator is introduced. This estimator is applied to data from the World Fertility Survey and shown to give accurate results. Batching of simulation output is used to assess the uncertainty involved in using the compound Laplace-Metropolis estimator. The method allows us to test for the effects of independent variables in a random effects model, and also to test for the presence of the random effects.
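The basic Laplace-Metropolis idea can be sketched as follows: plug the posterior draw with the highest unnormalized log posterior in for the mode, and the sample covariance of the draws in for the inverse Hessian, in the Laplace approximation log p(D) ≈ (d/2) log 2π + ½ log|Σ| + log p(D | θ*) + log p(θ*). This is a minimal sketch under our own assumptions: a one-parameter conjugate normal model, exact posterior draws standing in for Metropolis output, and no compound (random-effects) extension.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def laplace_metropolis(draws, log_lik, log_prior):
    """Laplace-Metropolis estimate of log p(data) from posterior draws.

    The mode is approximated by the draw with the largest log_lik + log_prior,
    and the inverse Hessian by the sample covariance of the draws.
    """
    draws = np.asarray(draws, dtype=float)
    if draws.ndim == 1:
        draws = draws[:, None]                         # shape (n_draws, d)
    h = np.array([log_lik(t) + log_prior(t) for t in draws])
    d = draws.shape[1]
    cov = np.atleast_2d(np.cov(draws, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * d * np.log(2 * np.pi) + 0.5 * logdet + h.max()

# Conjugate test model: y_i ~ N(theta, 1), theta ~ N(0, 4).
rng = np.random.default_rng(1)
y = rng.normal(1.5, 1.0, size=50)
sigma2, mu0, tau2 = 1.0, 0.0, 4.0

def log_lik(theta):
    return log_normal(y, theta[0], sigma2).sum()

def log_prior(theta):
    return log_normal(theta[0], mu0, tau2)

post_var = 1.0 / (len(y) / sigma2 + 1.0 / tau2)
post_mean = post_var * (y.sum() / sigma2 + mu0 / tau2)
draws = rng.normal(post_mean, np.sqrt(post_var), size=(20000, 1))  # stand-in for MCMC output

estimate = laplace_metropolis(draws, log_lik, log_prior)
# Exact value from log p(y) = log p(y | m) + log p(m) - log p(m | y), valid at any m.
m = np.array([post_mean])
exact = log_lik(m) + log_prior(m) - log_normal(post_mean, post_mean, post_var)
```

Because this posterior is exactly Gaussian, the estimate differs from the exact value only through Monte Carlo error in the covariance and mode.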
The overfit problem in empirical learning and the utility problem in explanation-based learning describe a similar phenomenon: the degradation of performance due to an increase in the amount of learned knowledge. Plotting the performance of learned knowledge during the course of learning (the performance response) reveals a common trend for several learning methods. Modeling this trend allows a control system to constrain the amount of learned knowledge to achieve peak performance and avoid the general utility problem. Experiments evaluate a particular empirical model of the trend, and analysis of the learners derives several formal models. If, as the evidence suggests, the general utility problem can be modeled using the same mechanisms for different learning paradigms, then the model serves to unify the paradigms into one framework capable of comparing and selecting different learning methods based on predicted achievable performance.
Hidden Markov Models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated on the globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. In each case the parameters of an HMM are estimated from a training set of unaligned sequences. After the HMM is built, it is used to obtain a multiple alignment of all the training sequences. It is also used to search the SWISS-PROT database for other sequences that are members of the given protein family or contain the given domain. The HMM produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate three-dimensional structural information. When employed in discrimination tests (by examining how closely the sequences in a database fit the globin, kinase and EF-hand HMMs), the HMM is able to distinguish members of these families from non-members with a high degree of accuracy. Both the HMM and PROFILESEARCH (a technique used to search for relationships between a protein sequence and multiply aligned sequences) perform better in these tests than PROSITE (a dictionary of sites and patterns in proteins). The HMM appears to have a slight advantage over PROFILESEARCH.
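Scoring how well a sequence fits an HMM, as in the database search and discrimination tests above, rests on computing the probability of the sequence under the model. The sketch below shows the forward algorithm in log space for a small, fully connected discrete HMM; this is a simplifying assumption, since protein-family models typically use match, insert and delete states, and the toy parameters are made up. A brute-force sum over all state paths is included as a check for tiny inputs.

```python
import itertools
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """log P(obs) under a discrete HMM, via the forward algorithm in log space.

    log_pi[i]   = log P(first state = i)
    log_A[i, j] = log P(next state = j | state = i)
    log_B[i, k] = log P(symbol k | state = i)
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Sum over the previous state, then emit the next symbol.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

def brute_force_prob(obs, pi, A, B):
    """P(obs) by summing over every state path (only feasible for tiny examples)."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 2, 1, 2]
score = log_forward(obs, np.log(pi), np.log(A), np.log(B))
```

Working in log space avoids underflow on long sequences, where the raw path probabilities become vanishingly small.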
This paper explores the effect of initial weight selection on feedforward networks learning simple functions with the back-propagation technique. We first demonstrate, through the use of Monte Carlo techniques, that the magnitude of the initial condition vector (in weight space) is a significant parameter in convergence time variability. In order to further understand this result, additional deterministic experiments were performed. The results of these experiments demonstrate the extreme sensitivity of back propagation to the initial weight configuration.
Some forms of memory rely temporarily on a system of brain structures located in the medial temporal lobe that includes the hippocampus. The recall of recent events in one task relies crucially on the proper functioning of this system. As an event becomes less recent, the medial temporal lobe becomes less critical to the recall of that event, and the recollection appears to rely more upon the neocortex. It has been proposed that a process called consolidation is responsible for the transfer of memory from the medial temporal lobe to the neocortex. We examine a network model proposed by P. Alvarez and L. Squire, designed to incorporate some of the known features of consolidation, and propose several possible experiments intended to help evaluate the performance of this model under more realistic conditions. Finally, we implement an extended version of the model that can accommodate varying assumptions about the number of areas and connections within the brain and about memory capacity, and examine the performance of our model on Alvarez and Squire's original task.
The map from eye to brain in vertebrates is topographic, i.e. neighbouring points in the eye map to neighbouring points in the brain. In addition, when two eyes innervate the same target structure, the two sets of fibres segregate to form ocular dominance stripes. Experimental evidence from the frog and goldfish suggests that these two phenomena may be subserved by the same mechanisms. We present a computational model that addresses the formation of both topography and ocular dominance. The model is based on a form of competitive learning with subtractive enforcement of a weight normalization rule. Inputs to the model are distributed patterns of activity presented simultaneously in both eyes. An important aspect of this model is that ocular dominance segregation can occur when the two eyes are positively correlated, whereas previous models have tended to assume zero or negative correlations between the eyes. This allows investigation of the dependence of the pattern of stripes on the degree of correlation between the eyes: we find that increasing correlation leads to narrower stripes. Experiments are suggested to test this prediction.
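Subtractive normalization means that after a Hebbian increment, the same amount is subtracted from every weight of a unit so that its total weight stays constant (as opposed to multiplicative rescaling). The sketch below shows one winner-take-all update of this kind; it is a deliberately stripped-down assumption, with no neighbourhood function, no two-eye input structure and no weight bounds (so weights may go negative), all of which the full model would need. The function name and parameters are hypothetical.

```python
import numpy as np

def competitive_step(W, x, lr=0.05):
    """One winner-take-all Hebbian update with subtractive weight normalization.

    W: weights, shape (n_outputs, n_inputs); x: one input pattern, shape (n_inputs,).
    """
    winner = int(np.argmax(W @ x))     # output unit most strongly driven by x
    dw = lr * x                        # Hebbian increment for the winner (its activity taken as 1)
    dw = dw - dw.mean()                # subtractive normalization: remove the mean increment from every weight
    W = W.copy()
    W[winner] += dw                    # the winner's total weight is unchanged
    return W, winner

rng = np.random.default_rng(0)
W0 = rng.uniform(0.4, 0.6, size=(4, 8))
W = W0
for _ in range(200):
    W, _ = competitive_step(W, rng.uniform(0.0, 1.0, size=8))
```

Because only the differences between weights change, inputs that are active more than average gain weight at the expense of the others, which is the competitive pressure that drives segregation.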
We examine methods to estimate the average and variance of test error rates over a set of classifiers. We begin with the process of drawing a classifier at random for each example. Given validation data, the average test error rate can be estimated by validating a single classifier. Given the test example inputs, the variance can be computed exactly. Next, we consider the process of drawing a classifier at random and using it on all examples. Once again, the expected test error rate can be validated by validating a single classifier. However, the variance must be estimated by validating all classifiers, which yields loose or uncertain bounds.
We consider formal models of learning from noisy data. Specifically, we focus on learning in the probably approximately correct model as defined by Valiant. Two widely studied models of noise in this setting are classification noise and malicious errors. However, a more realistic model combining the two types of noise has not been formalized. We define a learning environment based on a natural combination of these two noise models. We first show that hypothesis testing is possible in this model. We next describe a simple technique for learning in this model, and then describe a more powerful technique based on statistical query learning. We show that the noise tolerance of this improved technique is roughly optimal with respect to the desired learning accuracy and that it provides a smooth tradeoff between the tolerable amounts of the two types of noise. Finally, we show that statistical query simulation yields learning algorithms for other combinations of noise models, thus demonstrating that statistical query specification truly captures the generic fault tolerance of a learning algorithm. An important goal of research in machine learning is to determine which tasks can be automated, and for those which can, to determine their information and computation requirements. One way to answer these questions is through the development and investigation of formal models of machine learning that capture the task of learning under plausible assumptions. In this work, we consider the formal model of learning from examples called probably approximately correct (PAC) learning, as defined by Valiant [Val]. In this setting, a learner attempts to approximate an unknown target concept simply by viewing positive and negative examples of the concept. An adversary chooses, from some specified function class, a hidden {0,1}-valued target function defined over some specified domain of examples, and chooses a probability distribution over this domain. The goal of the learner is to output, in polynomial time and with high probability, a hypothesis which is "close" to the target function with respect to the distribution of examples. The learner gains information about the target function and distribution by interacting with an example oracle. At each request by the learner, this oracle draws an example randomly according to the hidden distribution, labels it according to the hidden target function, and returns the labelled example to the learner. A class of functions F is said to be PAC learnable …
We present a decision tree based approach to function approximation in reinforcement learning. We compare our approach with table lookup and a neural network function approximator on three problems: the well-known mountain car and pole balance problems, as well as a simulated automobile race car. We find that the decision tree can provide better learning performance than the neural network function approximation and can solve large problems that are infeasible using table lookup.
An approach to developing new game-playing strategies based on artificial evolution of neural networks is presented. Evolution was directed to discover strategies in Othello against a random-moving opponent and later against an α-β search program. The networks discovered first a standard positional strategy, and subsequently a mobility strategy, an advanced strategy rarely seen outside of tournaments. The latter discovery demonstrates how evolutionary neural networks can develop novel solutions by turning an initial disadvantage into an advantage in a changed environment.
We derive general bounds on the complexity of learning in the Statistical Query model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This new model was introduced by Kearns to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result of the first general upper bounds on the complexity of strong SQ learning. Specifically, we derive simultaneous upper bounds on the number of queries, O(log …); on the Vapnik-Chervonenkis dimension of the query space, O(log …); and on the inverse of the minimum tolerance, O(… log …). In addition, we show that these general upper bounds are nearly optimal by describing a class of learning problems for which we simultaneously lower bound the number of queries by Ω(log …). We apply our boosting results in the SQ model to learning in the PAC model with classification noise. Since nearly all PAC learning algorithms can be cast in the SQ model, we can apply our boosting techniques to convert these PAC algorithms into highly efficient SQ algorithms. By simulating these efficient SQ algorithms in the PAC model with classification noise, we show that nearly all PAC algorithms can be converted into highly efficient PAC algorithms which tolerate classification noise. We give an upper bound on the sample complexity of these noise-tolerant PAC algorithms which is nearly optimal with respect to the noise rate. We also give upper bounds on space complexity and hypothesis size and show that these two measures are in fact independent of the noise rate. We note that the running times of these noise-tolerant PAC algorithms are efficient. This sequence of simulations also demonstrates that it is possible to boost the accuracy of nearly all PAC algorithms even in the presence of noise. This provides a partial answer to an open problem of Schapire and the first theoretical evidence for an empirical result of Drucker, Schapire and Simard.
The tremendous current effort to propose neurally inspired methods of computation forces closer scrutiny of the real-world application potential of these models. This paper categorizes applications into classes and particularly discusses features of applications which make them efficiently amenable to neural network methods. Computational machines perform deterministic mappings from inputs to outputs, and many computational mechanisms have been proposed for problem solutions. Neural network features include parallel execution, adaptive learning, generalization, and fault tolerance. Often, much effort is given to a model for applications which can already be implemented in a much more efficient way with an alternate technology. Neural networks are potentially powerful devices for many classes of applications. However, it is proposed that the class of applications for which neural networks are efficient is both large and commonly occurring in nature. A comparison of supervised, unsupervised, and generalizing systems is also included.
Subjectivism has become the dominant philosophical foundation for Bayesian inference. Yet, in practice, most Bayesian analyses are performed with so-called "noninformative" priors, that is, priors constructed by some formal rule. We review the plethora of techniques for constructing such priors and discuss some of the practical and philosophical issues that arise when they are used. We give special emphasis to Jeffreys's rules and discuss the evolution of his point of view about the interpretation of priors, away from unique representation of ignorance toward the notion that they should be chosen by convention. We conclude that the problems raised by the research on priors chosen by formal rules are serious and may not be dismissed lightly: when sample sizes are small (relative to the number of parameters being estimated), it is dangerous to put faith in any "default" solution; but when asymptotics take over, Jeffreys's rules and their variants remain reasonable choices. We also provide an annotated bibliography. Robert E. Kass is Professor and Larry Wasserman is Associate Professor, Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania. The work of both authors was supported by NSF grant DMS… and NIH grant RCA…. The authors thank Nick Polson for helping with some annotations, and Jim Berger, Teddy Seidenfeld and Arnold Zellner for useful comments and discussion.
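For reference, Jeffreys's general rule takes the prior proportional to the square root of the determinant of the Fisher information. The worked Bernoulli case below is a standard textbook example, not one taken from the review, and shows how the rule produces a proper Beta(1/2, 1/2) prior.

```latex
\pi_J(\theta) \propto \sqrt{\det I(\theta)}, \qquad
I(\theta) = -\,\mathbb{E}_{x \mid \theta}\!\left[\frac{\partial^2}{\partial \theta\,\partial \theta^{\top}} \log p(x \mid \theta)\right]

% Bernoulli example: p(x \mid \theta) = \theta^{x} (1-\theta)^{1-x}, \quad x \in \{0, 1\}
\frac{\partial^2}{\partial \theta^2} \log p(x \mid \theta) = -\frac{x}{\theta^2} - \frac{1-x}{(1-\theta)^2},
\qquad
I(\theta) = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)}

\pi_J(\theta) \propto \theta^{-1/2} (1-\theta)^{-1/2}, \quad \text{i.e. } \theta \sim \mathrm{Beta}(1/2,\, 1/2)
```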
Recurrent neural networks have become popular models for system identification and time series prediction. NARX (Nonlinear AutoRegressive models with eXogenous inputs) neural network models are a popular subclass of recurrent networks and have been used in many applications. Though embedded memory can be found in all recurrent network models, it is particularly prominent in NARX models. We show that using intelligent memory order selection, through pruning and good initial heuristics, significantly improves the generalization and predictive performance of these nonlinear systems on problems as diverse as grammatical inference and time series prediction.
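A NARX model predicts y(t) = f(y(t-1), …, y(t-dy), u(t-1), …, u(t-du)), so the memory orders dy and du fix which past outputs and inputs the model sees. The sketch below builds those regressors and, as an assumption for illustration only, fits a linear f by least squares on a synthetic system with known orders; a NARX network would replace the linear fit with a neural network, and pruning would remove regressor columns.

```python
import numpy as np

def narx_regressors(u, y, du, dy):
    """Build NARX training pairs: predict y[t] from y[t-1..t-dy] and u[t-1..t-du]."""
    u, y = np.asarray(u, dtype=float), np.asarray(y, dtype=float)
    start = max(du, dy)
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = y[t - dy:t][::-1]   # y[t-1], ..., y[t-dy]
        past_u = u[t - du:t][::-1]   # u[t-1], ..., u[t-du]
        rows.append(np.concatenate([past_y, past_u]))
        targets.append(y[t])
    return np.array(rows), np.array(targets)

# Synthetic system with known memory orders dy = 2, du = 1.
rng = np.random.default_rng(0)
u = rng.normal(size=300)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.5 * y[t - 1] - 0.2 * y[t - 2] + 0.7 * u[t - 1]

X, target = narx_regressors(u, y, du=1, dy=2)
coef, *_ = np.linalg.lstsq(X, target, rcond=None)  # linear stand-in for the network f
```

With the correct orders the noise-free coefficients are recovered exactly; choosing dy or du too small would leave the target unexplained, and choosing them too large adds spurious inputs that hurt generalization.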
This paper presents an incremental concept learning approach to the identification of concepts with high overall accuracy. The main idea is to address concept overlap as a central problem when learning multiple descriptions. Many traditional inductive algorithms, such as those from the disjunctive version space family, are considered to face this problem. The approach focuses on combinations of confident, possibly overlapping, concepts with an original stochastic complexity formula. The focusing is efficiently organized as a simulated annealing-based beam search. The experiments show that the approach is especially suitable for developing incremental learning algorithms with the following advantages: first, it generates highly accurate concepts; second, it overcomes to a certain degree the sensitivity to the order of examples; and third, it handles noisy examples.
Case-based reasoning (CBR) has a great deal to offer in supporting creative design, particularly processes that rely heavily on previous design experience, such as framing the problem and evaluating design alternatives. However, existing CBR systems are not living up to this potential. They tend to adapt and reuse old solutions in routine ways, producing robust but uninspired results. Little research effort has been directed towards the kinds of situation assessment, evaluation, and assimilation processes that facilitate the exploration of ideas and the elaboration and redefinition of problems that are crucial to creative design. Also, these systems typically have rigid control structures that do not facilitate the kinds of strategic control and opportunism inherent in creative reasoning. In this paper, we describe the types of behavior we would like case-based design systems to support, based on a study of designers working on a mechanical engineering problem. We show how the standard CBR framework can be extended and describe an architecture we are developing to experiment with these ideas.
In this paper, we present a framework for building probabilistic automata parameterized by context-dependent probabilities. Gibbs distributions are used to model state transitions and output generation, and parameter estimation is carried out using an EM algorithm in which the M-step uses a generalized iterative scaling procedure. We discuss relations with certain classes of stochastic feedforward neural networks, a geometric interpretation of parameter estimation, and a simple example of a statistical language model constructed using this methodology.
One of the characteristics of design is that designers rely extensively on past experience in order to create new designs. Because of this, memory-based techniques from artificial intelligence, which help store, organise, retrieve, and reuse experiential knowledge held in memory, are good candidates for aiding designers. Another characteristic of design is the phenomenon of exploration in the early stages of design configuration. A designer begins with an ill-structured, partially defined problem specification, and through a process of exploration gradually refines and modifies it as his/her understanding of the problem improves. In this paper, we describe demex, an interactive computer-aided design system that employs memory-based techniques to help its users explore the design problems they pose to the system, in order to acquire a better understanding of the requirements of those problems. demex has been applied in the domain of structural design of buildings.
Up-propagation is an algorithm for inverting and learning neural network generative models. Sensory input is processed by inverting a model that generates patterns from hidden variables using top-down connections. The inversion process is iterative, utilizing a negative feedback loop that depends on an error signal propagated by bottom-up connections. The error signal is also used to learn the generative model from examples. The algorithm is benchmarked against principal component analysis in experiments on images of handwritten digits. In his doctrine of unconscious inference, Helmholtz argued that perceptions are formed by the interaction of bottom-up sensory data with top-down expectations. According to one interpretation of this doctrine, perception is a procedure of sequential hypothesis testing. We propose a new algorithm, called up-propagation, that realizes this interpretation in layered neural networks. It uses top-down connections to generate hypotheses, and bottom-up connections to revise them. It is important to understand the difference between up-propagation and its ancestor, the backpropagation algorithm. Backpropagation is a learning algorithm for recognition models. As shown in Figure (a), bottom-up connections recognize patterns, while top-down connections propagate an error signal that is used to learn the recognition model. In contrast, up-propagation is an algorithm for inverting and learning generative models, as shown in Figure (b). Top-down connections generate patterns from a set of hidden variables. Sensory input is processed by inverting the generative model, recovering hidden variables that could have generated the sensory data. This operation is called either pattern recognition or pattern analysis, depending on the meaning of the hidden variables. Inversion of the generative model is done iteratively, through a negative feedback loop driven by an error signal from the bottom-up connections. The error signal is also used for learning the connections.
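The inversion and learning loop described above can be sketched in its simplest case, which is our own assumption: a single linear generative layer x ≈ W h instead of a layered nonlinear network, batch updates, synthetic data instead of digit images, and a step size of 1/‖W‖² chosen by us to keep the feedback loop stable. Hidden variables are inferred by feeding the reconstruction error back through Wᵀ, the same error then updates W, and the learned subspace is compared with the principal component subspace.

```python
import numpy as np

def infer(W, X, steps=50):
    """Invert the linear generative model x ~ W h by iterative negative feedback."""
    eta = 1.0 / np.linalg.norm(W, 2) ** 2   # step size that keeps the feedback loop stable
    H = np.zeros((X.shape[0], W.shape[1]))
    for _ in range(steps):
        E = X - H @ W.T                      # error between data and top-down reconstruction
        H += eta * E @ W                     # error fed back through the transposed (bottom-up) weights
    return H

def learn(X, k, epochs=500, lr=0.01, seed=0):
    """Learn the generative weights from the same error signal used for inversion."""
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.normal(size=(X.shape[1], k)))[0]
    for _ in range(epochs):
        H = infer(W, X)
        E = X - H @ W.T
        W += lr * E.T @ H / len(X)           # gradient step on the squared reconstruction error
    return W

rng = np.random.default_rng(1)
basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]
X = (rng.normal(size=(500, 2)) * [3.0, 2.0]) @ basis.T + 0.05 * rng.normal(size=(500, 5))
X -= X.mean(axis=0)
W = learn(X, k=2)

# Compare the learned subspace with the top-2 principal subspace.
P_learned = W @ np.linalg.pinv(W)
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
top = eigvecs[:, -2:]
P_pca = top @ top.T
```

In this linear case the minimum-reconstruction-error solution spans the principal subspace, which is why PCA is a natural benchmark; the nonlinear, multi-layer version is where the method goes beyond PCA.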
This paper demonstrates the exploitation of certain vision processing techniques to index into a case base of surfaces. The surfaces are the result of reinforcement learning and represent the optimum choice of actions to achieve some goal from anywhere in the state space. This paper shows how strong features that occur in the interaction of the system with its environment can be detected early in the learning process. Such features allow the system to identify when an identical or similar task has been solved previously and to retrieve the relevant surface. This results in an orders-of-magnitude increase in learning rate.
Combining different machine learning algorithms in the same system can produce benefits beyond what either method could achieve alone. This paper demonstrates that genetic algorithms can be used in conjunction with lazy learning to solve examples of a difficult class of delayed reinforcement learning problems better than either method alone. This class, the class of differential games, includes numerous important control problems that arise in robotics, planning, game playing, and other areas, and solutions for differential games suggest solution strategies for the general class of planning and control problems. We conducted a series of experiments applying three learning approaches (lazy Q-learning, k-nearest neighbor (k-NN), and a genetic algorithm) to a particular differential game called a pursuit game. Our experiments demonstrate that k-NN had great difficulty solving the problem, while a lazy version of Q-learning performed moderately well and the genetic algorithm performed even better. These results motivated the next step in the experiments, where we hypothesized that k-NN was having difficulty because it did not have good examples, a common source of difficulty for lazy learning. Therefore, we used the genetic algorithm as a bootstrapping method for k-NN to create a system to provide these examples. Our experiments demonstrate that the resulting joint system learned to solve the pursuit games with a high degree of accuracy (outperforming either method alone) and with relatively small memory requirements.
We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottom-up, top-down and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has been performed, the connection strengths can be updated using a simple learning rule that requires only locally available information. We demonstrate that the network learns to extract sparse, distributed, hierarchical representations.
In most applications of neuroevolution, each individual in the population represents a complete neural network. Recent work on the SANE system, however, has demonstrated that evolving individual neurons often produces a more efficient genetic search. This paper demonstrates that while SANE can solve easy tasks quickly, it often stalls in larger problems. A hierarchical approach to neuroevolution is presented that overcomes SANE's difficulties by integrating both a neuron-level exploratory search and a network-level exploitive search. In a robot arm manipulation task, the hierarchical approach outperforms both a neuron-based search and a network-based search.
Reinforcement learning addresses the problem of learning to select actions in order to maximize one's performance in unknown environments. To scale reinforcement learning to complex real-world tasks, such as those typically studied in AI, one must ultimately be able to discover the structure in the world, in order to abstract away the myriad of details and to operate in more tractable problem spaces. This paper presents the SKILLS algorithm. SKILLS discovers skills, which are partially defined action policies that arise in the context of multiple related tasks. Skills collapse whole action sequences into single operators. They are learned by minimizing the compactness of action policies, using a description length argument on their representation. Empirical results in simple grid navigation tasks illustrate the successful discovery of structure in reinforcement learning.
We consider a generalization of the mistake-bound model (for learning {0,1}-valued functions) in which the learner must satisfy a general constraint on the number M+ of incorrect 1-predictions and the number M- of incorrect 0-predictions. We describe a general-purpose optimal algorithm for our formulation of this problem. We describe several applications of our general results, involving situations in which the learner wishes to satisfy linear inequalities in M+ and M-.
A critical issue for users of Markov Chain Monte Carlo (MCMC) methods in applications is how to determine when it is safe to stop sampling and use the samples to estimate characteristics of the distribution of interest. Research into methods of computing theoretical convergence bounds holds promise for the future but has so far yielded relatively little of practical use in applied work. Consequently, MCMC users address the convergence problem by applying diagnostic tools to the output produced by running their samplers. After giving a brief overview of the area, we provide an expository review of thirteen convergence diagnostics, describing the theoretical basis and practical implementation of each. We then compare their performance in two simple models and conclude that all the methods can fail to detect the sorts of convergence failure they were designed to identify. We thus recommend a combination of strategies aimed at evaluating and accelerating MCMC sampler convergence, including applying diagnostic procedures to a small number of parallel chains, monitoring autocorrelations and cross-correlations, and modifying parameterizations or sampling algorithms appropriately. We emphasize, however, that it is not possible to say with certainty that a finite sample from an MCMC algorithm is representative of an underlying stationary distribution. Mary Kathryn Cowles is Assistant Professor of Biostatistics, Harvard School of Public Health, Boston, MA. Bradley P. Carlin is Associate Professor, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN. Much of the work was done while the first author was a graduate student in the Division of Biostatistics at the University of Minnesota and then Assistant Professor, Biostatistics Section, Department of Preventive and Societal Medicine, University of Nebraska Medical Center, Omaha, NE. The work of both authors was supported in part by National Institute of Allergy and Infectious Diseases FIRST Award RAI…. The authors thank the developers of the diagnostics studied here for sharing their insights, experiences, and software, and Drs. Thomas Louis and Luke Tierney for helpful discussions and suggestions which greatly improved the manuscript.
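Two of the checks recommended above, diagnostics on a small number of parallel chains and monitoring autocorrelations, can be sketched as follows. The sketch uses the basic Gelman-Rubin potential scale reduction factor (without the degrees-of-freedom correction) and the lag-k sample autocorrelation; the synthetic chains are our own illustrative assumptions, and this is not presented as the implementation reviewed in the paper.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Basic Gelman-Rubin R-hat for a scalar quantity.

    chains: array of shape (m, n), m parallel chains of n draws each.
    Values near 1 are consistent with convergence; values well above 1 are not.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)              # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n           # pooled variance estimate
    return np.sqrt(var_plus / W)

def autocorrelation(x, lag):
    """Sample autocorrelation of one chain at a positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))
stuck = mixed + np.array([[0.0], [0.0], [5.0], [5.0]])  # two chains stuck in a different mode

ar = np.zeros(20000)                                     # AR(1) chain with strong autocorrelation
for t in range(1, len(ar)):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()
```

As the abstract warns, a value of R-hat near 1 does not prove convergence: if every chain is stuck in the same mode, this diagnostic cannot detect it.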
Although the detection of invariant structure in a given set of input patterns is vital to many recognition tasks, connectionist learning rules tend to focus on directions of high variance (principal components). The prediction paradigm is often used to reconcile this dichotomy; here we suggest a more direct approach to invariant learning based on an anti-Hebbian learning rule. An unsupervised two-layer network implementing this method in a competitive setting learns to extract coherent depth information from random-dot stereograms.
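The contrast between Hebbian and anti-Hebbian learning can be seen with a single linear unit: a Hebbian rule with normalization finds the highest-variance direction, while the anti-Hebbian version, which weakens weights in proportion to the input-output correlation, converges to the lowest-variance (most invariant) direction. This is a minimal batch sketch under our own assumptions (one linear unit, explicit renormalization, synthetic Gaussian inputs); it does not reproduce the paper's two-layer competitive network or stereogram inputs.

```python
import numpy as np

def antihebbian_direction(X, lr=0.05, steps=1000, seed=0):
    """Find the lowest-variance (most invariant) input direction with an anti-Hebbian rule."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        y = X @ w                          # linear unit's output for every pattern
        w -= lr * (X.T @ y) / len(X)       # anti-Hebbian: weaken weights by input-output correlation
        w /= np.linalg.norm(w)             # explicit normalization keeps w on the unit sphere
    return w

rng = np.random.default_rng(1)
# Inputs vary strongly along the first two axes and barely along the third.
X = rng.normal(size=(2000, 3)) * np.array([2.0, 1.0, 0.1])
w = antihebbian_direction(X)
```

The learned weight vector aligns with the third axis, the direction along which the inputs stay nearly constant.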
Instance-based learning methods explicitly remember all the data that they receive. They usually have no training phase, and only at prediction time do they perform computation. Then they take a query, search the database for similar datapoints, and build an on-line local model (such as a local average or local regression) with which to predict an output value. In this paper we review the advantages of instance-based methods for autonomous systems, but we also note the ensuing cost: hopelessly slow computation as the database grows large. We present and evaluate a new way of structuring a database and a new algorithm for accessing it that maintains the advantages of instance-based learning. Earlier attempts to combat the cost of instance-based learning have sacrificed the explicit retention of all data, or been applicable only to instance-based predictions based on a small number of near neighbors, or have had to re-introduce an explicit training phase in the form of an interpolative data structure. Our approach builds a multiresolution data structure to summarize the database of experiences at all resolutions of interest simultaneously. This permits us to query the database with the same flexibility as a conventional linear search, but at greatly reduced computational cost.
Discrete Bayesian models have been used to model uncertainty for mobile-robot navigation, but the question of how actions should be chosen remains largely unexplored. This paper presents the optimal solution to the problem, formulated as a partially observable Markov decision process. Since solving for the optimal control policy is intractable in general, it goes on to explore a variety of heuristic control strategies. The control strategies are compared experimentally, both in simulation and in runs on a robot.
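To illustrate what a heuristic control strategy for a POMDP can look like, the sketch below implements two commonly used ones: acting on the most likely state, and the Q_MDP rule, which weights each state's action values by the belief. We do not claim these are the strategies evaluated in the paper; the Q table is a hypothetical example of values obtained by solving the underlying fully observable MDP.

```python
import numpy as np

def most_likely_state_action(belief, Q):
    """Act as if the robot were certain to be in its most probable state."""
    return int(np.argmax(Q[np.argmax(belief)]))

def qmdp_action(belief, Q):
    """Q_MDP heuristic: weight each state's action values by the belief."""
    return int(np.argmax(belief @ Q))

# Hypothetical action values Q[state, action] for 2 states x 2 actions.
Q = np.array([[10.0, 6.0],
              [0.0, 6.0]])
belief = np.array([0.55, 0.45])
mls_choice = most_likely_state_action(belief, Q)
qmdp_choice = qmdp_action(belief, Q)
```

Here the two heuristics disagree: the most-likely-state rule bets on state 0 and picks action 0, while Q_MDP accounts for the 45% chance of state 1 and picks the safer action 1. Neither heuristic values information-gathering actions, which is one reason they can fall short of the optimal POMDP policy.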
This NRL NCARAI technical note (AIC) describes work with Salzberg's NGE. I recently implemented this algorithm and ran a few case studies. The purpose of this note is to publicize the implementation and to note a curious result obtained while using it. This implementation of NGE is available on the WWW.
Technical Report, Department of Statistics, University of Toronto. Abstract: I present a new Markov chain sampling method appropriate for distributions with isolated modes. Like the recently developed method of simulated tempering, the tempered transition method uses a series of distributions that interpolate between the distribution of interest and a distribution for which sampling is easier. The new method has the advantage that it does not require approximate values for the normalizing constants of these distributions, which are needed for simulated tempering and can be tedious to estimate. Simulated tempering performs a random walk along the series of distributions used. In contrast, the tempered transitions of the new method move systematically from the desired distribution, to the easily sampled distribution, and back to the desired distribution. This systematic movement avoids the inefficiency of a random walk, an advantage that is unfortunately cancelled by an increase in the number of interpolating distributions required. Because of this, the sampling efficiency of the tempered transition method in simple problems is similar to that of simulated tempering. On more complex distributions, however, simulated tempering and tempered transitions may perform differently. Which is better depends on the ways in which the interpolating distributions are "deceptive".
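A minimal sketch of one tempered transition, as we read the description above: climb from the target through progressively hotter distributions, come back down, and accept or reject the whole excursion. It assumes the interpolating family p_i(x) ∝ p(x)^β_i, reversible random-walk Metropolis moves at each level, and a hypothetical two-mode target; the acceptance weight multiplies p_i/p_{i-1} on the way up and p_{i-1}/p_i on the way down, so the unknown normalizing constants cancel. Consult the report itself for the exact formulation.

```python
import math
import random


def log_target(x):
    """Hypothetical bimodal target: equal mixture of N(-5, 1) and N(5, 1), unnormalized."""
    a, b = -0.5 * (x + 5.0) ** 2, -0.5 * (x - 5.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))   # log-sum-exp avoids underflow


def metropolis(x, beta, rng, steps):
    """Random-walk Metropolis leaving p(x)**beta invariant. The proposal is symmetric, so the
    kernel is reversible and can serve for both the upward and the downward pass."""
    scale = 1.0 / math.sqrt(beta)                  # wider proposals at hotter levels
    for _ in range(steps):
        prop = x + rng.gauss(0.0, scale)
        log_ratio = beta * (log_target(prop) - log_target(x))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = prop
    return x


def tempered_transition(x, betas, rng, steps=3):
    """betas[0] = 1 is the target; later entries are hotter (smaller)."""
    n = len(betas) - 1
    y, log_w = x, 0.0
    for i in range(1, n + 1):                      # upward pass
        log_w += (betas[i] - betas[i - 1]) * log_target(y)   # log p_i(y) - log p_{i-1}(y)
        y = metropolis(y, betas[i], rng, steps)
    for i in range(n, 0, -1):                      # downward pass
        y = metropolis(y, betas[i], rng, steps)
        log_w += (betas[i - 1] - betas[i]) * log_target(y)   # log p_{i-1}(y) - log p_i(y)
    return y if rng.random() < math.exp(min(0.0, log_w)) else x


rng = random.Random(0)
betas = [0.02 ** (k / 11) for k in range(12)]      # 1.0 down to 0.02, geometrically spaced
x, trace = -5.0, []
for _ in range(1000):
    x = tempered_transition(x, betas, rng)
    trace.append(x)
print(sum(v > 0 for v in trace) / len(trace))
```

Started in the left mode, the chain also visits the right mode, which a plain Metropolis sampler with unit steps at β = 1 would essentially never do over this run length.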
We describe an ongoing project to develop an adaptive training system (ATS) that dynamically models a student's learning processes and can provide specialized tutoring adapted to the student's knowledge state and learning style. The student modeling component of the ATS, ML-Modeler, uses machine learning (ML) techniques to emulate the student's novice-to-expert transition. ML-Modeler infers which learning methods the student has used to reach the current knowledge state by comparing the student's solution trace to an expert solution and generating plausible hypotheses about the misconceptions and errors the student has made. A case-based approach is used to generate hypotheses through incorrectly applying analogy, overgeneralization, and overspecialization. The student and expert models use a network-based representation that includes abstract concepts and relationships as well as strategies for problem solving. Fuzzy methods are used to represent the uncertainty in the student model. This paper describes the design of the ATS and ML-Modeler, and gives a detailed example of how the system would model and tutor the student in a typical session. The domain we use for this example is high-school level chemistry.
Metacognition addresses the issues of knowledge about cognition and of regulating cognition. We argue that the regulation process should be improved with growing experience. Therefore, mental models are needed which facilitate the reuse of previous regulation processes. We satisfy this requirement by describing a case-based approach to Introspection Planning which utilises previous experience obtained during reasoning at the meta-level and at the object level. The introspection plans used in this approach support various metacognitive tasks which are identified by the generation of self-questions. As an example of introspection planning, the metacognitive behaviour of our system, IULIAN, is described.
Graphical Markov models use graphs, either undirected, directed, or mixed, to represent possible dependences among statistical variables. Applications of undirected graphs (UDGs) include models for spatial dependence and image analysis, while acyclic directed graphs (ADGs), which are especially convenient for statistical analysis, arise in such fields as genetics and psychometrics and as models for expert systems and Bayesian belief networks. Lauritzen, Wermuth, and Frydenberg (LWF) introduced a Markov property for chain graphs, which are mixed graphs that can be used to represent simultaneously both causal and associative dependencies and which include both UDGs and ADGs as special cases. In this paper an alternative Markov property (AMP) for chain graphs is introduced, which in some ways is a more direct extension of the ADG Markov property than is the LWF property for chain graphs.
The fault hierarchy representation is widely used in expert systems for the diagnosis of complex mechanical devices. This paper describes a theory revision algorithm that revises fault hierarchies. This task presents several challenges: typical training instances have missing feature values, and the pattern of missing features is significant rather than merely an effect of noise; moreover, the quality of a candidate theory depends both on the correctness of the diagnoses it returns and on the set of tests it uses to reach those diagnoses. This paper describes how the algorithm addresses these challenges, and reports on experiments that use it to improve the performance of two fielded diagnostic systems. This is an extended version of a paper that appeared in the Proceedings of the Fifth International Workshop on Principles of Diagnosis (Dx), New York, October. We gratefully acknowledge receiving helpful comments from Cheoung-Nam Lee, Glenn Meredith, and Chandra Mouleeswaran. Current address: Robotics Laboratory, Computer Science Department, Stanford University, Stanford, CA; email langley@flamingo.stanford.edu.
Machine learning of game strategies has often depended on competitive methods that continually develop new strategies capable of defeating previous ones. We use a very inclusive definition of game and consider a framework within which a competitive algorithm makes repeated use of a strategy learning component that can learn strategies which defeat a given set of opponents. We describe game learning in terms of sets H and X of first and second player strategies, and connect the model with more familiar models of concept learning. We show the importance of the ideas of teaching set and specification number k in this new context. The performance of several competitive algorithms is investigated, using both worst-case and randomized strategy learning algorithms. Our central result, stated as a theorem, is a competitive algorithm that solves games in a total number of strategies polynomial in lg|H|, lg|X|, and k. Its use is demonstrated, including an application in concept learning with a new kind of counterexample oracle. We conclude with a complexity analysis of game learning, and list a number of new questions arising from this work.
Most of the work which attempts to give bounds on the generalization error of the hypothesis generated by a learning algorithm is based on methods from the theory of uniform convergence. These bounds are a-priori bounds that hold for any distribution of examples and are calculated before any data is observed. In this paper we propose a different approach for bounding the generalization error after the data has been observed. A self-bounding learning algorithm is an algorithm which, in addition to the hypothesis that it outputs, outputs a reliable upper bound on the generalization error of this hypothesis. We first explore the idea in the statistical query learning framework of Kearns. After that we give an explicit self-bounding algorithm for learning algorithms that are based on local search.
In this paper we propose a new framework for studying Markov decision processes (MDPs), based on ideas from statistical mechanics. The goal of learning in MDPs is to find a policy that yields the maximum expected return over time. In choosing policies, agents must therefore weigh the prospects of short-term versus long-term gains. We study a simple MDP in which the agent must constantly decide between exploratory jumps and local reward mining in state space. The number of policies to choose from grows exponentially with the size of the state space, N. We view the expected returns as defining an energy landscape over policy space. Methods from statistical mechanics are used to analyze this landscape in the thermodynamic limit N → ∞. We calculate the overall distribution of expected returns, as well as the distribution of returns for policies at a fixed Hamming distance from the optimal one. We briefly discuss the problem of learning optimal policies from empirical estimates of the expected return. As a first step, we relate our findings for the entropy to the limit of high-temperature learning. Numerical simulations support the theoretical results.
This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles a long-standing open question, namely whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed.
Novel online learning algorithms with self-adaptive learning rates (parameters) for blind separation of signals are proposed. The main motivation for the development of the new learning rules is to improve convergence speed and to reduce cross-talking, especially for non-stationary signals. Furthermore, we have discovered that under some conditions the proposed neural network models with associated learning algorithms exhibit a random switch of attention, i.e., the ability of chaotic or random switching or cross-over of output signals, in such a way that a specified separated signal may appear at various outputs in different time windows. The validity, performance, and dynamic properties of the proposed learning algorithms are investigated by computer simulation experiments.
I present a modular network architecture and a learning algorithm based on incremental dynamic programming that allows a single learning agent to learn to solve multiple Markovian decision tasks (MDTs) with significant transfer of learning across the tasks. I consider a class of MDTs, called composite tasks, formed by temporally concatenating a number of simpler, elemental MDTs. The architecture is trained on a set of composite and elemental MDTs. The temporal structure of a composite task is assumed to be unknown, and the architecture learns to produce a temporal decomposition. It is shown that under certain conditions the solution of a composite MDT can be constructed by computationally inexpensive modifications of the solutions of its constituent elemental MDTs.
Scientists and engineers face recurring problems of constructing, testing, and modifying numerical simulation programs. The process of coding and revising such simulators is extremely time-consuming, because they are almost always written in conventional programming languages. Scientists and engineers can therefore benefit from software that facilitates the construction of programs for simulating physical systems. Our research adapts the methodology of deductive program synthesis to the problem of constructing numerical simulation codes. We have focused on simulators that can be represented as second-order functional programs composed of numerical integration and root extraction routines. We have developed a system that uses first-order Horn logic to synthesize numerical simulators built from these components. Our approach is based on two ideas. First, we axiomatize the relationship between integration and differentiation; we neither attempt nor require a complete axiomatization of mathematical analysis. Second, our system uses a representation in which functions are reified as objects. Function objects are encoded as lambda expressions. Our knowledge base includes an axiomatization of term equality in the lambda calculus. It also includes axioms defining the semantics of numerical integration and root extraction routines. We use depth-bounded SLD resolution to construct proofs and synthesize programs. Our system has successfully constructed numerical simulators for the computational design of jet engine nozzles and sailing yachts, among others. Our results demonstrate that deductive synthesis techniques can be used to construct numerical simulation programs for realistic applications.
(Ellman and Murata) Automatic design optimization is highly sensitive to problem formulation. The choice of objective function, constraints, and design parameters can dramatically impact the computational cost of optimization and the quality of the resulting design. The best formulation varies from one application to another, and a design engineer usually does not know the best formulation in advance. In order to address this problem, we have developed a system that supports the interactive formulation, testing, and reformulation of design optimization strategies. Our system includes an executable data-flow language for representing optimization strategies. The language allows an engineer to define multiple stages of optimization, each using different approximations of the objective and constraints or different abstractions of the design space. We have also developed a set of transformations that reformulate strategies represented in our language. The transformations can approximate objective and constraint functions, abstract or reparameterize search spaces, or divide an optimization process into multiple stages. The system is applicable in principle to any design problem that can be expressed in terms of constrained optimization.
Bayesian networks provide a language for qualitatively representing the conditional independence properties of a distribution. This allows a natural and compact representation of the distribution, eases knowledge acquisition, and supports effective inference algorithms. It is well known, however, that there are certain independencies that we cannot capture qualitatively within the Bayesian network structure: independencies that hold only in certain contexts, i.e., given a specific assignment of values to certain variables. In this paper, we propose a formal notion of context-specific independence (CSI), based on regularities in the conditional probability tables (CPTs) at a node. We present a technique, analogous to and based on d-separation, for determining when such independence holds in a given network. We then focus on a particular qualitative representation scheme for capturing CSI: tree-structured CPTs. We suggest ways in which this representation can be used to support effective inference algorithms. In particular, we present a structural decomposition of the resulting network which can improve the performance of clustering algorithms, and an alternative algorithm based on cutset conditioning.
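A tree-structured CPT makes context-specific independence visible: once a branch reaches a leaf, the parents not yet tested cannot matter in that context. The sketch below encodes a hypothetical tree for P(X = true | A, B, C) and checks CSI by brute-force enumeration of the remaining parents. The tree, its probabilities, and the function names are assumptions for illustration; the paper's d-separation-based technique works on the network structure and is not reproduced here.

```python
from itertools import product

# Tree-structured CPT for P(X = true | A, B, C): internal nodes are (parent, children)
# pairs, leaves are probabilities. Hypothetical example, not taken from the paper.
TREE = ("A", {
    True: 0.9,                                   # context A = true: X ignores B and C
    False: ("B", {
        True: 0.6,                               # context A = false, B = true: X ignores C
        False: ("C", {True: 0.3, False: 0.1}),
    }),
})
PARENTS = ["A", "B", "C"]


def lookup(tree, assignment):
    """Follow the tests in the tree using the parent values until a leaf is reached."""
    while isinstance(tree, tuple):
        var, children = tree
        tree = children[assignment[var]]
    return tree


def independent_in_context(tree, var, context, parents):
    """True if X is independent of `var` given `context`: for every completion of the
    remaining parents, flipping `var` leaves the CPT entry unchanged."""
    free = [p for p in parents if p not in context and p != var]
    for values in product([True, False], repeat=len(free)):
        base = {**context, **dict(zip(free, values))}
        if lookup(tree, {**base, var: True}) != lookup(tree, {**base, var: False}):
            return False
    return True


print(independent_in_context(TREE, "B", {"A": True}, PARENTS))   # B irrelevant once A = true
print(independent_in_context(TREE, "C", {}, PARENTS))            # C matters in some context
```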
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well-known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the Mountain-Car task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator.
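The only difference between the two kinds of trace is what happens when a state is revisited: the conventional (accumulating) trace adds 1 to the state's trace, while the replacing trace resets it to 1. The sketch below shows both inside one episode of tabular, online TD(λ) prediction; it is an illustration under our own assumptions (table of values in a dict, hypothetical `td_lambda_episode` helper), not the offline analysis or the Mountain-Car function approximator from the paper.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.9, replacing=False):
    """One episode of tabular online TD(lambda) prediction.
    episode: list of (state, reward, next_state), with next_state None at termination."""
    trace = {}
    for s, r, s_next in episode:
        v_next = 0.0 if s_next is None else values.get(s_next, 0.0)
        delta = r + gamma * v_next - values.get(s, 0.0)      # TD error
        for k in trace:                                      # decay every existing trace
            trace[k] *= gamma * lam
        if replacing:
            trace[s] = 1.0                                   # replacing trace: reset to 1
        else:
            trace[s] = trace.get(s, 0.0) + 1.0               # accumulating trace: add 1
        for k, e in trace.items():
            values[k] = values.get(k, 0.0) + alpha * delta * e
    return values, trace


# A state revisited three times before the only reward arrives
episode = [("a", 0.0, "a"), ("a", 0.0, "a"), ("a", 1.0, None)]
v_acc, e_acc = td_lambda_episode({}, episode)
v_rep, e_rep = td_lambda_episode({}, episode, replacing=True)
print(e_acc, e_rep, v_acc, v_rep)
```

With repeated visits the accumulating trace grows past 1 (here to 1 + 0.9 + 0.81), so the repeated state receives extra credit; the replacing trace stays at 1.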
Reading has been studied for decades by a variety of cognitive disciplines, yet no theories exist which sufficiently describe and explain how people accomplish the complete task of reading real-world texts. In particular, a type of knowledge-intensive reading known as creative reading has been largely ignored by past research. We argue that creative reading is an aspect of practically all reading experiences; as a result, any theory which overlooks it is insufficient. We have built on results from psychology, artificial intelligence, and education in order to produce a functional theory of the complete reading process. The overall framework describes the set of tasks necessary for reading to be performed. Within this framework, we have developed a theory of creative reading. The theory is implemented in the ISAAC (Integrated Story Analysis And Creativity) system, a reading system which reads science fiction stories.
We study the problem of combining updates, a special instance of theory change, and counterfactual conditionals in propositional knowledge bases. Intuitively, an update means that the world described by the knowledge base has changed. This is opposed to revisions, another instance of theory change, where our knowledge about a static world changes. A counterfactual implication is a statement of the form "If A were the case, then B would also be the case", where the negation of A may be derivable from our current knowledge. We present a decidable logic, called VCU, that has both update and counterfactual implication as connectives in the object language. Our update operator is a generalization of operators previously proposed and studied in the literature. We show that our operator satisfies certain postulates set forth for any reasonable update. The logic is an extension of D. K. Lewis' logic VCU for counterfactual conditionals. Its semantics is that of a multimodal propositional calculus based on possible worlds. The infamous Ramsey Rule becomes a derivation rule in our sound and complete axiomatization. We then show that Gärdenfors' Triviality Theorem, about the impossibility of combining theory change and counterfactual conditionals via the Ramsey Rule, does not hold in our logic. It is thus seen that the Triviality Theorem applies only to revision operators, not to updates. A preliminary version of this paper was presented at the Second International Conference on Principles of Knowledge Representation and Reasoning, Cambridge, Massachusetts, April. The work was partially performed while the author was visiting the Department of Computer Science at the University of Toronto.
Many neural net learning algorithms aim at finding simple nets to explain training data. The expectation is that the simpler the networks, the better the generalization on test data (Occam's razor). Previous implementations, however, use measures of simplicity that lack the power, universality, and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, previous approaches, especially those of the Bayesian kind, suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding algorithmically simple problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and inspired by Levin's optimal universal search algorithm. For a given problem, solution candidates are computed by efficient self-sizing programs that influence their own runtime and storage size. The probabilistic search algorithm finds the good programs (the ones quickly computing algorithmically probable solutions fitting the training data). Simulations focus on the task of discovering algorithmically simple neural networks with low Kolmogorov complexity and high generalization capability. It is demonstrated that the method, at least with certain toy problems where it is computationally feasible, can lead to generalization results unmatchable by previous neural net algorithms. Much remains to be done, however, to make large-scale applications and incremental learning feasible.
We have studied the problem of generating expressive musical performances in the context of tenor saxophone interpretations. We have made several recordings of a tenor sax playing different jazz ballads with different degrees of expressiveness, including an inexpressive interpretation of each ballad. These recordings are analyzed using SMS spectral modeling techniques to extract information related to several expressive parameters. This set of parameters and the scores constitute the set of cases (examples) of a case-based system. From this set of cases, the system infers a set of possible expressive transformations for a given new phrase, applying similarity criteria, based on background musical knowledge, between the new phrase and the set of cases. Finally, SaxEx applies the inferred expressive transformations to the new phrase using the synthesis capabilities of SMS.
One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and is often observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show, theoretically and experimentally, that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.
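The margin defined above is easy to compute once the base classifiers' votes are tallied. The sketch below uses weighted votes and divides by the total vote weight so that margins fall in [-1, 1]; that normalization, the `(label, weight)` vote format, and the helper names are assumptions for illustration (with unit weights and no division it reduces to the plain vote difference stated in the abstract).

```python
from collections import defaultdict


def margin(votes, true_label):
    """votes: list of (predicted_label, weight) from the base classifiers.
    Returns (weight on the true label - largest weight on any wrong label) / total weight."""
    tally = defaultdict(float)
    for label, w in votes:
        tally[label] += w
    total = sum(tally.values())
    correct = tally.get(true_label, 0.0)
    wrong = max((v for lab, v in tally.items() if lab != true_label), default=0.0)
    return (correct - wrong) / total


def margin_cdf(margins, theta):
    """Fraction of training examples whose margin is at most theta."""
    return sum(m <= theta for m in margins) / len(margins)


ms = [
    margin([("cat", 1.0), ("cat", 1.0), ("dog", 1.0), ("bird", 1.0)], "cat"),  # narrow win
    margin([("cat", 0.5), ("cat", 0.5)], "cat"),                               # unanimous
    margin([("dog", 2.0), ("cat", 1.0)], "cat"),                               # misclassified
]
print(ms, margin_cdf(ms, 0.0))
```

A negative margin marks a misclassified example, so the margin distribution carries strictly more information than the training error alone.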
Real-world learning tasks may involve high-dimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data sets. We use mixture models for the density estimates and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster et al.) in deriving a learning algorithm: EM is used both for the estimation of mixture components and for coping with missing data. The resulting algorithm is applicable to a wide range of supervised as well as unsupervised learning problems. Results from a classification benchmark, the iris data set, are presented.
The hierarchical feature map system recognizes an input story as an instance of a particular script by classifying it at three levels: scripts, tracks, and role bindings. The recognition taxonomy, i.e., the breakdown of each script into tracks and roles, is extracted automatically and independently for each script from examples of script instantiations in an unsupervised self-organizing process. The process resembles human learning in that the differentiation of frequently encountered scripts gradually becomes more detailed. The resulting structure is a hierarchical pyramid of feature maps. The hierarchy visualizes the taxonomy, and the maps lay out the topology of each level. The number of input lines and the self-organization time are considerably reduced compared to ordinary single-level feature mapping. The system can recognize incomplete stories and recover the missing events. The taxonomy also serves as memory organization for script-based episodic memory. The maps assign a unique memory location to each script instantiation. The salient parts of the input data are separated, and resources are concentrated on representing them accurately.
It is shown how static neural approaches to adaptive target detection can be replaced by an efficient sequential alternative. The latter is inspired by the observation that biological systems employ sequential eye movements for pattern recognition. A system is described which builds an adaptive model of the time-varying inputs of an artificial fovea controlled by an adaptive neural controller. The controller uses the adaptive model for learning the sequential generation of fovea trajectories causing the fovea to move to a target in a visual scene. The system also learns to track moving targets. No teacher provides the desired activations of "eye muscles" at various times. The only goal information is the shape of the target. Since the task is a "reward-only-at-goal" task, it involves a complex temporal credit assignment problem. Some implications for adaptive attentive systems in general are discussed.
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIMs). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an online learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain. We want to thank Geoffrey Hinton, Tony Robinson, Mitsuo Kawato, and Daniel Wolpert for helpful comments on the manuscript. This project was supported in part by a grant from the McDonnell-Pew Foundation, by a grant from ATR Human Information Processing Research Laboratories, by a grant from Siemens Corporation, by grant IRI from the National Science Foundation, and by grant NJ from the Office of Naval Research. The project was also supported by NSF grant ASC in support of the Center for Biological and Computational Learning at MIT, including funds provided by DARPA under the HPCC program, and by NSF grant ECS in support of the Initiative in Intelligent Control at MIT. Michael I. Jordan is an NSF Presidential Young Investigator.
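One node of such a tree can be sketched as a softmax gating network that blends the outputs of linear experts, both being generalized linear models of the input. The sketch below shows only the forward pass of a single level with scalar-output experts; the weight layout, the appended bias input, and the helper names are assumptions, and the paper's EM and online training procedures are not shown. A deeper tree would replace each expert by another such node.

```python
import math


def softmax(zs):
    m = max(zs)                                   # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]


def mixture_of_experts(x, gate_weights, expert_weights):
    """One level of a mixture of experts: linear experts whose outputs are blended by a
    softmax gating network. Returns the blended output and the gating probabilities."""
    xb = list(x) + [1.0]                          # append a bias input
    gates = softmax([sum(w * xi for w, xi in zip(g, xb)) for g in gate_weights])
    outputs = [sum(w * xi for w, xi in zip(e, xb)) for e in expert_weights]
    return sum(g * o for g, o in zip(gates, outputs)), gates


# Hypothetical parameters: expert 0 computes 2x, expert 1 computes -x, and the gate
# hands positive inputs to expert 0 and negative inputs to expert 1.
GATES = [[10.0, 0.0], [-10.0, 0.0]]
EXPERTS = [[2.0, 0.0], [-1.0, 0.0]]
y_pos, g_pos = mixture_of_experts([1.0], GATES, EXPERTS)
y_neg, _ = mixture_of_experts([-1.0], GATES, EXPERTS)
print(y_pos, y_neg, g_pos)
```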
The EM algorithm performs maximum likelihood estimation for data in which some variables are unobserved. We present a function that resembles negative free energy and show that the M step maximizes this function with respect to the model parameters and the E step maximizes it with respect to the distribution over the unobserved variables. From this perspective, it is easy to justify an incremental variant of the EM algorithm in which the distribution for only one of the unobserved variables is recalculated in each E step. This variant is shown empirically to give faster convergence in a mixture estimation problem. A variant of the algorithm that exploits sparse conditional distributions is also described, and a wide range of other variant algorithms are also seen to be possible.
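The incremental variant described above can be sketched for a one-dimensional Gaussian mixture: keep each point's responsibilities and the running sufficient statistics, recompute the responsibilities of a single point (a partial E step), swap its old contribution for the new one, and re-run the M step on the totals. The mixture model, the initial full E step, the variance floor, and all names are assumptions for illustration; the sparse variant is not shown.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, mus, variances, weights):
    """Posterior over mixture components for one data point."""
    p = [w * normal_pdf(x, m, v) for m, v, w in zip(mus, variances, weights)]
    z = sum(p)
    return [pj / z for pj in p]


def m_step(s0, s1, s2, n):
    """Mixture weights, means, and variances from the sufficient statistics."""
    weights = [a / n for a in s0]
    mus = [b / a for a, b in zip(s0, s1)]
    variances = [max(c / a - m * m, 1e-6) for a, c, m in zip(s0, s2, mus)]
    return mus, variances, weights


def incremental_em(data, mus, variances, weights, sweeps=10):
    n, k = len(data), len(mus)
    resp = [responsibilities(x, mus, variances, weights) for x in data]  # one full E step
    s0 = [sum(r[j] for r in resp) for j in range(k)]                    # sum of r
    s1 = [sum(r[j] * x for r, x in zip(resp, data)) for j in range(k)]  # sum of r * x
    s2 = [sum(r[j] * x * x for r, x in zip(resp, data)) for j in range(k)]
    mus, variances, weights = m_step(s0, s1, s2, n)
    for _ in range(sweeps):
        for i, x in enumerate(data):
            new = responsibilities(x, mus, variances, weights)  # partial E step: point i only
            for j in range(k):                                # swap i's old contribution
                d = new[j] - resp[i][j]
                s0[j] += d
                s1[j] += d * x
                s2[j] += d * x * x
            resp[i] = new
            mus, variances, weights = m_step(s0, s1, s2, n)   # M step on the running totals
    return mus, variances, weights


rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(200)] + [rng.gauss(3.0, 0.5) for _ in range(200)]
mus, variances, weights = incremental_em(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(mus, variances, weights)
```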
A network of Wilson-Cowan oscillators is constructed, and its emergent properties of synchronization and desynchronization are investigated by both computer simulation and formal analysis. The network is a two-dimensional matrix in which each oscillator is coupled to its neighbors. We show analytically that a chain of locally coupled oscillators, using a piecewise linear approximation of the Wilson-Cowan oscillator, synchronizes, and we present a technique to rapidly entrain finite numbers of oscillators. The coupling strengths change on a fast time scale based on a Hebbian rule. A global separator is introduced which receives input from, and sends feedback to, each oscillator in the matrix. The global separator is used to desynchronize different oscillator groups. Unlike many other models, the properties of this network emerge from local connections that preserve spatial relationships among components, which are critical for encoding Gestalt principles of feature grouping. The ability to synchronize and desynchronize oscillator groups within this network offers a promising approach to pattern segmentation and figure-ground segregation based on oscillatory correlation.
In this paper I describe the implementation of a probabilistic regression model in BUGS. BUGS is a program that carries out Bayesian inference on statistical problems using a simulation technique known as Gibbs sampling. It is possible to implement surprisingly complex regression models in this environment. I demonstrate the simultaneous inference of an interpolant and an input-dependent noise level.
Classifying by hand the complex data coming from psychology experiments can be a long and difficult task, because of the quantity of data to classify and the amount of training it may require. One way to alleviate this problem is to use machine learning techniques. We built a classifier based on decision trees that reproduces the classifying process used by two humans on a sample of data and that learns how to classify unseen data. The automatic classifier proved to be accurate and consistent, and much faster than classification by hand.
The estimation or training methods in the neural network literature are usually some simple form of gradient descent algorithm, suitable for implementation in hardware using massively parallel computations. For ordinary computers that are not massively parallel, the optimization algorithms in several SAS procedures are usually far more efficient. This talk shows how to fit neural networks using SAS/OR®, SAS/ETS®, and SAS/STAT® software.
Selecting the right reference class and the right interval when faced with conflicting candidates and no possibility of establishing subset-style dominance has been a problem for Kyburg's Evidential Probability system. Various methods have been proposed by Loui and Kyburg to solve this problem in a way that is both intuitively appealing and justifiable within Kyburg's framework. The scheme proposed in this paper leads to stronger statistical assertions without sacrificing much of the intuitive appeal of Kyburg's latest proposal.
We apply reinforcement learning methods to learn domain-specific heuristics for job shop scheduling. A repair-based scheduler starts with a critical-path schedule and incrementally repairs constraint violations with the goal of finding a short, conflict-free schedule. The temporal difference algorithm TD(λ) is applied to train a neural network to learn a heuristic evaluation function over states. This evaluation function is used by a one-step lookahead search procedure to find good solutions to new scheduling problems. We evaluate this approach on synthetic problems and on problems from a NASA space shuttle payload processing task. The evaluation function is trained on problems involving a small number of jobs and then tested on larger problems. The TD scheduler performs better than the best known existing algorithm for this task, Zweben's iterative repair method based on simulated annealing. The results suggest that reinforcement learning can provide a new method for constructing high-performance scheduling systems.
A neural network approach to the classic inverted pendulum task is presented. This task is the task of keeping a rigid pole, hinged to a cart and free to fall in a plane, in a roughly vertical orientation by moving the cart horizontally in the plane while keeping the cart within some maximum distance of its starting position. This task constitutes a difficult control problem if the parameters of the cart-pole system are not known precisely or are variable. It also forms the basis of an even more complex control-learning problem if the controller must learn the proper actions for successfully balancing the pole given only the current state of the system and a failure signal when the pole angle from the vertical becomes too great or the cart exceeds one of the boundaries placed on its position. The approach presented is demonstrated to be effective for the real-time control of a small, self-contained mini-robot specially outfitted for the task. The origins and details of the learning scheme, the specifics of the mini-robot hardware, and the results of actual learning trials are presented.
Platt's resource-allocation network (RAN) (Platt) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restarting, units continue to learn via backpropagation. The resulting restart algorithm is tested in a Q-learning network that learns to solve an inverted pendulum problem. Solutions are found faster on average with the restart algorithm than without it.
We propose a new processing paradigm, called the Expandable Split Window (ESW) paradigm, for exploiting fine-grain parallelism. This paradigm considers a window of instructions (possibly having dependencies) as a single unit, and exploits fine-grain parallelism by overlapping the execution of multiple windows. The basic idea is to connect multiple sequential processors, in a decoupled and decentralized manner, to achieve overall multiple issue. This processing paradigm shares a number of properties of the restricted dataflow machines, but is derived from the sequential von Neumann architecture. We also present an implementation of the Expandable Split Window execution model, and preliminary performance results.
Algorithms based on Nested Generalized Exemplar (NGE) theory (Salzberg) classify new data points by computing their distance to the nearest generalized exemplar (i.e., either a point or an axis-parallel rectangle). They combine the distance-based character of nearest neighbor (NN) classifiers with the axis-parallel rectangle representation employed in many rule-learning systems. An implementation of NGE was compared to the k-nearest neighbor (kNN) algorithm on a set of domains and found to be significantly inferior to kNN in a number of them. Several modifications of NGE were studied to understand the cause of its poor performance. These show that its performance can be substantially improved by preventing NGE from creating overlapping rectangles, while still allowing complete nesting of rectangles. Performance can be further improved by modifying the distance metric to allow weights on each of the features (Salzberg). The best results in this study were obtained when the weights were computed using the mutual information between the features and the output class. The best version of NGE developed is a batch algorithm (BNGE FW MI) that has no user-tunable parameters. The performance of BNGE FW MI is comparable to that of the first-nearest neighbor algorithm (also incorporating feature weights). However, the k-nearest neighbor algorithm is still significantly superior to BNGE FW MI in some of the domains, and inferior to it in others. We conclude that, even with our improvements, the NGE approach is very sensitive to the shape of the decision boundaries in classification problems. In domains where the decision boundaries are axis-parallel, the NGE approach can produce excellent generalization with interpretable hypotheses. In the domains tested, NGE algorithms require much less memory to store generalized exemplars than is required by NN algorithms.
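The distance at the heart of NGE is the distance from a query point to an axis-parallel rectangle, which is zero when the point lies inside and otherwise measures the gap to the nearest face; a point exemplar is simply a degenerate rectangle. The sketch below adds optional per-feature weights, which the caller would supply (for instance from mutual information, as in the abstract; that computation is not shown). The exemplar format and function names are assumptions, not Salzberg's exact formula, which also weights exemplars.

```python
import math


def rect_distance(point, lower, upper, feature_weights=None):
    """Weighted Euclidean distance from `point` to the rectangle [lower, upper];
    0 inside the rectangle. A point exemplar has lower == upper."""
    if feature_weights is None:
        feature_weights = [1.0] * len(point)
    total = 0.0
    for x, lo, hi, w in zip(point, lower, upper, feature_weights):
        gap = lo - x if x < lo else x - hi if x > hi else 0.0
        total += (w * gap) ** 2
    return math.sqrt(total)


def classify(point, exemplars, feature_weights=None):
    """exemplars: list of (lower, upper, label). Return the label of the nearest exemplar."""
    nearest = min(exemplars, key=lambda e: rect_distance(point, e[0], e[1], feature_weights))
    return nearest[2]


# Hypothetical exemplars: a rectangle of class A and a single point of class B
EXEMPLARS = [((0.0, 0.0), (2.0, 2.0), "A"), ((5.0, 5.0), (5.0, 5.0), "B")]
print(classify((1.0, 1.0), EXEMPLARS), classify((4.0, 4.0), EXEMPLARS))
print(classify((4.5, 0.0), EXEMPLARS), classify((4.5, 0.0), EXEMPLARS, [1.0, 0.0]))
```

The last line shows how feature weights can change the decision: once the second feature is given zero weight, the point exemplar B becomes the nearer one.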
Selecting a good model of a set of input points by cross validation is a computationally intensive process, especially if the number of possible models or the number of training points is high. Techniques such as gradient descent are helpful in searching through the space of models, but problems such as local minima, and more importantly the lack of a distance metric between various models, reduce the applicability of these search methods. Hoeffding Races is a technique for finding a good model for the data by quickly discarding bad models and concentrating the computational effort on differentiating between the better ones. This paper focuses on the special case of leave-one-out cross validation applied to memory-based learning algorithms, but we also argue that it is applicable to any class of model selection problems.
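The racing idea can be sketched as follows: evaluate every surviving model on one more leave-one-out point, put a Hoeffding confidence interval of half-width ε = B·sqrt(ln(2/δ) / 2n) around each model's mean error, and discard any model whose lower bound exceeds the best upper bound. The shared δ (no correction for the number of models), the error bound B, the `loo_error` callback, and the fixed toy error rates are simplifying assumptions for illustration.

```python
import math


def hoeffding_race(models, n_points, loo_error, error_range=1.0, delta=0.05):
    """Race models on leave-one-out errors; loo_error(model, i) is the error of `model`
    on point i when trained on all other points. Returns the surviving models."""
    alive = list(models)
    totals = {m: 0.0 for m in alive}
    for n in range(1, n_points + 1):
        for m in alive:
            totals[m] += loo_error(m, n - 1)        # one more test point for each survivor
        eps = error_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        best_upper = min(totals[m] / n for m in alive) + eps
        alive = [m for m in alive if totals[m] / n - eps <= best_upper]
        if len(alive) == 1:
            break
    return alive


# Hypothetical models with fixed per-point error rates; `calls` counts evaluations
ERRORS = {"good": 0.1, "meh": 0.5, "bad": 0.9}
calls = []


def fake_loo_error(model, i):
    calls.append(model)
    return ERRORS[model]


winner = hoeffding_race(list(ERRORS), 100, fake_loo_error)
print(winner, len(calls))
```

Here the clearly bad model drops out after a dozen points, so far fewer than 3 × 100 evaluations are spent.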
In many learning problems, the learning system is presented with values for features that are actually irrelevant to the concept it is trying to learn. The FOCUS algorithm, due to Almuallim and Dietterich, performs an explicit search for the smallest possible input feature set S that permits a consistent mapping from the features in S to the output feature. The FOCUS algorithm can also be seen as an algorithm for learning determinations or functional dependencies, as has been suggested previously; another algorithm for learning determinations has also appeared in the literature. The FOCUS algorithm has superpolynomial runtime, but Almuallim and Dietterich leave open the question of the tractability of the underlying problem. In this paper, the problem is shown to be NP-complete. We also describe briefly some experiments that demonstrate the benefits of determination learning, and show that finding lowest-cardinality determinations is easier in practice than finding minimal determinations. Define the MIN-FEATURES problem as follows: given a set X of examples, each composed of a binary value specifying the value of the target feature and a vector of binary values specifying the values of the other features, and a number n, determine whether there exists a feature set S of size at most n that permits a consistent mapping to the target feature. We show that MIN-FEATURES is NP-complete by reducing VERTEX-COVER to MIN-FEATURES. The VERTEX-COVER problem may be stated as the question: given a graph G with vertices V and edges E, is there a subset V′ of V, of a given size, such that each edge in E is connected to at least one vertex in V′? We may reduce an instance of VERTEX-COVER to an instance of MIN-FEATURES by mapping each edge in E to an example in X, with one input feature for every vertex in V. A previously reported proof of this result is a reduction to set covering; that proof therefore fails to show NP-completeness.
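A FOCUS-style search is simple to sketch: try feature subsets in order of increasing size and return the first one under which no two examples agree on the chosen features but disagree on the target. The exhaustive enumeration is what makes the runtime superpolynomial. The sketch also includes one way to realize the edge-to-example mapping from the proof; adding an all-zero example with target 0 is our own assumption, not necessarily the paper's exact construction, chosen so that a feature set is sufficient exactly when it covers every edge.

```python
from itertools import combinations


def is_sufficient(examples, features):
    """True if the chosen features determine the target: no two examples agree on
    those features but disagree on the target."""
    seen = {}
    for x, y in examples:
        key = tuple(x[f] for f in features)
        if seen.setdefault(key, y) != y:
            return False
    return True


def focus(examples, n_features):
    """Search subsets in order of increasing size; return the first sufficient one."""
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_sufficient(examples, subset):
                return subset
    return None


def vertex_cover_instance(n_vertices, edges):
    """Hypothetical encoding: one example per edge (its two endpoint features set to 1,
    target 1) plus an all-zero example with target 0."""
    examples = []
    for u, v in edges:
        x = [0] * n_vertices
        x[u] = x[v] = 1
        examples.append((tuple(x), 1))
    examples.append((tuple([0] * n_vertices), 0))
    return examples


# Target is x0 XOR x2; feature x1 is irrelevant
xor_examples = [((a, b, c), a ^ c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
triangle = vertex_cover_instance(3, [(0, 1), (1, 2), (0, 2)])
print(focus(xor_examples, 3), focus(triangle, 3))
```

For the triangle graph the smallest sufficient feature set has two features, matching the size of its minimum vertex cover.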
An unsupervised learning algorithm for a multilayer network of stochastic neurons is described. Bottom-up recognition connections convert the input into representations in successive hidden layers, and top-down generative connections reconstruct the representation in one layer from the representation in the layer above. In the wake phase, neurons are driven by recognition connections, and generative connections are adapted to increase the probability that they would reconstruct the correct activity vector in the layer below. In the sleep phase, neurons are driven by generative connections, and recognition connections are adapted to increase the probability that they would produce the correct activity vector in the layer above. Supervised learning algorithms for multilayer neural networks face two problems: they require a teacher to specify the desired output of the network, and they require a method of communicating error information to all of the connections. The wake-sleep algorithm avoids both problems. When there is no external teaching signal to be matched, some other goal is required to force the hidden units to extract underlying structure. In the wake-sleep algorithm, the goal is to learn representations that are economical to describe but allow the input to be reconstructed accurately. We can quantify this goal by imagining a communication game in which each vector of raw sensory inputs is communicated to a receiver by first sending its hidden representation and then sending the difference between the input vector and its top-down reconstruction from the hidden representation. The aim of learning is to minimize the description length, the total number of bits that would be required to communicate the input vectors in this way. No communication actually takes place, but minimizing the description length that would be required forces the network to learn economical representations that capture the underlying regularities in the data.
Properly structured software libraries are crucial for the success of software reuse. Specifically, the structure of a software library ought to reflect the functional similarity of the stored software components in order to facilitate the retrieval process. We propose the application of artificial neural network technology to achieve such a structured library. In more detail, we utilize an artificial neural network adhering to the unsupervised learning paradigm. The distinctive feature of our model is to make the semantic relationship between the stored software components geographically explicit. Thus, the actual user of the software library gets a notion of the semantic relationship between the components in terms of their geographical closeness.
Learning fundamental component intelligence key consideration designing cognitive architectures Soar Laird et al This chapter considers question constitutes appropriate generalpurpose learning mechanism We interested mechanisms might explain reproduce rich variety learning capabilities humans ranging learning perceptualmotor skills ride bicycle learning highly cognitive tasks play chess Research learning fields cognitive science artificial intelligence neurobiology statistics led identification two distinct classes learning methods inductive analytic Inductive methods neural network Backpropagation learn general laws finding statistical correlations regularities among large set training examples In contrast analytical methods ExplanationBased Learning acquire general laws many fewer training examples They rely instead prior knowledge analyze individual training examples detail use analysis distinguish relevant example features irrelevant The question considered chapter best combine inductive analytical learning architecture seeks cover range learning exhibited intelligent systems humans We present specific learning mechanism Explanation Based Neural Network learning EBNN blends two types learning present experimental results demonstrating ability learn control strategies mobile robot using
Simulation plays important role stochastic geometry related fields simplest random set models tend intractable analysis Many simulation algorithms deliver approximate samples random set models example simulating equilibrium distribution Markov chain spatial birthanddeath process The samples usually fail exact algorithm simulates Markov chain long finite time thus convergence equilibrium approximate The seminal work Propp Wilson made important contribution simulation proposing coupling method Coupling Past CFTP delivers perfect say exact simulations Markov chains In paper introduce new idea perfect simulation illustrate using two common models stochastic geometry dead leaves model Boolean model conditioned cover finite set points
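The following is a minimal sketch of the coupling-from-the-past idea. It assumes a toy monotone chain rather than the paper's dead leaves or Boolean model: a lazy random walk on {0, ..., n} that holds at the boundaries, whose stationary distribution is uniform. The names `walk_update` and `cftp` are hypothetical. Chains started from the smallest and largest states share the same random numbers. The sketch starts them ever further in the past until they agree at time 0.

```python
import random


def walk_update(state, u, n):
    """Monotone update for a lazy walk on {0, ..., n}: down, hold, or up with
    probability 1/3 each, holding in place instead of leaving at the boundaries."""
    if u < 1 / 3:
        return max(state - 1, 0)
    if u < 2 / 3:
        return state
    return min(state + 1, n)


def cftp(n, rng=None):
    """Monotone coupling from the past for the toy walk above.

    Runs chains from the minimal and maximal states over the same randomness,
    starting ever further back, until they meet at time 0."""
    rng = rng if rng is not None else random.Random()
    randomness = []  # randomness[k] drives the step from time -(k + 1) to time -k
    T = 1
    while True:
        # Extend further into the past, keeping the random numbers already drawn.
        while len(randomness) < T:
            randomness.append(rng.random())
        lo, hi = 0, n
        for k in range(T - 1, -1, -1):
            lo = walk_update(lo, randomness[k], n)
            hi = walk_update(hi, randomness[k], n)
        if lo == hi:
            return lo  # every starting state would have reached this same value
        T *= 2
```

Reusing the stored random numbers when restarting further back is what makes the output exact. Drawing fresh numbers at each restart would bias the sample.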
In recent years casebased reasoning demonstrated highly useful problem solving complex domains Also mixed paradigm approaches emerged combining CBR induction techniques aiming verifying knowledge andor building efficient case memory However complex domains induction whole problem space often possible time consuming In paper approach presented owing close interaction CBR part attempts induce rules particular context ie problem solved CBRoriented system These rules may used indexing purposes similarity assessment order support CBR process future
This paper demonstrates tandem use finite automata learning algorithm utility planner adversarial robotic domain For many applications robot agents need predict movement objects environment plan avoid When robot reasoning model object machine learning techniques used generate one In project learn DFA model adversarial robot use automaton predict next move adversary The robot agent plans path avoid adversary predicted location fulfilling goal requirements
Claudia Cargnoni Dipartimento Statistico Universita di Firenze Firenze Italy Peter Muller Assistant Professor Mike West Professor Institute Statistics Decision Sciences Duke University Durham NC Research Cargnoni performed visiting ISDS Muller West partially supported NSF grant DMS
Our theoretical understanding properties genetic algorithms GAs used function optimization GAFOs strong would like Traditional schema analysis provides first order insights doesnt capture nonlinear dynamics GA search process well Markov chain theory used primarily steady state analysis GAs In paper explore use transient Markov chain analysis model understand behavior finite population GAFOs observed transition steady states This approach appears provide new insights circumstances GAFOs perform well Some preliminary results presented initial evaluation merits approach provided
In paper consider application training noise multilayer perceptron input variables relevance determination Noise injection modified order penalize irrelevant features The proposed algorithm attractive requires tuning single parameter This parameter controls penalization inputs together complexity model After presentation method experimental evidences given simulated data sets
COINS Technical Report January Abstract In paper present new multivariate decision tree algorithm LMDT combines linear machines decision trees LMDT constructs test decision tree training linear machine eliminating irrelevant noisy variables controlled manner To examine LMDTs ability find good generalizations present results variety domains We compare LMDT empirically univariate decision tree algorithm observe multivariate tests appropriate bias given data set LMDT finds small accurate trees
Reinforcement learning RL modelfree tuning adaptation method control dynamic systems Contrary supervised learning based usually gradient descent techniques RL require model sensitivity function process Hence RL applied systems poorly understood uncertain nonlinear reasons intractable conventional methods In reinforcement learning overall controller performance evaluated scalar measure called reinforcement Depending type control task reinforcement may represent evaluation recent control action often entire sequence past control moves In latter case RL system learns predict outcome individual control action This prediction used adjust parameters controller The mathematical background RL closely related optimal control dynamic programming This paper gives comprehensive overview RL methods presents application attitude control satellite Some well known applications literature reviewed well
A biologically motivated mechanism selforganizing neural network modifiable lateral connections presented The weight modification rules purely activitydependent unsupervised local The lateral interaction weights initially random develop Mexican hat shape around neuron At time external input weights selforganize form topological map input space The algorithm demonstrates selforganization bootstrap using input information Predictions algorithm agree well experimental observations development lateral connections cortical feature maps
Some main users statistical methods economists social scientists epidemiologists discovering fields rest statistical causal foundations The blurring foundations years follows lack mathematical notation capable distinguishing causal equational relationships By providing formal natural explication relations graphical methods potential revolutionize statistics used knowledgerich applications Statisticians response beginning realize causality metaphysical deadend meaningful concept clear mathematical underpinning The paper surveys developments outlines future challenges
This paper describes new method inducing logic programs examples attempts integrate best aspects existing ILP methods single coherent framework In particular combines bottomup method similar Golem topdown method similar Foil It also includes method predicate invention similar Champ elegant solution noisy oracle problem allows system learn recursive programs without requiring complete set positive examples Systematic experimental comparisons Golem Foil range problems used clearly demonstrate advantages approach
We present deterministic techniques computing upper lower bounds marginal probabilities sigmoid noisyOR networks These techniques become useful size network clique size precludes exact computations We illustrate tightness bounds numerical experiments
MIT Computational Cognitive Science Technical Report Abstract We develop recursive nodeelimination formalism efficiently approximating large probabilistic networks No constraints set network topologies Yet formalism straightforwardly integrated exact methods whenever arebecome applicable The approximations use controlled maintain consistently upper lower bounds desired quantities times We show Boltzmann machines sigmoid belief networks combination ie chain graphs handled within framework The accuracy methods verified experimentally
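We prove lower bound $\Omega\left(\frac{1}{\epsilon}\ln\frac{1}{\delta} + \frac{\mathrm{VCdim}(C)}{\epsilon}\right)$ number random examples required distributionfree learning concept class C VCdim(C) VapnikChervonenkis dimension $\epsilon$ $\delta$ accuracy confidence parameters This improves previous best lower bound $\Omega\left(\frac{1}{\epsilon}\ln\frac{1}{\delta} + \mathrm{VCdim}(C)\right)$ comes close known general upper bound $O\left(\frac{1}{\epsilon}\ln\frac{1}{\delta} + \frac{\mathrm{VCdim}(C)}{\epsilon}\ln\frac{1}{\epsilon}\right)$ consistent algorithms We show many interesting concept classes including kCNF kDNF bound actually tight within constant factor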
The ability handle temporal variation important dealing realworld dynamic signals In many applications inputs come fixedrate sequences rather signals time scales vary one instance next thus modeling dynamic signals requires ability recognize sequences also ability handle temporal changes signal This paper discusses Tau Net neural network modeling dynamic signals application speech In Tau Net sequence learning accomplished using combination prediction recurrence timedelay connections Temporal variability modeled adaptable time constants network adjusted respect prediction error Adapting time constants changes time scale network adapted value networks time constant provides measure temporal variation signal Tau Net applied several simple signals sets sine waves differing frequency phase multidimensional signal representing walking gait children energy contour simple speech utterance Tau Net also shown work voicing distinction task using synthetic speech data In paper Tau Net applied two speakerindependent tasks vowel recognition {ae, iy, ux} consonant recognition {p, t, k} using speech data taken TIMIT database It shown Tau Nets trained mediumrate tokens achieved performance networks without time constants trained tokens rates performed better networks without time constants trained mediumrate tokens Our results demonstrate Tau Nets ability identify vowels consonants variable speech rates extrapolating rates represented training set
Backpropagation learning BP known serious limitations generalising knowledge certain types learning material BPSOM extension BP overcomes limitations BPSOM combination multilayered feedforward network MFN trained BP Kohonens selforganising maps SOMs In earlier reports shown BPSOM improved generalisation performance whereas decreased simultaneously number necessary hidden units without loss generalisation performance These two effects use SOM learning training MFNs In paper focus two additional effects First show BPSOM training activations hidden units MFNs tend oscillate among limited number discrete values Second identify SOM elements adequate organisers instances task hand We visualise effects argue lead intelligible neural networks employed basis automatic rule extraction
The discrimination powers Multilayer perceptron MLP Learning Vector Quantisation LVQ networks compared overlapping Gaussian distributions It shown analytically Monte Carlo studies MLP network handles high dimensional problems efficient way LVQ This mainly due sigmoidal form MLP transfer function also fact MLP uses hyperplanes efficiently Both algorithms equally robust limited training sets learning curves fall like M M training set size compared theoretical predictions statistical estimates VapnikChervonenkis bounds
In article approximate rate convergence Gibbs sampler normal approximation target distribution Based approximation consider many implementational issues Gibbs sampler eg updating strategy parameterization blocking We give theoretical results justify approximation illustrate methods number realistic examples
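When the target is itself bivariate normal, the Gibbs sampler can be written down exactly. This gives a concrete view of how correlation between components governs mixing, which is the quantity a normal approximation lets one analyse. The sketch below is illustrative only. It assumes standard normal marginals with correlation `rho` and a systematic updating scan. The function name `gibbs_bivariate_normal` is hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, x0=0.0, y0=0.0, rng=None):
    """Systematic-scan Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is normal: x | y ~ N(rho * y, 1 - rho**2), and symmetrically
    y | x ~ N(rho * x, 1 - rho**2)."""
    rng = rng if rng is not None else random.Random()
    sd = math.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x, y = x0, y0
    draws = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        draws.append((x, y))
    return draws
```

As `rho` approaches 1, the conditional standard deviation shrinks, each update moves less, and the chain mixes more slowly. Choices such as blocking or reparameterization are aimed at exactly this effect.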
Instancebased learning methods explicitly remember data receive They usually training phase prediction time perform computation Then take query search database similar datapoints build online local model local average local regression predict output value In paper review advantages instance based methods autonomous systems also note ensuing cost hopelessly slow computation database grows large We present evaluate new way structuring database new algorithm accessing maintains advantages instancebased learning Earlier attempts combat cost instancebased learning sacrificed explicit retention data applicable instancebased predictions based small number near neighbors reintroduce explicit training phase form interpolative data structure Our approach builds multiresolution data structure summarize database experiences resolutions interest simultaneously This permits us query database flexibility conventional linear search greatly reduced computational cost
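The following is a minimal brute-force sketch of the instance-based prediction step. Every datapoint is kept, and at query time the outputs of the k nearest stored inputs are averaged. This is the linear-search baseline, not the paper's multiresolution structure. Using a k-nearest-neighbour average as the local model, and the name `knn_local_average`, are assumptions made for illustration.

```python
import heapq
import math


def knn_local_average(memory, query, k):
    """Predict by averaging the outputs of the k stored points nearest to query.

    memory is a list of (input_vector, output) pairs. Each query scans the whole
    memory, so the cost is linear in the number of stored experiences."""
    nearest = heapq.nsmallest(k, memory, key=lambda pair: math.dist(pair[0], query))
    return sum(output for _, output in nearest) / len(nearest)
```

Because every query touches every stored point, prediction time grows with the database. That growing cost is what the abstract's data structure is designed to cut.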
We implemented reinforcement learning architecture reactive component two layer control system simulated race car We found separating layers expedited gradually improving competition multiagent interaction We ran experiments test tuning decomposition coordination low level behaviors We extended control system allow passing cars tested ability avoid collisions The best design used reinforcement learning separate networks behavior coarse coded input simple rule based coordination mechanism
This study concerned whether possible detect information contained training data background knowledge relevant solving learning problem whether irrelevant information eliminated preprocessing starting learning process A case study data preprocessing hybrid genetic algorithm shows elimination irrelevant features substantially improve efficiency learning In addition costsensitive feature elimination effective reducing costs induced hypotheses
Hierarchical genetic programming HGP approaches rely discovery modification use new functions accelerate evolution This paper provides qualitative explanation improved behavior HGP based analysis evolution process dual perspective diversity causality From static point view use HGP approach enables manipulation population higher diversity programs Higher diversity increases exploratory ability genetic search process demonstrated theoretical experimental fitness distributions expanded structural complexity individuals From dynamic point view analysis causality crossover operator suggests HGP discovers exploits useful structures bottomup hierarchical manner Diversity causality complementary affecting exploration exploitation genetic search Unlike machine learning techniques need extra machinery control tradeoff HGP automatically trades exploration exploitation
Previous neural network learning algorithms sequence processing computationally expensive perform poorly comes long time lags This paper first introduces simple principle reducing descriptions event sequences without loss information A consequence principle unexpected inputs relevant This insight leads construction neural architectures learn divide conquer recursively decomposing sequences I describe two architectures The first functions selforganizing multilevel hierarchy recurrent networks The second involving two recurrent networks tries collapse multilevel predictor hierarchy single recurrent net Experiments show system require less computation per time step many fewer training sequences conventional training algorithms recurrent nets
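The principle that only unexpected inputs are relevant can be illustrated without any network. Given a deterministic predictor of the next symbol, keep only the symbols it fails to predict, together with their time steps. The full sequence can then be rebuilt by replaying the same predictor. The toy predictor below is a hypothetical stand-in for the paper's recurrent predictors. It simply guesses that the previous symbol repeats.

```python
def predict_next(history):
    """Hypothetical toy predictor: guess that the last symbol repeats (None at the start)."""
    return history[-1] if history else None


def compress(sequence):
    """Keep only the (time, symbol) pairs that the predictor got wrong."""
    kept = []
    for t, symbol in enumerate(sequence):
        if predict_next(sequence[:t]) != symbol:
            kept.append((t, symbol))
    return kept


def decompress(kept, length):
    """Rebuild the sequence: use the stored symbol where one exists, else the prediction."""
    surprises = dict(kept)
    history = []
    for t in range(length):
        history.append(surprises.get(t, predict_next(history)))
    return history
```

No information is lost because the decoder runs the same predictor on the same reconstructed history. A better predictor leaves fewer surprises to store.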
A neural network model selforganization ocular dominance lateral connections binocular input presented The selforganizing process results network afferent weights neuron organize smooth hillshaped receptive fields primarily one retinas neurons common eye preference form connected intertwined patches lateral connections primarily link regions eye preference Similar selforganization cortical structures observed experimentally strabismic kittens The model shows patterned lateral connections cortex may develop based correlated activity explains lateral connection patterns follow receptive field properties ocular dominance
Relaxation oscillations exhibiting one time scale arise naturally many physical systems This paper proposes method numerically integrate large systems relaxation oscillators The numerical technique called singular limit method derived analysis relaxation oscillations singular limit In limit system evolution gives rise time instants fast dynamics takes place intervals slow dynamics takes place A full description method given LEGION locally excitatory globally inhibitory oscillator networks fast dynamics characterized jumping leads dramatic phase shifts captured method iterative operation slow dynamics entirely solved The singular limit method evaluated computer experiments produces remarkable speedup compared methods integrating systems The speedup makes possible simulate largescale oscillator networks
A selforganizing model spiking neurons dynamic thresholds lateral excitatory inhibitory connections presented tested image segmentation task The model integrates two previously separate lines research modeling visual cortex Laterally connected selforganizing maps used model afferent structures lateral connections could selforganize inputdriven Hebbian adaptation Spiking neurons leaky integrator synapses used model image segmentation binding synchronization desynchronization neuronal activity Although approaches differ model neuron overall layout laterally connected twodimensional network This paper shows selforganization segmentation achieved network thus presenting unified model development functional dynamics primary visual cortex
The full Bayesian method applying neural networks prediction problem set priorhyperprior structure net perform necessary integrals However integrals tractable analytically Markov Chain Monte Carlo MCMC methods slow especially parameter space highdimensional Using Gaussian processes approximate weight space integral analytically small number hyperparameters need integrated MCMC methods We applied idea classification problems obtaining excellent results realworld problems investigated far
Recently Propp Wilson proposed algorithm called Coupling Past CFTP allows approximate perfect ie exact simulation stationary distribution certain finite state space Markov chains Perfect Sampling using CFTP successfully extended context point processes amongst authors Haggstrom et al In Gibbs sampling applied bivariate point process penetrable spheres mixture model However general running time CFTP terms number transitions independent state sampled Thus impatient user aborts long runs may introduce subtle bias user impatience bias Fill introduced exact sampling algorithm finite state space Markov chains contrast CFTP unbiased user impatience Fills algorithm form rejection sampling similar CFTP requires sufficient monotonicity properties transition kernel used We show Fills version rejection sampling extended infinite state space context produce exact sample penetrable spheres mixture process related models Following use Gibbs sampling make use partial order mixture model state space Thus
Cells visual cortex selective ocular dominance orientation input also size spatial frequency The simulations reported paper show size selectivity could develop Hebbian selforganization receptive fields different sizes could organize columns like orientation ocular dominance The lateral connections network selforganize cooperatively simultaneously receptive field sizes produce patterns lateral connectivity closely follow receptive field organization Together previous work ocular dominance orientation selectivity results suggest single Hebbian selforganizing process give rise major receptive field properties visual cortex also structured patterns lateral interactions verified experimentally others predicted model The model also suggests functional role selforganized structures The afferent receptive fields develop sparse coding visual input recurrent lateral interactions eliminate redundancies cortical activity patterns allowing cortex efficiently process massive amounts visual information
A pilot study described practical application artificial neural networks The limit cycle attitude control satellite selected test case One sources limit cycle position dependent error observed attitude A Reinforcement Learning method selected able adapt controller cost function optimised An estimate cost function learned neural critic In approach estimated cost function directly represented function parameters linear controller The critic implemented CMAC network Results simulations show method able find optimal parameters without unstable behaviour In particular case large discontinuities attitude measurements method shows clear improvement compared conventional approach RMS attitude error decreases approximately
In nutshell describe generic ILP problem following given set E positive negative examples target predicate background knowledge B world usually logic program including facts auxiliary predicates task find logic program H hypothesis positive examples deduced B H negative example In paper review results achieved area discuss techniques used Moreover prove following new results Predicates described nonrecursive local clauses k literals PAClearnable distribution This generalizes previous result valid constrained clauses Predicates described k nonrecursive local clauses PAClearnable distribution This generalizes previous result nonconstructive valid class distributions Finally introduce believe first theoretical framework learning Prolog clauses presence errors To purpose introduce new noise model call fixed attribute noise model learning propositional concepts Boolean domain This new noise model interest
The ExpectationMaximization algorithm given Dempster et al enjoyed considerable popularity solving MAP estimation problems This note gives simple derivation algorithm due Luttrell better illustrates convergence properties algorithm variants The algorithm illustrated two examples pooling data multiple noisy sources fitting mixture density
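The mixture-density example can be made concrete with the standard EM updates for a one-dimensional, two-component Gaussian mixture. This sketch uses the textbook E-step and M-step formulas, not the Luttrell-style derivation the note presents. The function names and the initial values a caller supplies are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, means, variances, weights, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
    return means, variances, weights
```

Each iteration cannot decrease the data likelihood, which is the convergence property the note's derivation is meant to make transparent.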
We consider problem incorporate prior knowledge supervised learning techniques We set problem framework regularization theory consider case know approximated function radial symmetry The problem solved two alternative ways use invariance constraint regularization theory framework derive rotation invariant version Radial Basis Functions use radial symmetry create new virtual examples given data set We show two apparently different methods learning
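The second strategy, creating virtual examples from a known radial symmetry, amounts to rotating each training input about the origin while keeping its target value. The following is a minimal 2-D sketch. The number of rotations and the name `rotated_virtual_examples` are illustrative assumptions.

```python
import math


def rotated_virtual_examples(examples, n_rotations):
    """Augment (x, y, target) examples with copies rotated about the origin.

    Rotations are by multiples of 2*pi/n_rotations. A radially symmetric function
    takes the same value at every rotated point, so the target is reused unchanged.
    The k = 0 copy is the original example."""
    augmented = []
    for x, y, target in examples:
        for k in range(n_rotations):
            angle = 2 * math.pi * k / n_rotations
            c, s = math.cos(angle), math.sin(angle)
            augmented.append((c * x - s * y, s * x + c * y, target))
    return augmented
```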
I present computational results suggesting gainadaptation algorithms based part connectionist learning methods may improve least squares classical parameterestimation methods stochastic timevarying linear systems The new algorithms evaluated respect classical methods along three dimensions asymptotic error computational complexity required prior knowledge system The new algorithms order complexity LMS methods $O(n)$ n dimensionality system whereas leastsquares methods Kalman filter $O(n^2)$ The new methods also improve Kalman filter require complete statistical model system varies time In simple computational experiment new methods shown produce asymptotic error levels near optimal Kalman filter significantly leastsquares LMS methods The new methods may perform better even Kalman filter error filters model system varies time
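One well-known $O(n)$ gain-adaptation rule is Sutton's Incremental Delta-Bar-Delta (IDBD). It keeps one log step size per input and adapts it by a meta-gradient. The abstract does not say that IDBD is exactly one of the algorithms evaluated here, so treat this as an illustrative sketch. The parameter names, the default values, and the helper `noise_free_linear_data` are hypothetical.

```python
import math
import random


def idbd_fit(samples, n_inputs, theta=0.01, initial_step=0.05):
    """Incremental Delta-Bar-Delta: LMS with one adaptive step size per weight.

    Each step does a constant amount of work per weight, so the cost is O(n)."""
    w = [0.0] * n_inputs
    beta = [math.log(initial_step)] * n_inputs  # log of each per-weight step size
    h = [0.0] * n_inputs  # decaying trace of recent weight changes
    for x, y in samples:
        delta = y - sum(wi * xi for wi, xi in zip(w, x))  # prediction error
        for i in range(n_inputs):
            beta[i] += theta * delta * x[i] * h[i]  # meta-gradient step on log step size
            alpha = math.exp(beta[i])
            w[i] += alpha * delta * x[i]  # ordinary LMS step with per-weight gain
            h[i] = h[i] * max(0.0, 1.0 - alpha * x[i] ** 2) + alpha * delta * x[i]
    return w


def noise_free_linear_data(true_w, n_samples, seed=0):
    """Hypothetical test data: Gaussian inputs and targets from a fixed linear map."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        data.append((x, sum(wi * xi for wi, xi in zip(true_w, x))))
    return data
```

Unlike a Kalman filter, this rule needs no statistical model of how the system varies. Its only extra tuning knob is the meta step size `theta`.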
Windowing proposed procedure efficient memory use ID3 decision tree learning algorithm However previous work shown windowing may often lead decrease performance In work try argue separateandconquer rule learning algorithms appropriate windowing divideandconquer algorithms learn rules independently less susceptible changes class distributions In particular present new windowing algorithm achieves additional gains efficiency exploiting property separateandconquer algorithms While presented algorithm suitable redundant noisefree data sets also briefly discuss problem noisy data windowing present preliminary ideas might solved extension algorithm introduced paper
This article describes comprehensive approach automatic theory revision Given imperfect theory approach combines explanation attempts incorrectly classified examples order identify failing portions theory For theory fault correlated subsets examples used inductively generate correction Because corrections focused tend preserve structure original theory Because system starts approximate domain theory general fewer training examples required attain given level performance classification accuracy compared purely empirical system The approach applies classification systems employing propositional Hornclause theory The system tested variety application domains results presented problems domains molecular biology plant disease diagnosis
Suppose one wishes sample density $\pi(x)$ using Markov chain Monte Carlo MCMC An auxiliary variable u conditional distribution $\pi(u \mid x)$ defined giving joint distribution x u $\pi(x)\,\pi(u \mid x)$ A MCMC scheme samples joint distribution lead substantial gains efficiency compared standard approaches The revolutionary algorithm Swendsen Wang one example In addition reviewing SwendsenWang algorithm generalizations paper introduces new auxiliary variable method called partial decoupling Two applications Bayesian image analysis considered The first binary classification problem partial decoupling performs SW single site Metropolis The second PET reconstruction uses gray level prior Geman McClure A generalized SwendsenWang algorithm developed problem reduces computing time point MCMC viable method posterior exploration
In paper analyse theoretical properties slice sampler We find algorithm extremely robust geometric ergodicity properties For case one auxiliary variable demonstrate algorithm stochastically monotone deduce analytic bounds total variation distance stationarity method using FosterLyapunov drift condition methodology
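A slice sampler with one auxiliary variable alternates between two steps. First it draws a height u uniformly under the density at the current point. Then it draws a new point uniformly from the slice where the density exceeds u. For the density exp(-x) on x >= 0 the slice is the interval [0, -ln u), so both steps are exact. This target is an illustrative assumption, not one analysed in the paper. The function name is hypothetical.

```python
import math
import random


def slice_sample_exponential(n_samples, x0=1.0, rng=None):
    """Slice sampler for the density exp(-x) on x >= 0, using one auxiliary variable."""
    rng = rng if rng is not None else random.Random()
    x = x0
    samples = []
    for _ in range(n_samples):
        # Auxiliary step: height uniform on (0, exp(-x)]; 1 - random() avoids log(0).
        u = math.exp(-x) * (1.0 - rng.random())
        # Slice {y >= 0 : exp(-y) >= u} is the interval [0, -ln u], which contains x.
        x = rng.uniform(0.0, -math.log(u))
        samples.append(x)
    return samples
```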
This paper studies well combination simulated annealing ADFs solves genetic programming GP style program discovery problems On suite composed evenkparity problems k analyses performance simulated annealing ADFs compared using ADFs In contrast GP results suite simulated annealing run ADFs problem size increases advantage using standard GP program representation marginal When performance simulated annealing compared GP algorithm using ADFs evenparity problem GP advantageous evenparity problem SA GP equal evenparity problem SA advantageous
Most learning algorithms work effectively training data contain completely specified labeled samples In many diagnostic tasks however data include values attributes model blocking process hides values attributes learner While blockers remove values critical attributes handicap learner paper instead focuses blockers remove irrelevant attribute values ie values needed classify instance given values unblocked attributes We first motivate formalize model superfluousvalue blocking demonstrate omissions useful proving certain classes seem hard learn general PAC model viz decision trees DNF formulae trivial learn setting We also show model extended deal theory revision ie modifying existing formula blockers occasionally include superfluous values exclude required values corruptions training data
This paper presents approach automatic discovery functions Genetic Programming The approach based discovery useful building blocks analyzing evolution trace generalizing blocks define new functions finally adapting problem representation onthefly Adapting representation determines hierarchical organization extended function set enables restructuring search space solutions found easily Measures complexity solution trees defined adaptive representation framework The minimum description length principle applied justify feasibility approaches based hierarchy discovered functions suggest alternative ways defining problems fitness function Preliminary empirical results presented
A decision problem associated fundamental nonconvex model linearly inseparable pattern sets shown NPcomplete Another nonconvex model employs norm instead norm solved polynomial time solving n linear programs n usually small dimensionality pattern space An effective LPbased finite algorithm proposed solving latter model The algorithm employed obtain nonconvex piecewiselinear function separating points representing measurements made fine needle aspirates taken benign malignant human breasts A computer program trained samples correctly diagnosed new samples encountered currently use University Wisconsin Hospitals Introduction The fundamental problem wish address
Many connectionist approaches musical expectancy music composition let question What next overshadow equally important question When next One escape latter question one temporal structure considering perception musical meter We view perception metrical structure dynamic process temporal organization external musical events synchronizes entrains listeners internal processing mechanisms This article introduces novel connectionist unit based upon mathematical model entrainment capable phase frequencylocking periodic components incoming rhythmic patterns Networks units selforganize temporally structured responses rhythmic patterns The resulting network behavior embodies perception metrical structure The article concludes discussion implications approach theories metrical structure musical expectancy
Over years several packages developed provide workbench genetic algorithm GA research Most packages use generational model inspired GENESIS A adopted steadystate model used Genitor Unfortunately deficiencies working orderbased problems packing routing scheduling This paper describes LibGA developed specifically orderbased problems also works easily kinds problems It offers easy use userfriendly interface allows comparisons made generational steadystate genetic algorithms particular problem It includes variety genetic operators reproduction crossover mutation LibGA makes easy use operators new ways particular applications develop include new operators Finally offers unique new feature dynamic generation gap
Human episodic memory provides seemingly unlimited storage everyday experiences retrieval system allows us access experiences partial activation components The system believed consist fast temporary storage hippocampus slow longterm storage within neocortex This paper presents neural network model hippocampal episodic memory inspired Damasios idea Convergence Zones The model consists layer perceptual feature maps binding layer A perceptual feature pattern coarse coded binding layer stored weights layers A partial activation stored features activates binding pattern turn reactivates entire stored pattern For many configurations model theoretical lower bound memory capacity derived order magnitude higher number units model several orders magnitude higher number bindinglayer units Computational simulations indicate average capacity order magnitude larger theoretical lower bound making connectivity layers sparser causes even increase capacity Simulations also show descriptive binding patterns used errors tend plausible patterns confused similar patterns slight cost capacity The convergencezone episodic memory therefore accounts immediate storage associative retrieval capability large capacity hippocampal memory shows memory encoding areas much smaller perceptual maps consist rather coarse computational units sparsely connected perceptual maps
Empirical Learning Results POLLYANNA The value empirical learning demonstrated results testing theory space search TSS component POLLYANNA Empirical data shows approximations generated generic simplifying assumptions widely varying levels accuracy efficiency The candidate theory space includes theories Pareto optimal combinations accuracy efficiency well others nonoptimal Empirical learning thus needed separate optimal theories nonoptimal ones It works filter process generating approximations generic simplifying assumptions Empirical tests serve additional purpose well Theory space search collects data precisely characterizes tradeoff accuracy efficiency among candidate approximate theories The tradeoff data used select theory best balances competing objectives accuracy efficiency manner appropriate intended performance context The feasibility empirical learning also addressed results testing theory space search component POLLYANNA In order empirical testing feasible candidate approximate theories must operationally usable Candidate hearts theories generated POLLYANNA shown operationally usable experimental results theory space search TSS phase learning They run real machine producing results compared training examples Feasibility also depends information computation costs empirical testing Information costs result need supply system training examples Computation costs result need execute candidate theories Both types costs grow numbers candidate theories tested Experimental results show empirical testing POLLYANNA limited computation costs executing candidate theories information costs obtaining many training examples POLLYANNA contrasts respect traditional inductive learning systems The feasibility empirical learning depends also intended performance context resources available context learning Measurements theory space search phase indicate TSS algorithms performing exhaustive search would feasible hearts domain although may feasible applications TSS algorithms avoid exhaustive search hold considerably promise
In paper adopt generalsum stochastic games framework multiagent reinforcement learning Our work extends previous work Littman zerosum stochastic games broader framework We design multiagent Qlearning method framework prove converges Nash equilibrium specified conditions This algorithm useful finding optimal strategy exists unique Nash equilibrium game When exist multiple Nash equilibria game algorithm combined learning techniques find optimal strategies
Rising operating costs structural transformations resizing globalization companies world brought focus emerging discipline knowledge management concerned making knowledge pay Corporate memories form important part knowledge management initiatives company In paper discuss viewing corporate memories distributed case libraries benefit existing techniques distributed casebased reasoning resource discovery exploitation previous expertise We present two techniques developed context multiagent casebased reasoning accessing exploiting past experience corporate memory resources The first approach called Negotiated Retrieval deals retrieving assembling case pieces different resources corporate memory form good overall case The second approach based Federated Peer Learning deals two modes cooperation called DistCBR ColCBR let agent exploit experience expertise peer agents achieve local task The first author would like acknowledge support National Science Foundation Grant Nos IRI EEC The second authors research reported paper developed IIIA inside ANALOG Project funded Spanish CICYT grant The content paper not necessarily reflect position policy US Government Kingdom Spain Government Catalonia Government no official endorsement inferred
When learning reasoning failures knowledge system behaves powerful lever deciding went wrong system deciding system needs learn A number benefits arise systems possess knowledge operation knowledge Abstract knowledge cognition used select diagnosis repair strategies among alternatives Specific kinds selfknowledge used distinguish failure hypothesis candidates Making selfknowledge explicit also facilitate use knowledge across domains provide principled way incorporate new learning strategies To illustrate advantages selfknowledge learning provide implemented examples two different systems A plan execution system called RAPTER story understanding system called MetaAQUA
Theory revision integrates inductive learning background knowledge combining training examples coarse domain theory produce accurate theory There two challenges theory revision theoryguided systems face First representation language appropriate initial theory may inappropriate improved theory While original representation may concisely express initial theory accurate theory forced use representation may bulky cumbersome difficult reach Second theory structure suitable coarse domain theory may insufficient finetuned theory Systems produce small local changes theory limited value accomplishing complex structural alterations may required Consequently advanced theoryguided learning systems require flexible representation flexible structure An analysis various theory revision systems theoryguided learning systems reveals specific strengths weaknesses terms two desired properties Designed capture underlying qualities system new system uses theoryguided constructive induction Experiments three domains show improvement previous theoryguided systems This leads study behavior limitations potential theoryguided constructive induction
If experiment requires statistical analysis establish result one better experiment Ernest Rutherford Most proponents cold fusion reporting excess heat electrolysis experiments claiming one main characteristics cold fusion irreproducibility JR Huizenga Cold Fusion p Abstract Amid ever increasing research various aspects neural computing much progress evident theoretical advances empirical studies On empirical side wealth data experimental studies reported It however not clear best report neural computing experiments may replicated interested researchers In particular nature iterative learning randomised initial architecture backpropagation training multilayer perceptron precise replication reported result virtually impossible The outcome experimental replication reported results touchstone scientific method not option researchers popular subfield neural computing In paper address issue replicability experiments based backpropagation training multilayer perceptrons although many results applicable subfield plagued characteristics First attempt produce complete abstract specification neural computing experiment From specification identify full range parameters needed support maximum replicability use show absolute replicability not option practice We propose statistical framework support replicability We demonstrate framework empirical studies replicability respect experimental controls validity implementations backpropagation algorithm Finally suggest degree replicability neural computing experiment estimated reflected claimed precision empirical results reported
In paper propose unsupervised neural network allowing robot learn sensorimotor associations delayed reward The robot task learn meaning pictograms order survive maze First introduce new neural conditioning rule PCR Probabilistic Conditioning Rule allowing test hypotheses associations visual categories movements given time span Second describe real maze experiment mobile robot We propose neural architecture solve problem discuss difficulty build visual categories dynamically associating movements Third propose use algorithm simulation order test exhaustively We give results different kind mazes compare system adapted version Qlearning algorithm Finally conclude showing limitations approaches take account intrinsic complexity reasoning based image recognition
We present framework analysis synthesis acoustical instruments based datadriven probabilistic inference modeling Audio time series boundary conditions played instrument recorded nonlinear mapping control data audio space inferred using general inference framework ClusterWeighted Modeling The resulting model used realtime synthesis audio sequences new input data
In many realworld domains task machine learning algorithms learn theory predicting numerical values In particular several standard test domains used Inductive Logic Programming ILP concerned predicting numerical values examples relational mostly nondeterminate background knowledge However far no ILP algorithm except one predict numbers cope nondeterminate background knowledge The exception covering algorithm called FORS In paper present Structural Regression Trees SRT new algorithm applied class problems integrating statistical method regression trees ILP SRT constructs tree containing literal atomic formula negation conjunction literals node assigns numerical value leaf SRT provides more comprehensible results than purely statistical methods applied class problems most ILP systems cannot handle Experiments several realworld domains demonstrate approach competitive existing methods indicating advantages not expense predictive accuracy
A quantitative practical Bayesian framework described learning mappings feedforward networks The framework makes possible objective comparisons solutions using alternative network architectures objective stopping rules network pruning growing procedures objective choice magnitude type weight decay terms additive regularisers penalising large weights etc measure effective number welldetermined parameters model quantified estimates error bars network parameters network output objective comparisons alternative learning interpolation models splines radial basis functions The Bayesian evidence automatically embodies Occams razor penalising overflexible overcomplex models The Bayesian approach helps detect poor underlying assumptions learning models For learning models well matched problem good correlation generalisation ability Bayesian evidence obtained This paper makes use Bayesian framework regularisation model comparison described companion paper Bayesian interpolation MacKay This framework due Gull Skilling Gull
Simultaneous multithreading technique permits multiple independent threads issue multiple instructions cycle In previous work demonstrated performance potential simultaneous multithreading based somewhat idealized model In paper show throughput gains simultaneous multithreading achieved without extensive changes conventional wideissue superscalar either hardware structures sizes We present architecture simultaneous multithreading achieves three goals minimizes architectural impact conventional superscalar design minimal performance impact single thread executing alone achieves significant throughput gains running multiple threads Our simultaneous multithreading architecture achieves throughput instructions per cycle fold improvement unmodified superscalar similar hardware resources This speedup enhanced advantage multithreading previously unexploited architectures ability favor fetch issue threads efficiently using processor cycle thereby providing best instructions processor
The theory revision problem problem best go revising deficient domain theory using information contained examples expose inaccuracies In paper present approach theory revision problem propositional domain theories The approach described called PTR uses probabilities associated domain theory elements numerically track flow proof theory This allows us measure precise role clause literal allowing preventing desired undesired derivation given example This information used efficiently locate repair flawed elements theory PTR proved converge theory correctly classifies examples shown experimentally fast accurate even deep theories
New methodology fully Bayesian mixture analysis developed making use reversible jump Markov chain Monte Carlo methods capable jumping parameter subspaces corresponding different numbers components mixture A sample full joint distribution unknown variables thereby generated used basis thorough presentation many aspects posterior distribution The methodology applied analysis univariate normal mixtures using hierarchical prior model offers approach dealing weak prior information avoiding mathematical pitfalls using improper priors mixture context
Northeastern University College Computer Science Technical Report NUCCS We gratefully acknowledge substantial contributions effort provided Andy Barto sparked original interest questions whose continued encouragement insightful comments criticisms helped us greatly Recent discussions Satinder Singh Vijay Gullapalli also helpful impact work Special thanks also Rich Sutton influenced thinking subject numerous ways This work supported Grant IRI National Science Foundation US Air Force
Active learning differs passive learning examples learning algorithm assumes least control part input domain receives information In situations active learning provably powerful learning examples alone giving better generalization fixed number training examples In paper consider problem learning binary concept absence noise Valiant We describe formalism active concept learning called selective sampling show may approximately implemented neural network In selective sampling learner receives distribution information environment queries oracle parts domain considers useful We test implementation called SGnetwork three domains observe significant improvement generalization
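The selective-sampling idea can be shown on the simplest concept class, a threshold on [0, 1]. For that class the region of uncertainty, where consistent hypotheses still disagree, is exactly an interval. This is an idealized sketch under that assumption, not the SG-network. The function name, the uniform input distribution and the fixed seed are hypothetical choices.

```python
import random


def selective_sampling_threshold(oracle, n_draws, seed=0):
    """Learn a 1-D threshold concept c(x) = 1 if x >= t, with t in (0, 1].

    Idealized illustration of selective sampling (hypothetical helper, not
    the SG-network). The learner draws unlabeled points from a uniform
    input distribution. It queries the oracle only for points inside the
    region of uncertainty, where consistent hypotheses still disagree.
    """
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0  # every threshold in (lo, hi] is still consistent
    queries = 0
    for _ in range(n_draws):
        x = rng.random()
        if lo < x < hi:      # inside region of uncertainty: worth a query
            queries += 1
            if oracle(x):    # positive label: threshold is at most x
                hi = x
            else:            # negative label: threshold is above x
                lo = x
        # outside (lo, hi) the label is implied by earlier data: no query
    return (lo + hi) / 2, hi - lo, queries


est, width, queries = selective_sampling_threshold(lambda x: x >= 0.3, 2000)
print(est, width, queries)
```

Labels outside (lo, hi) are already implied by the data seen so far. The number of oracle queries therefore grows much more slowly than the number of draws.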
High performance architectures always deal performancelimiting impact branch operations Microprocessor designs going deal problem well move towards deeper pipelines support multiple instruction issue Branch prediction schemes often used alleviate negative impact branch operations allowing speculative execution instructions unresolved branch Another technique eliminate branch instructions altogether Predication remove forward branch instructions translating instructions following branch predicate form This paper analyzes variety existing predication models eliminating branch operations effect elimination branch prediction schemes existing processors including single issue architectures simple prediction mechanisms newer multiissue designs correspondingly sophisticated branch predictors The effect branch prediction accuracy branch penalty basic block size studied
This paper describes model complementarity rules precedents classification task Under model precedents assist rulebased reasoning operationalizing abstract rule antecedents Conversely rules assist casebased reasoning case elaboration process inferring case facts order increase similarity cases term reformulation process replacing term whose precedents weakly match case terms whose precedents strongly match case Fully exploiting complementarity requires control strategy characterized impartiality absence arbitrary ordering restrictions use rules precedents An impartial control strategy implemented GREBE domain Texas workers compensation law In preliminary evaluation GREBEs performance found good slightly better performance law students task A case classified belonging particular category relating description criteria category membership The justifications warrants Toulmin relate case category vary widely generality antecedents For example consider warrants classifying case legal category negligence A rule An action negligent actor fails use reasonable care failure proximate cause injury general antecedent terms eg breach reasonable care Conversely precedent Dr Jones negligent failed count sponges surgery result left sponge Smith specific antecedent terms eg failure count sponges Both types warrants used classification systems relate cases categories Classification systems used precedents help match antecedents rules cases Completing match difficult terms antecedent opentextured ie significant uncertainty whether match specific facts Gardner McCarty Sridharan This problem results generality gap separating abstract terms specific facts Porter et al Precedents opentextured term ie past cases term applied used bridge gap Unlike rule antecedents antecedents precedents level generality cases no generality gap exists precedents new cases Precedents therefore reduce problem matching specific case facts opentextured terms problem matching two sets specific facts For example injured employees entitlement workers compensation depends whether injured activity furtherance employment Determining whether particular case classified compensable injury therefore requires matching specific facts case eg John injured automobile accident driving office opentextured term activity furtherance employment The gap generality case description abstract term makes match problematical However completing match may much easier precedents term activity furtherance employment eg Marys injury compensable occurred driving work activity furtherance employment Bills injury compensable occurred driving house deliver pizza activity furtherance employment In case Johns driving office closely matches Marys driving work
We introduce modelbased average reward Reinforcement Learning method called Hlearning compare discounted counterpart Adaptive RealTime Dynamic Programming simulated robot scheduling task We also introduce extension Hlearning automatically explores unexplored parts state space always choosing greedy actions respect current value function We show Autoexploratory Hlearning performs better original Hlearning previously studied exploration methods random recencybased counterbased exploration
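H-learning optimizes average reward per step rather than discounted return. The sketch below shows only the average-reward Bellman backup that this builds on, run as relative value iteration over a known model. H-learning itself also learns the transition and reward models and the gain online. Its auto-exploratory extension is not shown. The function name and the nested-list model encoding are hypothetical. The sketch also assumes a unichain, aperiodic MDP, which is needed for convergence.

```python
def relative_value_iteration(P, R, ref=0, tol=1e-9, max_iter=10_000):
    """Average-reward backup on a *known* model (illustrative sketch).

    P[s][a] is a list of (next_state, prob); R[s][a] is the expected reward.
    Returns (rho, h): the estimated optimal gain (average reward per step)
    and a relative value function normalized so that h[ref] == 0.
    Assumes a unichain, aperiodic MDP so that the iteration converges.
    """
    n = len(P)
    h = [0.0] * n
    rho = 0.0
    for _ in range(max_iter):
        # one-step lookahead: best reward plus expected relative value
        q = [
            max(R[s][a] + sum(p * h[t] for t, p in P[s][a])
                for a in range(len(P[s])))
            for s in range(n)
        ]
        rho = q[ref]                    # gain estimate read at reference state
        new_h = [v - rho for v in q]    # subtract so h[ref] stays at 0
        if max(abs(x - y) for x, y in zip(new_h, h)) < tol:
            return rho, new_h
        h = new_h
    return rho, h


# Two states; action 0 stays put, action 1 switches state.
P = [[[(0, 1.0)], [(1, 1.0)]], [[(1, 1.0)], [(0, 1.0)]]]
R = [[1.0, 0.0], [2.0, 0.0]]
print(relative_value_iteration(P, R))
```

In this toy model, staying in state 0 earns 1 per step and staying in state 1 earns 2. The optimal gain is therefore 2, and the backup finds it after a few sweeps.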
This paper proposes using fuzzy logic techniques dynamically control parameter settings genetic algorithms GAs We describe Dynamic Parametric GA GA uses fuzzy knowledgebased system control GA parameters We introduce technique automatically designing tuning fuzzy knowledgebase system using GAs Results initial experiments show performance improvement simple static GA One Dynamic Parametric GA system designed automatic method demonstrated improvement application not included design phase may indicate general applicability Dynamic Parametric GA wide range applications
In previous work Olshausen Field algorithm described learning linear sparse codes trained natural images produces set basis functions spatially localized oriented bandpass ie waveletlike This note shows algorithm may interpreted within maximumlikelihood framework Several useful insights emerge connection makes explicit relation statistical independence ie factorial coding shows formal relationship algorithm Bell Sejnowski suggests adapt parameters previously fixed This report describes research done within Center Biological Computational Learning Department Brain Cognitive Sciences Massachusetts Institute Technology This research sponsored Individual National Research Service Award BAO NIMH FMH grant National Science Foundation contract ASC award includes funds ARPA provided HPCC program CBCL
We study layered belief networks binary random variables conditional probabilities Pr(child|parents) depend monotonically weighted sums parents For networks give efficient algorithms computing rigorous bounds marginal probabilities evidence output layer Our methods apply generally computation upper lower bounds well generic transfer function parameterizations conditional probability tables sigmoid noisyOR We also prove rates convergence accuracy bounds function network size Our results derived applying theory large deviations weighted sums parents node network Bounds marginal probabilities computed two contributions one assuming weighted sums fall near mean values one assuming not This gives rise interesting tradeoff probable explanations evidence improbable deviations mean In networks child N parents gap upper lower bounds behaves sum two terms one order p In addition providing rates convergence large networks methods also yield efficient algorithms approximate inference fixed networks
Feature selection proven valuable technique supervised learning improving predictive accuracy reducing number attributes considered task We investigate potential similar benefits unsupervised learning task conceptual clustering The issues raised feature selection absence class labels discussed implementation sequential feature selection algorithm based existing conceptual clustering system described Additionally present second implementation employs technique improving efficiency search optimal description compare performance algorithms
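Sequential feature selection can be sketched as a greedy forward search. It repeatedly adds the single feature that most improves a quality score and stops when no addition helps. In the unsupervised setting, the score would come from the conceptual clustering system; here `score` is a hypothetical caller-supplied function. The helper name is also an assumption, not the paper's implementation.

```python
def forward_select(features, score):
    """Greedy sequential forward selection (illustrative sketch).

    `score(subset)` is a hypothetical, caller-supplied quality measure; in
    conceptual clustering it might rate the clusters built from `subset`.
    Returns the selected features (in order of addition) and their score.
    """
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        # try adding each remaining feature to the current subset
        cand = max(remaining, key=lambda f: score(selected + [f]))
        cand_score = score(selected + [cand])
        if cand_score <= best:  # stop when no single addition improves
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


useful = {"a", "b"}


def toy_score(subset):
    # reward useful features, penalize irrelevant ones
    s = set(subset)
    return len(s & useful) - 0.5 * len(s - useful)


print(forward_select(["a", "b", "c"], toy_score))
```

Each round costs one score evaluation per remaining feature. This is why efficiency improvements to the search matter when every evaluation requires a new clustering run.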
Robust flexible sufficiently general vision systems recognition description complex dimensional objects require adequate armamentarium representations learning mechanisms This paper briefly analyzes strengths weaknesses different learning paradigms symbol processing systems connectionist networks statistical syntactic pattern recognition systems possible candidates providing capabilities points several promising directions integrating multiple paradigms synergistic fashion towards goal
A selforganizing neural network sequence classification called SARDNET described analyzed experimentally SARDNET extends Kohonen Feature Map architecture activation retention decay order create unique distributed response patterns different sequences SARDNET yields extremely dense yet descriptive representations sequential input training iterations The network proven successful mapping arbitrary sequences binary real numbers well phonemic representations English words Potential applications include isolated spoken word recognition cognitive science models sequence processing
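The activation retention and decay mechanism can be illustrated by computing the map's response to one sequence. Each input activates the closest unit that has not yet fired, and that unit is then removed from the competition. Every earlier activation is scaled by a decay factor. Weight training and neighborhood updates are omitted. The decay value, the squared Euclidean distance and the function name are assumptions of this sketch.

```python
def sardnet_response(weights, sequence, decay=0.9):
    """Distributed response of a SARDNET-style map to one sequence (sketch).

    weights: one weight vector per map unit; sequence: list of input vectors.
    Each input activates the closest unit that has not fired yet (retention:
    winners leave the competition), and activations of units that fired
    earlier are multiplied by `decay`. Training is not shown. The sequence
    must not be longer than the number of units.
    """
    activation = [0.0] * len(weights)
    fired = set()
    for x in sequence:
        # decay units that fired earlier in the sequence
        for u in fired:
            activation[u] *= decay
        # winner: nearest not-yet-fired unit (squared Euclidean distance)
        winner = min(
            (u for u in range(len(weights)) if u not in fired),
            key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
        )
        activation[winner] = 1.0
        fired.add(winner)
    return activation


w = [(0.0,), (1.0,), (2.0,)]
print(sardnet_response(w, [(2.0,), (0.0,)], decay=0.5))
print(sardnet_response(w, [(0.0,), (2.0,)], decay=0.5))
```

Presenting the same two inputs in opposite orders activates the same units with different activation levels. This is how order information is encoded in a single static pattern.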
LIACC Technical Report Abstract In paper address problem acquiring knowledge integration Our aim construct integrated knowledge base several separate sources The objective integration construct one system exploits knowledge available good performance The aim paper discuss methodology knowledge integration present concrete results In experiments performance integrated theory exceeded performance individual theories quite significant amount Also performance not fluctuate much experiments repeated These results indicate knowledge integration complement existing ML methods
In introduction define term bias used machine learning systems We motivate importance automated methods evaluating selecting biases using framework bias selection search bias metabias spaces Recent research field machine learning bias summarized
A method initial results comparative study ABSTRACT A standard approach determining decision trees learn examples A disadvantage approach decision tree learned difficult modify suit different decision making situations Such problems arise example attribute assigned node measured significant change costs measuring attributes frequency distribution events different decision classes An attractive approach resolving problem learn store knowledge form decision rules generate whenever needed decision tree suitable given situation An additional advantage approach facilitates building compact decision trees much simpler logically equivalent conventional decision trees compact trees meant decision trees may contain branches assigned set values nodes assigned derived attributes ie attributes logical mathematical functions original ones The paper describes efficient method AQDT takes decision rules generated AQtype learning system AQ AQ builds decision tree optimizing given optimality criterion The method work two modes standard mode produces conventional decision trees compact mode produces compact decision trees The preliminary experiments AQDT shown decision trees generated decision rules conventional compact outperformed generated examples wellknown C program terms simplicity predictive accuracy
OGI CSE Technical Report Abstract Smoothing regularizers radial basis functions studied extensively no general smoothing regularizers projective basis functions PBFs widelyused sigmoidal PBFs heretofore proposed We derive new classes algebraicallysimple th order smoothing regularizers networks projective basis functions $f(W,x) = \sum_{j=1}^{N} u_j \, g(x^\top v_j + v_{j0}) + u_0$ general transfer functions $g$ These simple algebraic forms $R(W)$ enable direct enforcement smoothness without need costly Monte Carlo integrations $S(W)$ The regularizers tested illustrative sample problems compared quadratic weight decay The new regularizers shown yield better generalization errors
We address problem musical variation identification different musical sequences variations implications mental representations music According reductionist theories listeners judge structural importance musical events forming mental representations These judgments may result production reduced memory representations retain musical gist In study improvised music performance pianists produced variations melodies Analyses musical events retained across variations provided support reductionist account structural importance A neural network trained produce reduced memory representations melodies represented structurally important events efficiently others Agreement among musicians improvisations network model musictheoretic predictions suggest perceived constancy across musical variation natural result reductionist mechanism producing memory representations
Ensemble learning variational free energy minimization tool introduced neural networks Hinton van Camp learning described terms optimization ensemble parameter vectors The optimized ensemble approximation posterior probability distribution parameters This tool applied variety statistical inference problems In paper I study linear regression model parameters hyperparameters I demonstrate evidence approximation optimization regularization constants derived detail free energy minimization view point
The self regenerative MCMC tool constructing Markov chain given stationary distribution constructing auxiliary chain stationary distribution Elements auxiliary chain picked suitable random number times resulting chain stationary distribution Sahu Zhigljavsky In article provide generic adaptation scheme algorithm The adaptive scheme use knowledge stationary distribution gathered far update course simulation This method easy implement often leads considerable improvement We obtain theoretical results adaptive scheme Our proposed methodology illustrated number realistic examples Bayesian computation performance compared available MCMC techniques In one applications develop nonlinear dynamics model modeling predatorprey relationships wild
Conceptual analogy CA approach integrates conceptualization ie memory organization based prior experiences analogical reasoning Borner It implemented prototypically tested support design process building engineering Borner Janetzko Borner There number features distinguish CA standard approaches CBR AR First CA automatically extracts knowledge needed support design tasks ie complex case representations relevance object features relations proper adaptations attributevalue representations prior layouts Secondly effectively determines similarity complex case representations terms adaptability Thirdly implemented integrated highly interactive adaptive system architecture allows incremental knowledge acquisition user support This paper surveys basic assumptions psychological results influenced development CA It sketches knowledge representation formalisms employed characterizes subprocesses needed integrate memory organization analogical reasoning
Even sophisticated branchprediction techniques necessarily suffer mispredictions even relatively small mispredict rates hurt performance substantially currentgeneration processors In paper investigate schemes improving performance face imperfect branch predictors processor simultaneously execute code taken nottaken outcomes branch This paper presents data regarding limits multipath execution considers fetchbandwidth needs multipath execution discusses various dynamic confidenceprediction schemes gauge likelihood branch mispredictions Our evaluations consider executing along several paths Using paths relatively simple confidence predictor multipath execution garners speedups compared singlepath case average speedup SPECint suite While associated increases instructionfetchbandwidth requirements not surprising less expected result significance separate returnaddress stack forked path Overall results indicate multipath execution offers significant improvements singlepath performance could especially useful combined multithreading hardware costs amortized approaches
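A confidence predictor decides when forking a second path is worthwhile. The sketch below uses one simple style of estimator, a table of resetting counters, chosen here for illustration; it is not necessarily the scheme evaluated in the paper. A correct prediction increments a branch's counter and a misprediction resets it to zero. Branches whose counter is below a threshold are treated as low confidence and become candidates for multipath execution. The class name, table size, counter limit and threshold are hypothetical.

```python
class ResettingConfidence:
    """Toy branch-confidence estimator built from resetting counters.

    Each table entry counts consecutive correct predictions for the branches
    that map to it (indexed by PC modulo table size); a misprediction resets
    the count. A branch is low confidence, and so a candidate for forking
    both paths, while its count is below `threshold`. All sizes are
    illustrative assumptions.
    """

    def __init__(self, entries=1024, max_count=15, threshold=8):
        self.counts = [0] * entries
        self.max_count = max_count
        self.threshold = threshold

    def _index(self, pc):
        return pc % len(self.counts)

    def low_confidence(self, pc):
        return self.counts[self._index(pc)] < self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            # count up, saturating at max_count
            self.counts[i] = min(self.counts[i] + 1, self.max_count)
        else:
            self.counts[i] = 0  # reset on misprediction


conf = ResettingConfidence(threshold=3)
for _ in range(3):
    conf.update(0x40, True)
print(conf.low_confidence(0x40))  # False after three correct predictions
```

Resetting on a single miss makes the estimator conservative. Recently mispredicted branches are quickly flagged for forking, at the cost of some forks that turn out to be unnecessary.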
This paper presents algorithms robustness analysis Bayesian networks global neighborhoods Robust Bayesian inference calculation bounds posterior values given perturbations probabilistic model We present algorithms robust inference including expected utility expected value variance bounds global perturbations modeled contaminated constant density ratio constant density bounded total variation classes distributions © Carnegie Mellon University
This paper describes selflearning control system mobile robot Based local sensor data robot taught avoid collisions obstacles The feedback control system binaryvalued external reinforcement signal indicates whether collision occurred A reinforcement learning scheme used find correct mapping input sensor space output steering signal space An adaptive quantisation scheme introduced discrete division input space built scratch system
Rules extracted trained feedforward networks used explanation validation crossreferencing network output decisions This paper introduces rule evaluation ordering mechanism orders rules extracted feedforward networks based three performance measures Detailed experiments using three rule extraction techniques applied Wisconsin breast cancer database illustrate power proposed methods Moreover method integrating output decisions extracted rulebased system corresponding trained network proposed The integrated system provides improvements
Standard methods inducing structure weight values recurrent neural networks fit assumed class architectures every task This simplification necessary interactions network structure function not well understood Evolutionary computation includes genetic algorithms evolutionary programming populationbased search method shown promise complex tasks This paper argues genetic algorithms inappropriate network acquisition describes evolutionary program called GNARL simultaneously acquires structure weights recurrent networks This algorithms empirical acquisition method allows emergence complex behaviors topologies potentially excluded artificial architectural constraints imposed standard network induction methods
A new mechanism genetic encoding neural networks proposed loosely based marker structure biological DNA The mechanism allows aspects network structure including number nodes connectivity evolved genetic algorithms The effectiveness encoding scheme demonstrated object recognition task requires artificial creatures whose behaviour driven neural network develop highlevel finitestate exploration discrimination strategies The task requires solving sensorymotor grounding problem ie developing functional understanding effects creatures movement sensory input
We study multivariate smoothing spline estimate function several variables based ANOVA decomposition sums main effect functions one variable twofactor interaction functions two variables etc We derive Bayesian confidence intervals components decomposition demonstrate even multiple smoothing parameters efficiently computed using publicly available code RKPACK originally designed compute estimates We carry small Monte Carlo study see closely actual properties componentwise confidence intervals match nominal confidence levels Lastly analyze lake acidity data function calcium concentration latitude longitude using polynomial thin plate spline main effects model
appear Cowan J Tesauro G Alspector J eds Advances Neural Information Processing Systems San Francisco CA Morgan Kaufmann Publishers This paper written tersely accommodate page limitation Presented refereed poster session Conference Neural Information Processing Systems Natural Synthetic November Denver CO Supported NSF grants DMS DMS National Eye Institute NIH grants EY EY
Virtually largescale sequencing projects use automatic sequenceassembly programs aid determination DNA sequences The computergenerated assemblies require substantial handediting transform submissions GenBank As size sequencing projects increases becomes essential improve quality automated assemblies timeconsuming handediting may reduced Current ABI sequencing technology uses base calls made fluorescentlylabeled DNA fragments run gels We present new representation fluorescent trace data associated individual base calls This representation used fragment assembly improve quality assemblies We demonstrate one use endtrimming suboptimal data results significant improvement quality subsequent assemblies
Extensive research done extracting parallelism single instruction stream processors This paper presents results investigation ways modify MIMD architectures allow extract instruction level parallelism achieved current superscalar VLIW machines A new architecture proposed utilizes advantages multiple instruction stream design addressing limitations prevented MIMD architectures performing ILP operation A new code scheduling mechanism described support new architecture partitioning instructions across multiple processing elements order exploit level parallelism
This paper describes single chip Multiple Instruction Stream Computer MISC capable extracting instruction level parallelism broad spectrum programs The MISC architecture uses multiple asynchronous processing elements separate program streams executed parallel integrates conflictfree message passing system lowest level processor design facilitate low latency intraMISC communication This approach allows increased machine parallelism minimal code expansion provides alternative approach single instruction stream multiissue machines SuperScalar VLIW
In this paper we define and examine two versions of the bridge problem. The first variant is a deterministic model in which the agent knows a superset of the transitions and the a priori probabilities that those transitions are intact. In the second variant, transitions can break or be fixed with some probability at each time step. These problems are applicable to planning in uncertain domains as well as to packet routing in a computer network. We show how an agent can act optimally in these models by reduction to Markov decision processes. We describe methods of solving them, but note that these methods are intractable for reasonably sized problems. Finally, we suggest neuro-dynamic programming as a method of value function approximation for these types of models.
If several mental states can be reliably distinguished by recognizing patterns in EEG, then a paralyzed person could communicate to a device such as a wheelchair by composing sequences of these mental states. In this article we report on a study comparing four representations of EEG signals and their classification by a two-layer neural network with sigmoid activation functions. The neural network is implemented on a CNAPS server, a multiprocessor SIMD architecture from Adaptive Solutions, Inc., which reduced training time severalfold relative to a Sun.
Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates that arise from linear digital filtering. This paper provides tight bounds on the sample complexity associated with fitting such models to experimental data.
I present a general taxonomy of neural net architectures for processing time-varying patterns. This taxonomy subsumes many existing architectures in the literature and points to several promising architectures that have yet to be examined. Any architecture that processes time-varying patterns requires two conceptually distinct components: a short-term memory that holds on to relevant past events, and an associator that uses the short-term memory to classify or predict. My taxonomy is based on a characterization of short-term memory models along the dimensions of form, content, and adaptability. Experiments on predicting future values of a financial time series (US dollar/Swiss franc exchange rates) are presented using several alternative memory models. The results of these experiments serve as a baseline against which more sophisticated architectures can be compared. Neural networks have proven to be a promising alternative to traditional techniques for nonlinear temporal prediction tasks (e.g., Curtiss, Brandemuehl, & Kreider; Lapedes & Farber; Weigend, Huberman, & Rumelhart). However, temporal prediction is a particularly challenging problem because conventional neural net architectures and algorithms are not well suited to patterns that vary over time. The prototypical use of neural nets is in structural pattern recognition. In such a task, a collection of features (visual, semantic, or otherwise) is presented to a network, and the network must categorize the input feature pattern as belonging to one or more classes. For example, a network might be trained to classify animal species based on a set of attributes describing living creatures (has a tail, lives in water, is carnivorous), or a network could be trained to recognize visual patterns over a two-dimensional pixel array as a letter in {A, B, ..., Z}. In such tasks, the network is presented with all relevant information simultaneously. In contrast, temporal pattern recognition involves processing patterns that evolve over time. The appropriate response at a particular point in time depends not only on the current input, but potentially on all previous inputs. This is illustrated in the figure showing the basic framework for a temporal prediction problem. I assume that time is quantized into discrete steps, a sensible assumption because many time series of interest are intrinsically discrete, and continuous series can be sampled at a fixed interval. The input at time t is denoted x(t). For a univariate series, the input
DISLEX is an artificial neural network model of the mental lexicon. It was built to test computationally whether the lexicon could consist of separate feature maps for the different lexical modalities and for lexical semantics, connected with ordered pathways. In the model, the orthographic, phonological, and semantic feature maps and the associations between them are formed in an unsupervised process based on the co-occurrence of the lexical symbol and its meaning. After the model is organized, various kinds of damage to the lexical system can be simulated, resulting in dyslexic and category-specific aphasic impairments similar to those observed in human patients.
In this paper we describe a new self-organizing decomposition technique for learning high-dimensional mappings. Problem decomposition is performed in an error-driven manner, such that the resulting subtasks (patches) are equally well approximated. Our method combines an unsupervised learning scheme (Feature Maps [Koh]) with a nonlinear approximator (Backpropagation [RHW]). The resulting learning system is more stable and effective in changing environments than plain backpropagation, and much more powerful than extended feature maps as proposed by [RS, RMS]. Extensions of our method give rise to active exploration strategies for autonomous agents facing unknown environments. The appropriateness of this general-purpose method is demonstrated with an example from mathematical function approximation.
Irrelevant features and weakly relevant features may reduce the comprehensibility and accuracy of concepts induced by supervised learning algorithms. We formulate the search for a feature subset as an abstract search problem with probabilistic estimates. Searching a space using an evaluation function that is a random variable requires trading off the accuracy of estimates against increased state exploration. We show how recent feature subset selection algorithms in the machine learning literature fit into this search problem as simple hill climbing approaches, and conduct a small experiment using a best-first search technique.
As the field of Genetic Programming (GP) matures and its breadth of application increases, parallel implementations become absolutely necessary. The transputer-based system recently presented by Koza is one of the rare such parallel implementations. Until today, no implementation of parallel GP using a SIMD architecture had been proposed, except for a data-parallel approach, although others have exploited workstation farms and pipelined supercomputers. One reason is certainly the apparent difficulty of dealing with the parallel evaluation of different S-expressions when only a single instruction can be executed at a time on every processor. The aim of this paper is to present such an implementation of parallel GP on a SIMD system, where each processor can efficiently evaluate a different S-expression. We have implemented this approach on a MasPar MP computer and present some timing results. To the extent that SIMD machines like the MasPar are available to offer cost-effective cycles for scientific experimentation, this is a useful approach.
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the asynchronous convergence of a complex reinforcement-learning algorithm to be proven by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multi-state updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
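To make the synchronous/asynchronous distinction concrete, here is a minimal sketch (not the theorem or its proof) contrasting synchronous value iteration, where every state is backed up from the previous value table, with asynchronous in-place updates, where each backup immediately sees earlier ones. The two-state MDP, discount factor, and function names are hypothetical and chosen only for illustration; both schemes reach the same fixed point here.

```python
# Hypothetical two-state MDP: P[state][action] = [(probability, next_state, reward), ...]
GAMMA = 0.9
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}


def backup(v, s):
    """Bellman optimality backup for state s, reading values from table v."""
    return max(
        sum(p * (r + GAMMA * v[s2]) for p, s2, r in outcomes)
        for outcomes in P[s].values()
    )


def synchronous_vi(tol=1e-10):
    """Jacobi-style sweeps: every state is backed up from the previous table."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {s: backup(v, s) for s in P}
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


def asynchronous_vi(order, tol=1e-10):
    """In-place updates: each backup immediately sees earlier updates in the sweep."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in order:
            old = v[s]
            v[s] = backup(v, s)
            delta = max(delta, abs(v[s] - old))
        if delta < tol:
            return v


print(synchronous_vi())
print(asynchronous_vi(order=[1, 0]))
```

In this example the optimal values are V(1) = 20 (stay forever, collecting reward 2) and V(0) = 15.2 / 0.82.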
Hyperspectral image sensors provide images with a large number of contiguous spectral channels per pixel and enable information about different materials within a pixel to be obtained. The problem of spectrally unmixing materials may be viewed as a specific case of the blind source separation problem, in which the data consist of mixed signals (in this case, minerals) and the goal is to determine the contribution of each mineral to the mix without prior knowledge of the minerals in the mix. The technique of Independent Component Analysis (ICA) assumes that the spectral components are close to statistically independent, and provides an unsupervised method for blind source separation. We introduce contextual ICA in the context of hyperspectral data analysis and apply the method to mineral data from synthetically mixed minerals and real image signatures.
Partially observable Markov decision processes (POMDPs) allow one to model complex dynamic decision or control problems that include both action outcome uncertainty and imperfect observability. The control problem is formulated as a dynamic optimization problem with a value function combining costs or rewards from multiple steps. In this paper we propose, analyse, and test various incremental methods for computing bounds on the value function for control problems with infinite discounted horizon criteria. The methods described and tested include novel incremental versions of the grid-based linear interpolation method and of the simple lower bound method with Sondik's updates. Both can work with arbitrary points of the belief space and can be enhanced by various heuristic point selection strategies. Also introduced is a new method for computing an initial upper bound, the fast informed bound method. This method is able to improve significantly on the standard and commonly used upper bound computed by the MDP-based method. The quality of the resulting bounds is tested on a maze navigation problem.
The energy prediction competition involved the prediction of a series of building energy loads from a series of environmental input variables. Nonlinear regression using neural networks is a popular technique for such modeling tasks. Since it is not obvious how large a time-window of inputs is appropriate, or what preprocessing of inputs is best, this is best viewed as a regression problem in which there are many possible input variables, some of which may actually be irrelevant to the prediction of the output variable. Because a finite data set will show random correlations between the irrelevant inputs and the output, any conventional neural network (even with regularisation or weight decay) will not set the coefficients for these junk inputs to zero. Thus the irrelevant variables will hurt the model's performance. The Automatic Relevance Determination (ARD) model puts a prior over the regression parameters that embodies the concept of relevance. This is done in a simple and soft way by introducing multiple regularisation constants, one associated with each input. Using Bayesian methods, the regularisation constants for junk inputs are automatically inferred to be large, preventing those inputs from causing significant overfitting.
Several different approaches have been used to describe concepts for supervised learning tasks. In this paper we describe two approaches: prototype-based incremental neural networks and case-based reasoning approaches. We show how a prototype-based neural network model can be improved by storing some specific instances in a CBR memory system. This leads us to propose a co-processing hybrid model for classification.
A performance prediction method is presented for indicating the performance range of MIMD parallel processor systems for neural network simulations. The total execution time of a parallel application is modeled as the sum of its calculation and communication times. The method is scalable because, based on the times measured on one processor and one communication link, the performance, speedup, and efficiency can be predicted for a larger processor system. It is validated quantitatively by applying it to two popular neural networks, backpropagation and the Kohonen self-organizing feature map, decomposed on a GCel transputer system. Agreement between the model and the measurements is within
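The time model above (total time = calculation time + communication time, both derived from single-processor and single-link measurements) can be sketched as follows. This is a minimal illustration, assuming a perfectly balanced decomposition and a hypothetical communication pattern of p - 1 link transfers per step; the parameter values are illustrative, not measurements from the paper.

```python
def predicted_time(p, t_calc_1, t_link, messages):
    """Execution time on p processors = calculation time + communication time.

    t_calc_1: calculation time measured on one processor.
    t_link:   time for one message over one link (measured).
    messages: hypothetical function giving the number of link transfers for p processors.
    """
    calculation = t_calc_1 / p          # assumes a perfectly balanced decomposition
    communication = messages(p) * t_link
    return calculation + communication


def speedup(p, *args):
    return predicted_time(1, *args) / predicted_time(p, *args)


def efficiency(p, *args):
    return speedup(p, *args) / p


def ring_messages(p):
    """Hypothetical communication pattern: p - 1 transfers per step."""
    return p - 1


params = (100.0, 0.5, ring_messages)    # illustrative numbers, not measurements
for p in (1, 4, 16, 64):
    print(p, round(predicted_time(p, *params), 2),
          round(speedup(p, *params), 2), round(efficiency(p, *params), 2))

best_p = min(range(1, 129), key=lambda p: predicted_time(p, *params))
print("fastest predicted configuration:", best_p)
```

Because communication grows with p while calculation shrinks, the model predicts a finite optimum processor count rather than unbounded speedup.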
Algorithms for learning classification trees have had successes in artificial intelligence and statistics for many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 (Quinlan et al.) and Breiman et al.'s CART (Breiman et al.) show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price. Publication: this paper is the final draft submitted for publication to the Statistics and Computing journal; a version with minor changes has appeared in the journal.
POMDPs are general models for sequential decisions in which both actions and observations can be probabilistic. Many problems of interest can be formulated as POMDPs, yet the use of POMDPs has been limited by the lack of effective algorithms. Recently this has started to change, and a number of problems such as robot navigation and planning are beginning to be formulated and solved as POMDPs. The advantage of the POMDP approach is its clean semantics and its ability to produce principled solutions that integrate physical and information-gathering actions. In this paper we pursue this approach in the context of two learning tasks: learning to sort a vector of numbers and learning decision trees from data. Both problems are formulated as POMDPs and solved by a general POMDP algorithm. The main lessons and results are that the use of suitable heuristics and representations allows for the solution of sorting and classification POMDPs of non-trivial sizes, that the quality of the resulting solutions is competitive with the best algorithms, and that problematic aspects of decision tree learning, such as test and misclassification costs, noisy tests, and missing values, are naturally accommodated.
Given an arbitrary learning situation, it is difficult to determine the most appropriate learning strategy. The goal of this research is to provide a general representation and processing framework for introspective reasoning for strategy selection. In this learning framework, an introspective system performs some reasoning task. As it does, the system also records a trace of the reasoning itself, along with the results of that reasoning. If a reasoning failure occurs, the system retrieves and applies an introspective explanation of the failure in order to understand the error and repair the knowledge base. A knowledge structure called a Meta-Explanation Pattern is used to explain how conclusions are derived and why such conclusions fail. If reasoning is represented in an explicit, declarative manner, the system can examine its own reasoning, analyze its reasoning failures, identify what it needs to learn, and select appropriate learning strategies in order to learn the required knowledge without overreliance on the programmer.
We describe an ongoing project to develop an adaptive training system (ATS) that dynamically models a student's learning processes and can provide specialized tutoring adapted to the student's knowledge state and learning style. The student modeling component of the ATS, ML-Modeler, uses machine learning (ML) techniques to emulate the student's novice-to-expert transition. ML-Modeler infers which learning methods the student has used to reach the current knowledge state by comparing the student's solution trace to an expert solution and generating plausible hypotheses about what misconceptions and errors the student has made. A case-based approach is used to generate hypotheses through incorrectly applied analogy, overgeneralization, and overspecialization. The student and expert models use a network-based representation that includes abstract concepts and relationships as well as strategies for problem solving. Fuzzy methods are used to represent the uncertainty in the student model. This paper describes the design of the ATS and ML-Modeler, and gives a detailed example of how the system would model and tutor a student in a typical session. The domain we use for this example is high-school level chemistry.
Many algorithms have parameters that must be set by the user. For machine learning algorithms, parameter setting is a non-trivial task that influences the knowledge model returned by the algorithm. Parameter values are usually set approximately, according to characteristics of the target problem obtained in different ways. The usual way is to use background knowledge about the target problem and perform some testing experiments. This paper presents an approach to automated model selection based on local optimization, which uses an empirical evaluation of the constructed concept description to guide the search. The approach was tested using the inductive concept learning system Magnus.
This paper presents a method for learning logic programs without explicit negative examples by exploiting an assumption of output completeness. A mode declaration is supplied for the target predicate, and each training input is assumed to be accompanied by all of its legal outputs. Any other outputs generated by an incomplete program implicitly represent negative examples; however, large numbers of ground negative examples never need to be generated. This method has been incorporated into two ILP systems, Chillin and IFoil, both of which use intensional background knowledge. Tests on two natural language acquisition tasks, case-role mapping and past-tense learning, illustrate the advantages of the approach.
Instance-based learning methods explicitly remember all the data that they receive. They usually have no training phase, and only at prediction time do they perform computation: they take a query, search the database for similar datapoints, and build an on-line local model (such as a local average or local regression) with which to predict an output value. In this paper we review the advantages of instance-based methods for autonomous systems, but we also note the ensuing cost: hopelessly slow computation as the database grows large. We present and evaluate a new way of structuring a database and a new algorithm for accessing it that maintains the advantages of instance-based learning. Earlier attempts to combat the cost of instance-based learning have sacrificed the explicit retention of all data, or been applicable only to instance-based predictions based on a small number of near neighbors, or have had to re-introduce an explicit training phase in the form of an interpolative data structure. Our approach builds a multiresolution data structure to summarize the database of experiences at all resolutions of interest simultaneously. This permits us to query the database with the same flexibility as a conventional linear search, but at greatly reduced computational cost.
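The key idea of summarizing the database at several resolutions can be illustrated with a one-dimensional sketch. It assumes (this is not taken from the paper) a binary tree over sorted inputs whose nodes cache a point count and a sum of outputs. A box-kernel local average then uses a node's cached summary whenever the node lies entirely inside the query window, and descends only where the window boundary cuts a node. The result matches a brute-force linear scan.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (input x, output y)


@dataclass
class Node:
    lo: float                     # smallest input stored below this node
    hi: float                     # largest input stored below this node
    count: int                    # cached number of points
    total: float                  # cached sum of outputs
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    points: Optional[List[Point]] = None  # raw data, kept only at leaves


def build(points: List[Point], leaf_size: int = 4) -> Node:
    """Build a summary tree over points already sorted by x."""
    node = Node(points[0][0], points[-1][0], len(points), sum(y for _, y in points))
    if len(points) <= leaf_size:
        node.points = points
    else:
        mid = len(points) // 2
        node.left = build(points[:mid], leaf_size)
        node.right = build(points[mid:], leaf_size)
    return node


def range_stats(node: Node, a: float, b: float) -> Tuple[int, float]:
    """(count, sum of y) over stored points with a <= x <= b."""
    if node.hi < a or node.lo > b:
        return 0, 0.0                      # disjoint: skip the whole subtree
    if a <= node.lo and node.hi <= b:
        return node.count, node.total      # fully covered: use the cached summary
    if node.points is not None:            # partially covered leaf: scan raw points
        ys = [y for x, y in node.points if a <= x <= b]
        return len(ys), sum(ys)
    c1, s1 = range_stats(node.left, a, b)
    c2, s2 = range_stats(node.right, a, b)
    return c1 + c2, s1 + s2


def local_average(root: Node, query: float, radius: float) -> Optional[float]:
    """Box-kernel local average prediction at `query` (None if no data nearby)."""
    count, total = range_stats(root, query - radius, query + radius)
    return total / count if count else None


data = sorted((i * 0.1, (i * 0.1) ** 2) for i in range(100))
root = build(data)
print(local_average(root, 5.0, 0.75))
```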
In this paper we introduce a new agglomerative clustering algorithm in which each pattern cluster is represented by a collection of fuzzy hyperboxes. Initially, a number of such hyperboxes are calculated to represent the pattern samples. The algorithm then applies multiresolution techniques to progressively combine these hyperboxes in a hierarchical manner. This agglomerative scheme has been found to yield encouraging results on real-world clustering problems.
This paper introduces a randomized technique for partitioning examples using oblique hyperplanes. Standard decision tree techniques, such as ID3 and its descendants, partition a set of points with axis-parallel hyperplanes. Our method, by contrast, attempts to find hyperplanes at any orientation. The purpose of this more general technique is to find decision trees that are smaller than, but as accurate as, those created by other methods. We have tested our algorithm on both real and simulated data, and found that in some cases it produces surprisingly small trees without losing predictive accuracy. Small trees allow us, in turn, to obtain simple qualitative descriptions of each problem domain.
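The benefit of oblique splits, and one way a randomized search might find them, can be sketched as follows. The sketch starts from the best axis-parallel split and then randomly perturbs one hyperplane coefficient at a time, keeping a move only if the split's impurity does not get worse. The impurity measure (misclassification count), perturbation scale, iteration count, and toy diagonal data are all assumptions for illustration, not the paper's exact procedure.

```python
import random


def side(x, w, b):
    """True if point x lies on the positive side of the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


def impurity(points, labels, w, b):
    """Misclassifications for the better of the two ways to label the two sides."""
    wrong = sum(side(x, w, b) != y for x, y in zip(points, labels))
    return min(wrong, len(points) - wrong)


def best_axis_parallel(points, labels):
    """Exhaustively try thresholds midway between values on each single feature."""
    best = None
    dims = len(points[0])
    for d in range(dims):
        values = sorted({x[d] for x in points})
        for lo, hi in zip(values, values[1:]):
            w = [1.0 if k == d else 0.0 for k in range(dims)]
            b = -(lo + hi) / 2
            score = impurity(points, labels, w, b)
            if best is None or score < best[0]:
                best = (score, w, b)
    return best


def perturb(points, labels, w, b, iters=2000, scale=0.3, seed=0):
    """Random coefficient perturbation, accepting equal-or-better moves."""
    rng = random.Random(seed)
    coeffs = list(w) + [b]
    best = impurity(points, labels, coeffs[:-1], coeffs[-1])
    for _ in range(iters):
        trial = list(coeffs)
        trial[rng.randrange(len(trial))] += rng.gauss(0.0, scale)
        score = impurity(points, labels, trial[:-1], trial[-1])
        if score <= best:            # equal moves let the search drift across plateaus
            coeffs, best = trial, score
    return best, coeffs[:-1], coeffs[-1]


# Toy data on a grid whose true class boundary is the diagonal x + y = 1.05.
pts = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
labs = [i + j >= 10 for i in range(10) for j in range(10)]
axis_score, w0, b0 = best_axis_parallel(pts, labs)
oblique_score, w1, b1 = perturb(pts, labs, w0, b0)
print("axis-parallel errors:", axis_score, "oblique errors:", oblique_score)
```

Because moves are only accepted when they do not increase impurity, the oblique result is never worse than the axis-parallel starting point.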
This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification (EG2, CS-ID3, and IDX) and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search.
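The fitness function described above (average classification cost, counting both test costs and misclassification costs) can be sketched for a single decision tree. The tree encoding, feature names, costs, and examples below are hypothetical and for illustration only. ICET itself evolves biases for the tree inducer; this sketch only scores a given tree.

```python
def classify_with_cost(tree, example, test_costs):
    """Follow the tree for one example, summing the cost of each test performed."""
    spent = 0.0
    node = tree
    while isinstance(node, dict):          # internal node: {"test": feature, "branches": {...}}
        feature = node["test"]
        spent += test_costs[feature]
        node = node["branches"][example[feature]]
    return node, spent                     # leaves are class labels


def average_cost(tree, examples, test_costs, misclassification_cost):
    """Mean over examples of (test costs + misclassification cost)."""
    total = 0.0
    for example, true_class in examples:
        predicted, spent = classify_with_cost(tree, example, test_costs)
        total += spent + misclassification_cost[(predicted, true_class)]
    return total / len(examples)


# Hypothetical costs and data, for illustration only.
test_costs = {"temperature": 1.0, "blood_test": 20.0}
misclassification_cost = {
    ("sick", "sick"): 0.0, ("healthy", "healthy"): 0.0,
    ("healthy", "sick"): 100.0,    # missing a sick patient is expensive
    ("sick", "healthy"): 10.0,
}
tree = {"test": "temperature", "branches": {
    "high": {"test": "blood_test", "branches": {"pos": "sick", "neg": "healthy"}},
    "normal": "healthy",
}}
examples = [
    ({"temperature": "high", "blood_test": "pos"}, "sick"),
    ({"temperature": "normal", "blood_test": "pos"}, "sick"),
    ({"temperature": "normal", "blood_test": "neg"}, "healthy"),
    ({"temperature": "high", "blood_test": "neg"}, "healthy"),
]
print(average_cost(tree, examples, test_costs, misclassification_cost))       # 36.0
print(average_cost("healthy", examples, test_costs, misclassification_cost))  # 50.0
```

Here the tree that pays for tests is cheaper on average than a test-free tree that always predicts "healthy", because the tests avoid one expensive misclassification.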
This paper highlights the role of mathematical programming, particularly linear programming, in training neural networks. A neural network description is given in terms of separating planes in the input space, which suggests the use of linear programming for determining these planes. A more standard description in terms of mean square error in the output space is also given, which leads to the use of unconstrained minimization techniques for training a neural network. The linear programming approach is demonstrated by a brief description of a system for breast cancer diagnosis that has been in use for the last four years at a major medical facility.
Dissatisfaction with existing standard case-based reasoning (CBR) systems has prompted us to investigate how to make these systems more creative and, more broadly, what it would mean for them to be creative. This paper discusses three research goals: understanding creative processes better, investigating the role of cases and CBR in creative problem solving, and understanding the framework that supports this more interesting kind of case-based reasoning. In addition, it discusses methodological issues in the study of creativity and, in particular, the use of CBR as a research paradigm for exploring creativity.
This work presents an application of machine learning to characterizing an important property of natural DNA sequences: compositional inhomogeneity. Compositional segments often correspond to meaningful biological units. Taking such inhomogeneity into account is a prerequisite for the successful recognition of functional features in DNA sequences, especially protein-coding genes. Here we present a technique for DNA segmentation using hidden Markov models. A DNA sequence is represented as a chain of homogeneous segments, each described by one of the statistically discriminated hidden states, whose contents form a first-order Markov chain. The technique is used to describe and compare chromosomes I and IV of the completely sequenced Saccharomyces cerevisiae (yeast) genome. Our results indicate the existence of well-separated states, which gives support to the isochore theory. We also explore the model's likelihood landscape and analyze the dynamics of the optimization process, thus addressing the problem of the reliability of the obtained optima and the efficiency of the algorithms.
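Segmenting a sequence with a hidden Markov model amounts to finding the most probable hidden-state path (Viterbi decoding) and reading off runs of equal states as segments. The sketch below assumes a hypothetical two-state model (GC-rich vs AT-rich) with hand-set parameters. The models described above have their parameters estimated from data and may use more states.

```python
import math

# Hypothetical two-state model: GC-rich vs AT-rich segments (parameters invented).
STATES = ("GC-rich", "AT-rich")
START = {"GC-rich": 0.5, "AT-rich": 0.5}
TRANS = {
    "GC-rich": {"GC-rich": 0.99, "AT-rich": 0.01},
    "AT-rich": {"GC-rich": 0.01, "AT-rich": 0.99},
}
EMIT = {
    "GC-rich": {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1},
    "AT-rich": {"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4},
}


def viterbi(seq):
    """Most probable hidden-state path, computed in log space."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        pointers, new_score = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
        backpointers.append(pointers)
        score = new_score
    state = max(STATES, key=lambda s: score[s])   # best final state
    path = [state]
    for pointers in reversed(backpointers):       # trace back to the first position
        state = pointers[state]
        path.append(state)
    return path[::-1]


def segments(path):
    """Collapse a state path into (state, start, end) runs with end exclusive."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((path[start], start, i))
            start = i
    return runs


print(segments(viterbi("GC" * 10 + "AT" * 10)))
```

The strong self-transition probability acts as a penalty on switching, so the path changes state only where the composition shifts.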
Weight modifications in traditional neural nets are computed by hard-wired algorithms. Without exception, all previous weight change algorithms have many specific limitations. Is it in principle possible to overcome the limitations of hard-wired algorithms by allowing neural nets to run and improve their own weight change algorithms? This paper constructively demonstrates that the answer is, in principle, yes. I derive an initial gradient-based sequence learning algorithm for a self-referential recurrent network that can speak about its own weight matrix in terms of activations. It uses some of its input and output units for observing its own errors and for explicitly analyzing and modifying its own weight matrix, including those parts of the weight matrix responsible for analyzing and modifying the weight matrix. The result is the first introspective neural net with explicit potential control over all of its own adaptive parameters. A disadvantage of the algorithm is its high computational complexity per time step, which is independent of the sequence length and equals O(n_conn log n_conn), where n_conn is the number of connections. Another disadvantage is the high number of local minima of the unusually complex error surface. The purpose of this paper, however, is not to come up with the most efficient introspective or self-referential weight change algorithm, but to show that such algorithms are possible at all.
This paper discusses the problem of how to implement many-to-many, or multi-associative, mappings within connectionist models. Traditional symbolic approaches work with an explicit representation of all alternatives, either via stored links or implicitly through enumerative algorithms. Classical pattern association models ignore the issue of generating multiple outputs for a single input pattern, and while recent research on recurrent networks is promising, the field has not clearly focused upon multi-associativity as a goal. In this paper we define multi-associative memory (MM) and several possible variants, and discuss its utility in general cognitive modeling. We extend sequential cascaded networks (Pollack) to fit the task, and perform several initial experiments that demonstrate the feasibility of the concept.
Human visual systems maintain a stable internal representation of a scene even though the image on the retina is constantly changing because of eye movements. Such stabilization can theoretically be effected by dynamic shifts in the receptive fields (RFs) of neurons in the visual system. This paper examines how a neural circuit can learn to generate such shifts. The shifts are controlled by eye position signals and compensate for the movement of the retinal image caused by eye movements. The development of a neural shifter circuit (Olshausen, Anderson, & Van Essen) is modeled using triadic connections. These connections are gated by signals that indicate the direction of gaze (eye position signals). In simulations, a neural model is exposed to sequences of stimuli paired with appropriate eye position signals. The initially
This paper tries to identify rules and factors that are predictive of the outcome of international conflict management attempts. We use C4.5, an advanced Machine Learning algorithm, to generate decision trees and prediction rules from cases in the CONFMAN database. The results show that simple patterns and rules are often not only more understandable, but also more reliable than complex rules. Simple decision trees are able to improve the chances of correctly predicting the outcome of a conflict management attempt. This suggests that mediation is more repetitive than conflicts per se, for which such results have not been achieved so far.
© UWCC COMMA Technical Report, February. No part of this article may be reproduced for commercial purposes. Abstract: A technique is described that allows unimodal function optimization methods to be extended to efficiently locate all optima of multimodal problems. We describe an algorithm based on a traditional genetic algorithm (GA). This involves iterating the GA, but uses knowledge gained during one iteration to avoid re-searching, on subsequent iterations, regions of the problem space where solutions have already been found. This is achieved by applying a fitness derating function to the raw fitness function, so that fitness values are depressed in the regions of the problem space where solutions have already been found. Consequently, the likelihood of discovering a new solution on each iteration is dramatically increased. The technique may be used with various styles of GA, or with other optimization methods such as simulated annealing. The effectiveness of the algorithm is demonstrated on a number of multimodal test functions. The technique is at least as fast as fitness sharing methods. It provides a speedup on a problem with p optima that depends on the value of p and on the convergence time complexity.
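The iterate-and-derate loop described above can be sketched with a simple stand-in optimizer. The sketch assumes a power-law derating function, which is 0 at each previously found solution and rises to 1 at a chosen niche radius, and a test function with five equal peaks. To keep it short and deterministic, the base optimizer is exhaustive grid search rather than a GA; the radius, exponent, and grid size are illustrative assumptions.

```python
import math


def f(x):
    """Raw fitness: five equal peaks at x = 0.1, 0.3, 0.5, 0.7, 0.9 on [0, 1]."""
    return math.sin(5 * math.pi * x) ** 6


def derating(x, found, radius=0.1, alpha=2.0):
    """Power-law derating: 0 at each found solution, rising to 1 at `radius` away."""
    g = 1.0
    for s in found:
        d = abs(x - s)
        if d < radius:
            g *= (d / radius) ** alpha
    return g


def sequential_niche(iterations=5, grid=1000):
    """Run the base optimizer repeatedly on the derated fitness, recording each optimum."""
    xs = [i / grid for i in range(grid + 1)]
    found = []
    for _ in range(iterations):
        # Base optimizer: exhaustive grid search (a stand-in for the GA).
        best = max(xs, key=lambda x: f(x) * derating(x, found))
        found.append(best)
    return found


print(sorted(sequential_niche()))
```

Each pass sees a landscape in which the peaks already found have been flattened, so successive passes land on new optima instead of rediscovering the same one.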
In this paper we examine the intuition that TD is meant to operate by approximating asynchronous value iteration. We note that on the important class of discrete acyclic stochastic tasks, value iteration is inefficient compared with the DAG-SP algorithm, which essentially performs only one sweep instead of many by working backwards from the goal. The question we address in this paper is whether there is an analogous algorithm that can be used in large stochastic state spaces requiring function approximation. We present such an algorithm, analyze it, and give comparative results against TD on several domains. Using VI to solve MDPs belonging to either of these special classes can be quite inefficient, since VI performs backups over the entire space, whereas the only backups useful for improving V* are those on the frontier between already-correct and not-yet-correct V* values. In fact, there are classical algorithms for both problem classes that compute V* more efficiently by explicitly working backwards: for the deterministic class, Dijkstra's shortest-path algorithm; for the acyclic class, Directed-Acyclic-Graph-Shortest-Paths (DAG-SP). DAG-SP first topologically sorts the MDP, producing a linear ordering of the states in which every state x precedes all states reachable from x. It then runs through that list in reverse, performing one backup per state. [Table: worst-case bounds for VI, Dijkstra, and DAG-SP in deterministic domains with X states and A actions per state.] Although DAG-SP is presented here for deterministic acyclic problems, it applies straightforwardly
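The DAG-SP procedure described above (topologically sort, then one backup per state in reverse order) can be sketched for a small acyclic stochastic MDP. This sketch assumes a cost-minimizing formulation stored as a plain dict, and uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+) to produce the ordering. The MDP itself is hypothetical. Repeated value-iteration sweeps are included for comparison.

```python
import math
from graphlib import TopologicalSorter

# Hypothetical acyclic stochastic MDP (cost minimization):
# MDP[state][action] = (cost, [(probability, next_state), ...])
MDP = {
    "start": {"safe": (2.0, [(1.0, "mid")]),
              "risky": (1.0, [(0.5, "goal"), (0.5, "trap")])},
    "mid": {"go": (1.0, [(1.0, "goal")])},
    "trap": {"escape": (10.0, [(1.0, "goal")])},
    "goal": {},
}


def backup(mdp, v, s):
    """Bellman backup: cheapest expected cost-to-goal from s under the values v."""
    return min(cost + sum(p * v[s2] for p, s2 in outcomes)
               for cost, outcomes in mdp[s].values())


def dag_sp(mdp, goal):
    """One backup per state, visiting states in reverse topological order."""
    successors = {s: {s2 for _, outs in acts.values() for _, s2 in outs}
                  for s, acts in mdp.items()}
    # TopologicalSorter treats each mapped set as predecessors, so static_order()
    # emits every state after all states reachable from it.
    v = {}
    for s in TopologicalSorter(successors).static_order():
        if s == goal:
            v[s] = 0.0
        elif not mdp[s]:
            v[s] = math.inf        # dead end with no actions
        else:
            v[s] = backup(mdp, v, s)
    return v


def value_iteration(mdp, goal, sweeps=50):
    """Repeated full sweeps over all states, for comparison."""
    v = {s: 0.0 for s in mdp}
    for _ in range(sweeps):
        for s in mdp:
            if s != goal and mdp[s]:
                v[s] = backup(mdp, v, s)
    return v


print(dag_sp(MDP, "goal"))
print(value_iteration(MDP, "goal"))
```

DAG-SP performs exactly one backup per state, while value iteration repeats full sweeps until nothing changes. On this example both give the same values.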
Automatic design optimization is highly sensitive to problem formulation. The choice of objective function, constraints, and design parameters can dramatically impact the computational cost of optimization and the quality of the resulting design. The best formulation varies from one application to another, and a design engineer will usually not know the best formulation in advance. To address this problem, we have developed a system that supports interactive formulation, testing, and reformulation of design optimization strategies. Our system includes an executable data-flow language for representing optimization strategies. The language allows an engineer to define multiple stages of optimization, each using different approximations of the objective and constraints or different abstractions of the design space. We have also developed a set of transformations that reformulate strategies represented in our language. The transformations can approximate objective and constraint functions, abstract or reparameterize search spaces, or divide an optimization process into multiple stages. The system is applicable in principle to any design problem that can be expressed in terms of constrained optimization; however, we expect it to be most useful when the design artifact is governed by algebraic and ordinary differential equations. We have tested the system on problems of racing yacht design and jet engine nozzle design. We report experimental results demonstrating that our reformulation techniques can significantly improve the performance of automatic design optimization. Our research demonstrates the viability of a reformulation methodology that combines symbolic program transformation with numerical experimentation. It is an important first step in a research program aimed at automating the entire strategy formulation process. *Fully accepted by Research in Engineering Design.
The classification performance of a neural network on combined six-band Landsat-TM and one-band ERS-SAR PRI imagery of the same scene is investigated. Training and test sets were created from different combinations of the data (raw, segmented, or filtered) using the available ground truth polygons. The training sets are used for learning and the test sets for verification of the neural network. The different combinations are then evaluated.
The problem of modeling complicated data sequences, such as DNA or speech, often arises in practice. Most algorithms select a hypothesis from within a model class assuming that the observed sequence is the direct output of the underlying generation process. In this paper we consider the case in which the output passes through a memoryless noisy channel before observation. In particular, we show that in the class of Markov chains with variable memory length, learning is affected by factors that, despite being super-polynomial, are still small in some practical cases. Markov models with variable memory length, or probabilistic finite suffix automata, were introduced in learning theory by Ron, Singer, and Tishby, who also described a polynomial time learning algorithm. We present a modification of the algorithm that uses a noise-corrupted sample and knowledge of the noise structure. The algorithm is still viable if the noise is not known exactly but a good estimate is available. Finally, experimental results are presented for removing noise from corrupted English text, and for measuring how the performance of the learning algorithm is affected by the size of the noisy sample and the noise rate.
We present and evaluate an implemented system with which to rapidly and easily build intelligent software agents for Web-based tasks. Our design is centered around two basic functions: ScoreThisLink and ScoreThisPage. Given highly accurate versions of these functions, standard heuristic search would lead to efficient retrieval of useful information. Our approach allows users to tailor our system's behavior by providing approximate advice about these functions. This advice is mapped into neural network implementations of the two functions. Subsequent reinforcements from the Web (e.g., dead links) and any ratings of retrieved pages that the user wishes to provide are, respectively, used to refine the link- and page-scoring functions. Hence, our architecture provides an appealing middle ground between nonadaptive agent programming languages and systems that solely learn user preferences from the user's ratings of pages. We describe our internal representation of Web pages, the major predicates in our advice language, how advice is mapped into neural networks, and the mechanisms for refining advice based on subsequent feedback. We also present a case study in which we provide some simple advice and specialize our general-purpose system into a home-page finder. An empirical study demonstrates that our approach leads to a more effective home-page finder than that of a leading commercial Web search site.
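To illustrate the claim that accurate link- and page-scoring functions let standard heuristic search retrieve useful pages efficiently, here is a minimal best-first search sketch. The in-memory "web", the keyword weights, and the functions `score_this_link` and `score_this_page` are hypothetical stand-ins for ScoreThisLink and ScoreThisPage. In the system described above these functions are advice-initialized neural networks, not keyword counts.

```python
import heapq

# Hypothetical miniature web: page -> (page text, [(anchor text, target page), ...])
WEB = {
    "portal": ("search portal", [("people directory", "people"), ("news", "news")]),
    "news": ("daily news", [("sports", "sports")]),
    "sports": ("sports scores", []),
    "people": ("staff list", [("jane smith home page", "jane"), ("bob", "bob")]),
    "jane": ("jane smith home page welcome", []),
    "bob": ("bob projects", []),
}

# Stand-ins for advice-initialized scorers (hypothetical weights).
LINK_ADVICE = {"jane": 2.0, "smith": 2.0, "home": 1.0, "people": 0.5}
GOAL_WORDS = {"jane", "smith", "home", "page"}


def score_this_link(anchor_text):
    """Hypothetical link scorer: sum of advice weights for words in the anchor."""
    return sum(LINK_ADVICE.get(word, 0.0) for word in anchor_text.split())


def score_this_page(text):
    """Hypothetical page scorer: fraction of goal words present on the page."""
    return len(GOAL_WORDS & set(text.split())) / len(GOAL_WORDS)


def best_first_search(web, start, threshold=0.75, max_fetches=10):
    """Fetch the most promising unvisited link next; stop at the first good page."""
    frontier = [(0.0, 0, start)]       # (negated link score, tie-breaker, page)
    seen = {start}
    fetched = []
    counter = 1
    while frontier and len(fetched) < max_fetches:
        _, _, page = heapq.heappop(frontier)
        text, links = web[page]
        fetched.append(page)
        if score_this_page(text) >= threshold:
            return page, fetched
        for anchor, target in links:
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-score_this_link(anchor), counter, target))
                counter += 1
    return None, fetched


print(best_first_search(WEB, "portal"))   # ('jane', ['portal', 'people', 'jane'])
```

With good scorers, the search reaches the target after three fetches and never visits the irrelevant news pages.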
In concept learning, objects in a domain are grouped together based on similarity as determined by the attributes used to describe them. Existing concept learners require that this set of attributes be known in advance and presented in its entirety before learning begins. Additionally, most systems do not possess mechanisms for altering the attribute set after concepts have been learned. Consequently, a veridical attribute set relevant to the task for which the concepts are to be used must be supplied at the onset of learning, and in turn, the usefulness of the concepts is limited to the task for which the attributes were originally selected. In order to efficiently accommodate changing contexts, a concept learner must be able to alter its set of descriptors without discarding its prior knowledge of the domain. We introduce the notion of attribute-incrementation, the dynamic modification of the attribute set used to describe instances in a problem domain. We have implemented this capability in a concept learning system that has been evaluated along several dimensions, using an existing concept formation system for comparison.
It is shown that Bayesian inference from data modeled by a mixture distribution can feasibly be performed via Monte Carlo simulation. This method exhibits the true Bayesian predictive distribution, implicitly integrating over the entire underlying parameter space. An infinite number of mixture components can be accommodated without difficulty, using a prior distribution for the mixing proportions that selects a reasonable subset of components to explain any finite training set. The need to decide on a "correct" number of components is thereby avoided. The feasibility of the method is shown empirically on a simple classification task.
This article presents a new reinforcement learning method called SANE (Symbiotic, Adaptive Neuro-Evolution), which evolves a population of neurons through genetic algorithms to form a neural network capable of performing a task. Symbiotic evolution promotes both cooperation and specialization, which results in a fast, efficient genetic search and discourages convergence to suboptimal solutions. In the inverted pendulum problem, SANE formed effective networks faster than the Adaptive Heuristic Critic, Q-learning, and the GENITOR neuro-evolution approach, without loss of generalization. Such efficient learning, combined with few domain assumptions, makes SANE a promising approach to a broad range of reinforcement learning problems, including many real-world applications.
This paper concerns the probabilistic evaluation of plans in the presence of unmeasured variables, each plan consisting of several concurrent or sequential actions. We establish a graphical criterion for recognizing when the effects of a given plan can be predicted from passive observations on measured variables only. When the criterion is satisfied, a closed-form expression is provided for the probability that the plan will achieve a specified goal.
We introduce a technique to enhance the ability of dynamic ILP processors to exploit (speculatively executed) parallelism. Existing branch prediction mechanisms used to establish a dynamic window from which ILP can be extracted are limited in their abilities to (i) create a large, accurate dynamic window, (ii) initiate a large number of instructions into this window every cycle, and (iii) traverse multiple branches of the control flow graph per prediction. We introduce control flow prediction, which uses information in the control flow graph of a program to overcome these limitations. We discuss how information present in the control flow graph can be represented using multiblocks and conveyed to the hardware using Control Flow Tables and Control Flow Prediction Buffers. We evaluate the potential of control flow prediction on an abstract machine and on a dynamic ILP processing model. Our results indicate that control flow prediction is a powerful and effective assist to the hardware in making more informed run time decisions about program control flow.
We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition: the classification of handwritten digits.
Many learning experience systems use information extracted problem solving experiences modify performance element PE forming new element PE solve similar problems efficiently However transformations improve performance one set problems degrade performance sets new PE always better original PE depends distribution problems We therefore seek performance element whose expected performance distribution optimal Unfortunately actual distribution needed determine element optimal usually known Moreover task finding optimal element even knowing distribution intractable interesting spaces elements This paper presents method PALO sidesteps problems using set samples estimate unknown distribution using set transformations hillclimb local optimum This process based mathematically rigorous form utility analysis particular uses statistical techniques determine whether result proposed transformation better original system We also present efficient way implementing learning system context general class performance elements include empirical evidence approach work effectively Much work performed University Toronto supported Institute Robotics Intelligent Systems operating grant National Science Engineering Research Council Canada We also gratefully acknowledge receiving many helpful comments William Cohen Dave Mitchell Dale Schuurmans anonymous referees
Compositional QLearning CQL Singh modular approach learning perform composite tasks made several elemental tasks reinforcement learning Skills acquired performing elemental tasks also applied solve composite tasks Individual skills compete right act winning skills included decomposition composite task We extend original CQL concept two ways general reward function agent one actuator We use CQL architecture acquire skills performing composite tasks simulated twolinked manipulator large state action spaces The manipulator nonlinear dynamical system require endeffector specific positions workspace Fast function approximation Qmodules achieved use array Cerebellar Model Articulation Controller CMAC Albus Our research interests involve scaling machine learning methods especially reinforcement learning autonomous robot control We interested function approximators suitable reinforcement learning problems large state spaces Cerebellar Model Articulation Controller CMAC Albus permit fast online learning good local generalization In addition interested task decomposition reinforcement learning use hierarchical modular function approximator architectures We examining effectiveness modified Hierarchical Mixtures Experts HME Jordan Jacobs approach reinforcement learning since original HME developed mainly supervised learning batch learning tasks The incorporation domain knowledge reinforcement learning agents important way extending capabilities Default policies specified domain knowledge also used restrict size stateaction space leading faster learning We investigating use QLearning Watkins planning tasks using classifier system Holland encode necessary conditionaction rules Jordan M Jacobs R Hierarchical mixtures experts EM algorithm Technical Report MIT Computational Cognitive Science
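The abstract relies on CMAC (tile coding) arrays for fast approximation of the Q-modules. As a rough illustration only, here is a minimal tile-coding approximator for a two-dimensional input in [0, 1]². It assumes evenly offset tilings and a plain LMS update. The class name, number of tilings, tiles per dimension, and step size are hypothetical choices, not values from the paper.

```python
import numpy as np


class CMAC:
    """Tile-coding (CMAC) approximator for inputs in [low, high]^dims.

    Each of `n_tilings` grids is shifted by a fraction of one tile width, so a
    point activates exactly one tile per tiling; the output is the sum of the
    weights of the active tiles. All defaults are hypothetical.
    """

    def __init__(self, n_tilings=8, tiles_per_dim=10, low=0.0, high=1.0,
                 dims=2, alpha=0.1):
        self.n_tilings = n_tilings
        self.low, self.high = low, high
        self.tile_width = (high - low) / tiles_per_dim
        # One extra tile per dimension so the shifted grids still cover the range.
        self.weights = np.zeros((n_tilings,) + (tiles_per_dim + 1,) * dims)
        # Per-weight step size: the overall step alpha is split across tilings.
        self.step = alpha / n_tilings

    def _active_tiles(self, x):
        x = np.clip(np.asarray(x, dtype=float), self.low, self.high)
        for t in range(self.n_tilings):
            offset = t * self.tile_width / self.n_tilings
            idx = ((x - self.low + offset) // self.tile_width).astype(int)
            yield (t,) + tuple(idx)

    def predict(self, x):
        return sum(self.weights[i] for i in self._active_tiles(x))

    def update(self, x, target):
        # LMS step: move every active weight toward reducing the error.
        error = target - self.predict(x)
        for i in self._active_tiles(x):
            self.weights[i] += self.step * error
```

Each update touches only the tiles a point activates, so learning is local. A nearby input shares some tiles and moves with it. A distant input shares none and is unaffected.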
The processing performed feedforward neural network often interpreted use decision hyperplanes layer The adaptation process however normally explained using picture gradient descent error landscape In paper dynamics decision hyperplanes used model adaptation process An electromechanical analogy drawn dynamics hyperplanes determined interaction forces hyperplanes particles represent patterns Relaxation system determined increasing hyperplane inertia mass This picture used clarify dynamics learning go way explaining learning deadlocks escaping certain local minima Furthermore network plasticity introduced dynamic property system reduction necessary consequence information storage Hyperplane inertia used explain avoid destructive relearning trained networks
Modifications Recursive AutoAssociative Memory presented allow store deeper complex data structures previously reported These modifications include adding extra layers compressor reconstructor networks employing integer rather realvalued representations preconditioning weights presetting representations compatible The resulting system tested data set syntactic trees extracted Penn Treebank
The problem combining preferences arises several applications combining results different search engines This work describes efficient algorithm combining multiple preferences We first give formal framework problem We describe analyze new boosting algorithm combining preferences called RankBoost We also describe efficient implementation algorithm restricted case We discuss two experiments carried assess performance RankBoost In first experiment used algorithm combine different search strategies query expansion given domain For task compare performance RankBoost individual search strategies The second experiment collaborativefiltering task specifically making movie recommendations Here present results comparing RankBoost nearest neighbor regression algorithms
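To make the boosting step concrete, here is a minimal sketch of the generic RankBoost loop for binary weak rankers as it is usually described. A distribution D is kept over "crucial pairs". Each round picks the weak ranker h maximising |r|, where r = Σ D(lo, hi)(h(hi) − h(lo)), sets α = ½ ln((1 + r)/(1 − r)), and reweights the pairs exponentially. The threshold weak rankers, function names, and toy data are assumptions. This is not the efficient restricted-case implementation the abstract mentions.

```python
import math


def weak_rank(x, feature, threshold):
    """Binary weak ranker: 1 if the feature exceeds the threshold, else 0."""
    return 1.0 if x[feature] > threshold else 0.0


def rankboost(items, pairs, rounds=10):
    """Minimal RankBoost over crucial pairs.

    items: list of feature vectors.
    pairs: list of (lo, hi) index pairs meaning items[hi] should rank above items[lo].
    Returns a list of (alpha, feature, threshold) terms.
    """
    dist = {p: 1.0 / len(pairs) for p in pairs}  # distribution D over pairs
    model = []
    for _ in range(rounds):
        best = None
        for f in range(len(items[0])):
            for theta in sorted({x[f] for x in items}):
                # r = sum over pairs of D(lo, hi) * (h(hi) - h(lo))
                r = sum(w * (weak_rank(items[hi], f, theta) - weak_rank(items[lo], f, theta))
                        for (lo, hi), w in dist.items())
                if best is None or abs(r) > abs(best[0]):
                    best = (r, f, theta)
        r, f, theta = best
        r = max(min(r, 1 - 1e-10), -1 + 1e-10)  # keep alpha finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, f, theta))
        # Pairs that alpha * h orders correctly lose weight; the rest gain relative weight.
        for (lo, hi) in dist:
            diff = weak_rank(items[lo], f, theta) - weak_rank(items[hi], f, theta)
            dist[(lo, hi)] *= math.exp(alpha * diff)
        z = sum(dist.values())
        for p in dist:
            dist[p] /= z
    return model


def score(model, x):
    """Final ranking H(x) = sum_t alpha_t * h_t(x); higher means ranked higher."""
    return sum(alpha * weak_rank(x, f, theta) for alpha, f, theta in model)
```

In a search-strategy setting, each feature could hypothetically be the score one strategy assigns to a document. Pairs then encode which document a user preferred.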
This paper shows decision trees used improve performance casebased learning CBL systems We introduce performance task machine learning systems called semiflexible prediction lies classification task performed decision tree algorithms flexible prediction task performed conceptual clustering systems In semiflexible prediction learning improve prediction specific set features known priori rather single known feature classification arbitrary set features conceptual clustering We describe one task natural language processing present experiments compare solutions problem using decision trees CBL hybrid approach combines two In hybrid approach decision trees used specify features included knearest neighbor case retrieval Results experiments show hybrid approach outperforms decision tree casebased approaches well two casebased systems incorporate expert knowledge case retrieval algorithms Results clearly indicate decision trees used improve performance CBL systems without reliance potentially expensive expert knowledge
Technical Report No Department Statistics University Toronto We describe linear network models correlations realvalued visible variables using one realvalued hidden variables factor analysis model This model seen linear version Helmholtz machine parameters learned using wakesleep method learning primary generative model assisted recognition model whose role fill values hidden variables based values visible variables The generative recognition models jointly learned wake sleep phases using delta rule This learning procedure comparable simplicity Ojas version Hebbian learning produces somewhat different representation correlations terms principal components We argue simplicity wakesleep learning makes factor analysis plausible alternative Hebbian learning model activitydependent cortical plasticity
A Bayesian method estimating amino acid distributions states hidden Markov model HMM protein family columns multiple alignment family introduced This method uses Dirichlet mixture densities priors amino acid distributions These mixture densities determined examination previously constructed HMMs multiple alignments It shown Bayesian method improve quality HMMs produced small training sets Specific experiments EFhand motif reported priors shown produce HMMs higher likelihood unseen data fewer false positives false negatives database search task
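In its usual form, the Dirichlet-mixture estimate is a posterior-weighted average of per-component Dirichlet posterior means. With counts n and components j having mixture coefficient q_j and parameters α_j, it uses P(j | n) ∝ q_j · B(n + α_j) / B(α_j) and p̂_i = Σ_j P(j | n)(n_i + α_{j,i}) / (|n| + |α_j|). Below is a minimal sketch under that reading. It uses a hypothetical four-letter alphabet and made-up component parameters rather than real amino-acid priors.

```python
import math


def dirichlet_mixture_estimate(counts, mix_coeffs, alphas):
    """Posterior-mean residue distribution under a Dirichlet mixture prior.

    counts: observed residue counts in one column/state.
    mix_coeffs: mixture coefficients q_j (positive, summing to 1).
    alphas: list of Dirichlet parameter vectors alpha_j (all entries > 0).
    """
    n_total = sum(counts)
    # log P(j | n) up to a constant; the multinomial coefficient cancels.
    log_post = []
    for q, alpha in zip(mix_coeffs, alphas):
        a_total = sum(alpha)
        lp = math.log(q) + math.lgamma(a_total) - math.lgamma(n_total + a_total)
        lp += sum(math.lgamma(n + a) - math.lgamma(a) for n, a in zip(counts, alpha))
        log_post.append(lp)
    # Normalise in log space for numerical stability.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    z = sum(weights)
    weights = [w / z for w in weights]
    # Mix the per-component posterior means.
    est = [0.0] * len(counts)
    for w, alpha in zip(weights, alphas):
        denom = n_total + sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            est[i] += w * (n + a) / denom
    return est
```

With one uniform component this reduces to add-one smoothing. With several components, the observed counts select which prior shape dominates.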
This paper proposes simple cost model machine learning applications based notion net present value The model extends unifies models used Pazzani et al Masand PiatetskyShapiro It attempts answer question Should given machine learning system prototype stage fielded The models inputs systems confusion matrix cash flow matrix application cost per decision onetime cost deploying system rate return investment Like Provost Fawcetts ROC convex hull method present model used decisionmaking even input variables known exactly Despite simplicity number nontrivial consequences For example free lunch theorems learning theory longer apply
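The following is a sketch of one plausible reading of the model's inputs. The normalised confusion matrix gives the probability of each (actual, predicted) outcome. Weighting the cash-flow matrix by it gives the expected cash per decision, from which the cost per decision is subtracted. Discounting that stream at the rate of return and subtracting the one-time deployment cost gives the net present value. The decision volume per period, the horizon, the row/column orientation, and the function name are hypothetical assumptions. The abstract does not specify them.

```python
def npv_of_fielding(confusion, cash_flow, cost_per_decision, one_time_cost,
                    rate, decisions_per_period, periods):
    """Net present value of fielding a classifier (hypothetical formulation).

    confusion[i][j]: count of cases with actual class i predicted as class j.
    cash_flow[i][j]: cash received (negative = paid) for that outcome.
    decisions_per_period and periods are assumed inputs, not from the paper.
    """
    total = sum(sum(row) for row in confusion)
    expected_cash = sum(
        confusion[i][j] / total * cash_flow[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    per_period = (expected_cash - cost_per_decision) * decisions_per_period
    # Discount each period's cash flow back to the present.
    discounted = sum(per_period / (1 + rate) ** t for t in range(1, periods + 1))
    return discounted - one_time_cost
```

Under this formulation, fielding the system is worthwhile when the returned value is positive.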
This paper demonstrates use graphs mathematical tool expressing independencies formal language communicating processing causal information statistical analysis We show complex information external interventions organized represented graphically conversely graphical representation used facilitate quantitative predictions effects interventions We first review Markovian account causation show directed acyclic graphs DAGs offer economical scheme representing conditional independence assumptions deducing displaying logical consequences assumptions We introduce manipulative account causation show DAG defines simple transformation tells us probability distribution change result external interventions system Using transformation possible quantify nonexperimental data effects external interventions specify conditions randomized experiments necessary Finally paper offers graphical interpretation Rubins model causal effects demonstrates equivalence manipulative account causation We exemplify tradeoffs two approaches deriving nonparametric bounds treatment effects conditions imperfect compliance
In casebased design adaptation design case new design requirements plays important role If sufficient adapt predefined set design parameters task easily automated If however farreaching creative changes required current systems provide limited success This paper describes approach creative design adaptation based notion creativity goal oriented shift focus search process An evolving representation used restructure search space designs similar example case lie focus search This focus used starting point create new designs
We consider novel nonlinear model time series analysis The study model emphasizes theoretical aspects well practical applicability The architecture model demonstrated sufficiently rich sense approximating unknown functional forms yet retains simple intuitive characteristics linear models A comparison established nonlinear models emphasized theoretical issues backed prediction results benchmark time series well computer generated data sets Efficient estimation algorithms seen applicable made possible mixture based structure model Large sample properties estimators discussed well well specified well misspecified settings We also demonstrate inference pertaining data structure may made parameterization model resulting better intuitive understanding structure performance model
The coverage learning algorithm number concepts learned algorithm samples given size This paper asks whether good learning algorithms designed maximizing coverage The paper extends previous upper bound coverage Boolean concept learning algorithm describes two algorithms MultiBalls LargeBall whose coverage approaches upper bound Experimental measurement coverage ID FRINGE algorithms shows coverage far bound Further analysis LargeBall shows although learns many concepts seem interesting concepts Hence coverage maximization alone appear yield practicallyuseful learning algorithms The paper concludes definition coverage within bias suggests way coverage maximization could applied strengthen weak preference biases
At previous FOGA workshop presented initial results using Markov models analyze transient behavior genetic algorithms GAs used function optimizers GAFOs In paper states Markov model ordered via simple mathematically convenient lexicographic ordering used initially Nix Vose In paper explore alternative orderings states based interesting semantic properties average fitness degree homogeneity average attractive force etc We also explore lumping techniques reducing size state space Analysis reordered lumped Markov models provides new insights transient behavior GAs general GAFOs particular
An important aspect creative design concept emergence Though emergence important mechanism either well understood limited domain shapes This deficiency compensated considering definitions emergent behaviour Artificial Life ALife research community With new insights proposed computational technique called evolving representations design genes extended emergent behaviour We demonstrate emergent behaviour coevolutionary model design This coevolutionary approach design allows solution space structure space evolve response problem space behaviour space Since behaviour space active participant behaviour may emerge new structures end design process This paper hypothesizes emergent behaviour identified using technique The floor plan example Gero Schnier extended demonstrate behaviour emerge coevolutionary design process
We investigate learnability PAC model data used learning attributes labels either corrupted incomplete In order prove main results define new complexity measure statistical query SQ learning algorithms The view SQ algorithm maximum queries algorithm number input bits query depends We show restricted view SQ algorithm class general sufficient condition learnability models attribute noise covered missing attributes We show since algorithms question statistical also simultaneously tolerate classification noise Classes results hold therefore learned simultaneous attribute noise classification noise include kDNF ktermDNF DNF representations conjunctions relevant variables uniform distribution decision lists These noise models first PAC models training data attributes labels may corrupted random process Previous researchers shown class kDNF learnable attribute noise attribute noise rate known exactly We show attribute noise learnability results either without classification noise also hold exact noise rate known provided learner instead polynomially good approximation noise rate In addition show results also hold one noise rate distinct noise rate attribute Our results learning random covering require learner told even approximation covering rate addition hold setting distinct covering rates attribute Finally give lower bounds number examples required learning presence attribute noise covering
This study describes new Hidden Markov Model HMM system segmenting uncharacterized genomic DNA sequences exons introns intergenic regions Separate HMM modules designed trained specific regions DNA exons introns intergenic regions splice sites The models tied together form biologically feasible topology The integrated HMM trained set eukaryotic DNA sequences tested using segment separate set sequences The resulting HMM system called VEIL Viterbi ExonIntron Locator obtains overall accuracy test data total bases correctly labelled correlation coefficient Using stringent test exact exon prediction VEIL correctly located ends coding exons exons predicts exactly correct These results compare favorably best previous results gene structure prediction demonstrate benefits using HMMs problem
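VEIL reads its segmentation off the most probable state path, which is the job of the Viterbi algorithm. The following is a minimal log-space Viterbi decoder. It is shown on a hypothetical two-state "exon"/"intron" model with invented GC-biased emissions. VEIL's real topology has many more states and trained parameters.

```python
import math

# Hypothetical toy model: "E" (exon-like, GC-rich) and "I" (intron-like, AT-rich).
STATES = ("E", "I")
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for `obs` (all probabilities must be > 0)."""
    # score[s] = best log-probability of any path ending in state s at the current position
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    backptrs = []
    for symbol in obs[1:]:
        prev_best, new_score = {}, {}
        for s in states:
            p = max(states, key=lambda q: score[q] + math.log(trans[q][s]))
            prev_best[s] = p
            new_score[s] = score[p] + math.log(trans[p][s]) + math.log(emit[s][symbol])
        backptrs.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(backptrs):
        state = prev_best[state]
        path.append(state)
    return path[::-1]
```

For example, on `"GCGCATATAT"` this toy model labels the GC-rich prefix as `E` and the AT-rich suffix as `I`. Working in log space avoids underflow on long genomic sequences.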
Recently increased interest lifelong machine learning methods transfer knowledge across multiple learning tasks Such methods repeatedly found outperform conventional singletask learning algorithms learning tasks appropriately related To increase robustness approaches methods desirable reason relatedness individual learning tasks order avoid danger arising tasks unrelated thus potentially misleading This paper describes taskclustering TC algorithm TC clusters learning tasks classes mutually related tasks When facing new learning task TC first determines related task cluster exploits information selectively task cluster An empirical study carried mobile robot domain shows TC outperforms nonselective counterpart situations small number tasks relevant
Belief revision belief update proposed two types belief change serving different purposes Belief revision intended capture changes agents belief state reflecting new information static world Belief update intended capture changes belief response changing world We argue belief revision belief update restrictive routine belief change involves elements We present model generalized update allows updates response external changes inform agent prior beliefs This model update combines aspects revision update providing realistic characterization belief change We show certain assumptions original update postulates satisfied We also demonstrate plain revision plain update special cases model way formally verifies intuition revision suitable static belief change
We propose Bayesian framework regression problems covers areas usually dealt function approximation An online learning algorithm derived solves regression problems Kalman filter Its solution always improves increasing model complexity without risk overfitting In infinite dimension limit approaches true Bayesian posterior The issues prior selection overfitting also discussed showing commonly held beliefs misleading The practical implementation summarised Simulations using popular publicly available data sets used demonstrate method highlight important issues concerning choice priors
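For a model that is linear in fixed basis functions, online Bayesian regression has a Kalman-filter form. The weights are treated as a static state with a Gaussian prior, and observations are folded in one at a time. Below is a minimal sketch under those assumptions. The feature map, prior variance, and noise variance are illustrative choices. The sketch does not cover the paper's infinite-dimension limit or its discussion of prior selection.

```python
import numpy as np


class KalmanRegressor:
    """Online Bayesian linear regression as a Kalman filter with static state.

    Prior: w ~ N(0, prior_var * I); observation model: y = phi . w + noise.
    """

    def __init__(self, n_features, prior_var=1.0, noise_var=0.1):
        self.mean = np.zeros(n_features)
        self.cov = prior_var * np.eye(n_features)
        self.noise_var = noise_var

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        s = phi @ self.cov @ phi + self.noise_var  # predictive variance of y
        gain = self.cov @ phi / s                 # Kalman gain K
        self.mean = self.mean + gain * (y - phi @ self.mean)
        self.cov = self.cov - np.outer(gain, phi @ self.cov)  # P - K phi^T P

    def predict(self, phi):
        """Return predictive mean and variance for a new feature vector."""
        phi = np.asarray(phi, dtype=float)
        return phi @ self.mean, phi @ self.cov @ phi + self.noise_var
```

With a broad prior and a small noise variance, the posterior mean approaches the least-squares fit. For example, with features `[1, x]` and data from `y = 2x + 1`, it approaches weights `[1, 2]`.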
This report deals efficient mapping sparse neural networks CNS We develop parallel vector code idealized sparse network determine performance three memory systems We use code evaluate memory systems one implemented prototype pinpoint bottlenecks current CNS design
In nature genotype many organisms exhibits diploidy ie includes two copies every gene In paper describe results simulations comparing behavior haploid diploid populations ecological neural networks living fixed changing environments We show diploid genotypes create variability fitness population haploid genotypes buffer better environmental change consequence one wants obtain good results average peak fitness single population one choose diploid population appropriate mutation rate Some results simulations parallel biological findings
The problem belief change how agent revise beliefs upon learning new information has active area research philosophy artificial intelligence Many approaches belief change proposed literature Our goal introduce yet another approach examine carefully rationale underlying approaches already taken literature highlight view methodological problems literature The main message study belief change carefully must quite explicit ontology scenario underlying belief change process This something missing previous work focus postulates Our analysis shows must pay particular attention two issues often taken granted The first model agents epistemic state Do use set beliefs richer structure ordering worlds And use set beliefs language beliefs expressed The second status observations Are observations known true believed In latter case firm belief For example argue even postulates called beyond controversy unreasonable agents beliefs include beliefs epistemic state well external world Issues status observations arise particularly consider iterated belief revision must confront possibility revising
The study belief change active area philosophy AI In recent years two special cases belief change belief revision belief update studied detail Roughly speaking revision treats surprising observation sign previous beliefs wrong update treats surprising observation indication world changed In general would expect agent making observation may want revise earlier beliefs assume change occurred world We define novel approach belief change allows us applying ideas probability theory qualitative settings The key idea use qualitative Markov assumption says state transitions independent We show recent approach modeling qualitative uncertainty using plausibility measures allows us make qualitative Markov assumption relatively straightforward way show Markov assumption used provide attractive beliefchange model
In reinforcement learning frequently necessary resort approximation true optimal value function Here investigate benefits online search cases We examine local searches agent performs finitedepth lookahead search global searches agent performs search trajectory way current state goal state The key success methods lies taking value function gives rough solution hard problem finding good trajectories every single state combining online search gives accurate solution easier problem finding good trajectory specifically current state
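The following is a minimal sketch of the local-search case. It runs a depth-limited lookahead over a deterministic model and bootstraps from a rough value function at the leaves. The corridor environment, its rewards, and the deliberately misleading value estimate are all hypothetical. They are chosen so that a one-step search is fooled and a two-step search is not.

```python
def lookahead_value(state, depth, successors, value_fn, gamma=1.0):
    """Best discounted return from `state` over `depth` steps, using value_fn at the leaves."""
    moves = successors(state)
    if depth == 0 or not moves:  # search horizon reached or terminal state
        return value_fn(state)
    return max(r + gamma * lookahead_value(nxt, depth - 1, successors, value_fn, gamma)
               for _, nxt, r in moves)


def choose_action(state, depth, successors, value_fn, gamma=1.0):
    """Pick the first action of the best depth-limited trajectory (requires depth >= 1)."""
    def backed_up(move):
        _, nxt, r = move
        return r + gamma * lookahead_value(nxt, depth - 1, successors, value_fn, gamma)
    return max(successors(state), key=backed_up)[0]


# Hypothetical corridor: states 0..10, goal 10 is terminal, +10 for entering it, -1 otherwise.
GOAL = 10


def corridor_successors(s):
    """Return (action, next_state, reward) triples for a deterministic corridor."""
    if s == GOAL:
        return []
    moves = []
    for a in (-1, +1):
        nxt = min(max(s + a, 0), GOAL)
        moves.append((a, nxt, 10.0 if nxt == GOAL else -1.0))
    return moves


def rough_value(s):
    return 3.0 if s == 7 else 0.0  # deliberately overrates state 7
```

From state 8, a depth-1 search trusts the inflated estimate and steps back toward 7. A depth-2 search sees the goal reward and steps forward. This mirrors the division of labour in the abstract: the value function gives a rough global answer, and the search refines it for the current state.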
Probabilistic contextfree grammars PCFGs provide simple way represent particular class distributions sentences contextfree language Efficient parsing algorithms answering particular queries PCFG ie calculating probability given sentence finding likely parse applied variety patternrecognition problems We extend class queries answered several ways allowing missing tokens sentence sentence fragment supporting queries intermediate structure presence particular nonterminals flexible conditioning variety types evidence Our method works constructing Bayesian network represent distribution parse trees induced given PCFG The network structure mirrors chart standard parser generated using similar dynamicprogramming approach We present algorithm constructing Bayesian networks PCFGs show queries patterns queries network correspond interesting queries PCFGs
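The standard query the abstract starts from, the probability of a given sentence, is computed by the inside algorithm. It fills the same kind of chart a parser uses. Below is a minimal sketch for a grammar assumed to be in Chomsky normal form, with a hypothetical toy grammar. It does not build the Bayesian network the paper describes.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
LEXICAL = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}


def sentence_probability(words, binary_rules, lexical_rules, start="S"):
    """P(start =>* words) via the inside algorithm.

    binary_rules: {(A, B, C): P(A -> B C)}; lexical_rules: {(A, word): P(A -> word)}.
    """
    n = len(words)
    # inside[i][j][A] = P(A =>* words[i:j])
    inside = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (a, word), p in lexical_rules.items():
            if word == w:
                inside[i][i + 1][a] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for (a, b, c), p in binary_rules.items():
                    left, right = inside[i][k][b], inside[k][j][c]
                    if left and right:
                        inside[i][j][a] += p * left * right
    return inside[0][n][start]
```

For example, `sentence_probability("she eats fish".split(), BINARY, LEXICAL)` gives 0.25 under this toy grammar, and a scrambled word order gives 0.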
This paper presents recent developments toward formalism combines useful properties logic probabilities Like logic formalism admits qualitative sentences provides symbolic machinery deriving deductively closed beliefs like probability permits us express ifthen rules different levels firmness retract beliefs response changing observations Rules interpreted orderofmagnitude approximations conditional probabilities impose constraints rankings worlds Inferences supported unique priority ordering rules syntactically derived knowledge base This ordering accounts rule interactions respects specificity considerations facilitates construction coherent states beliefs Practical algorithms developed analyzed testing consistency computing rule ordering answering queries Imprecise observations incorporated using qualitative versions Jeffreys Rule Bayesian updating result coherent belief revision embodied naturally tractably Finally causal rules interpreted imposing Markovian conditions constrain world rankings reflect modularity causal organizations These constraints shown facilitate reasoning causal projections explanations actions change
Clay evolutionary architecture autonomous robots integrates motor schemabased control reinforcement learning Robots utilizing Clay benefit realtime performance motor schemas continuous dynamic environments taking advantage adaptive reinforcement learning Clay coordinates assemblages groups motor schemas using embedded reinforcement learning modules The coordination modules activate specific assemblages based presently perceived situation Learning occurs robot selects assemblages samples reinforcement signal time Experiments robot soccer simulation illustrate performance utility system
Most known learning algorithms dynamic neural networks nonstationary environments need global computations perform credit assignment These algorithms either local time local space Those algorithms local time space usually deal sensibly hidden units In contrast far judge learning rules biological systems many hidden units local space time In paper propose parallel online learning algorithm performs local computations yet still designed deal hidden units units whose past activations hidden time The approach inspired Hollands idea bucket brigade classifier systems transformed run neural network fixed topology The result feedforward recurrent neural dissipative system consuming weightsubstance permanently trying distribute substance onto connections appropriate way Simple experiments demonstrating feasibility algorithm reported
Although creativity largely studied problem solving contexts creativity consists generative component comprehension component In particular creativity essential part reading understanding natural language stories We formalized understanding process developed algorithm capable producing creative understanding behavior We also created novel knowledge organization scheme assist process Our model creativity implemented portion ISAAC Integrated Story Analysis And Creativity reading system system models creative reading science fiction stories
Creative designers often see solutions pending design problems everyday objects surrounding This often lead innovation insight sometimes revealing new functions purposes common design pieces process We interested modeling serendipitous recognition solutions pending problems context creative mechanical design This paper characterizes ability analyzing observations made placing context forms recognition We propose computational model capture explore serendipitous recognition based ideas reconstructive dynamic memory situation assessment casebased reasoning
In paper analyze two wellknown measures attribute selection decision tree induction informativity gini index In particular interested influence different methods estimating probabilities two measures The results experiments show different measures obtained different probability estimation methods determine preferential order attributes given node Therefore determine structure constructed decision tree This feature beneficial especially realworld applications several different trees often required
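The following sketch shows both measures with a pluggable probability estimator, to illustrate the abstract's point that the estimator can change the attribute order. It uses the Gini index and an entropy-based information gain, the latter a stand-in for "informativity" that may differ from the paper's exact definition. It offers three estimators: relative frequency, the Laplace estimate, and the m-estimate. The choice of estimators, the relative-frequency branch weights, and the toy data are assumptions.

```python
import math
from collections import Counter


def class_probs(labels, classes, method="frequency", m=2.0, prior=None):
    """Estimate class probabilities from labels with the chosen estimator."""
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    if method == "frequency":
        return [counts[c] / n for c in classes]
    if method == "laplace":
        return [(counts[c] + 1) / (n + k) for c in classes]
    if method == "m":
        prior = prior or {c: 1 / k for c in classes}
        return [(counts[c] + m * prior[c]) / (n + m) for c in classes]
    raise ValueError(f"unknown method: {method}")


def gini(p):
    return 1.0 - sum(q * q for q in p)


def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)


def split_score(values, labels, impurity, method="frequency", **kw):
    """Impurity reduction from splitting `labels` on attribute `values`.

    Branch weights use relative frequencies; class probabilities use `method`.
    """
    classes = sorted(set(labels))
    before = impurity(class_probs(labels, classes, method, **kw))
    after = 0.0
    for v in set(values):
        sub = [y for x, y in zip(values, labels) if x == v]
        after += len(sub) / len(labels) * impurity(class_probs(sub, classes, method, **kw))
    return before - after
```

On a toy set, an ID-like attribute with one example per value looks perfect under relative frequencies. The Laplace estimate treats its single-example branches as uncertain, so a coarser but well-supported attribute wins instead.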
We consider learning situations function used classify examples may switch back forth small number different concepts course learning We examine several models situations oblivious models switches made independent selection examples adversarial models single adversary controls concept switches example selection We show relationships benign models pconcepts Kearns Schapire present polynomialtime algorithms learning switches two kDNF formulas For adversarial model present model success patterned popular competitive analysis used studying online algorithms We describe randomized query algorithm adversarial switches two monotone disjunctions competitive total number mistakes plus queries high probability bounded number switches plus fixed polynomial n number variables We also use notions described provide sufficient conditions learning pconcept class decision rule implies able learn class model probability
Case based systems typically retrieve cases case base applying similarity measures The measures usually constructed ad hoc manner This report presents toolbox systematic construction similarity measures In addition paving way design methodology similarity measures systematic approach facilitates identification opportunities parallelisation case base retrieval
In real world applications software engineers recognise use memory must organised via data structures software using data must independent data structures implementation details They achieve using abstract data structures records files buffers We demonstrate genetic programming automatically implement simple abstract data structures considering detail task evolving list We show general reasonably efficient implementations automatically generated simple primitives A model maintaining evolved code demonstrated using list problem Much published work genetic programming GP evolves functions without sideeffects learn patterns test data In contrast human written programs often make extensive explicit use memory Indeed memory form required programming system Turing Complete ie possible write computable program system However inclusion memory make interactions parts programs much complex make harder produce programs Despite shown GP automatically create programs explicitly use memory Teller In normal genetic programming considerable benefits found adopting structured approach For example Koza shows introduction evolvable code modules automatically defined functions ADFs greatly help GP reach solution We suggest corresponding structured approach use data similarly significant advantage GP Earlier work demonstrated genetic programming automatically generate simple abstract data structures namely stacks queues Langdon That GP evolve programs organise memory accessed via simple read write primitives data structures used external software without needing know implemented This chapter shows possible evolve list data structure basic primitives Aho Hopcroft Ullman suggest three different ways implement list experiments show GP evolve implementation This requires list components agree one implementation coevolve together Section describes GP architecture including use Pareto multiple component fitness scoring measures aimed speeding GP search The evolved solutions described Section Section presents candidate model maintaining evolved software This followed discussion learned conclusions drawn
Report SYCON ABSTRACT We pursue particular approach analog computation based dynamical systems type used neural networks research Our systems fixed structure invariant time corresponding unchanging number neurons If allowed exponential time computation turn unbounded power However polynomialtime constraints limits capabilities though powerful Turing Machines A similar restricted model shown polynomialtime equivalent classical digital computation previous work Moreover precise correspondence nets standard nonuniform circuits equivalent resources consequence one lower bound constraints compute This relationship perhaps surprising since analog devices change manner input size We note networks likely solve polynomially NPhard problems equality p np model implies almost complete collapse standard polynomial hierarchy
We introduce convergence diagnostic procedure MCMC operates estimating total variation distances distribution algorithm certain numbers iterations The method advantages many existing methods terms applicability utility computational expense interpretability It used assess convergence marginal joint posterior densities show applied two commonly used MCMC samplers Gibbs Sampler Metropolis Hastings algorithm Illustrative examples highlight utility interpretability proposed diagnostic also highlight limitations
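The quantity the diagnostic tracks is the total variation distance, which for discrete distributions is ½ Σ |p − q|. The paper's own estimator is not reproduced here. As a simpler stand-in, this sketch gives a histogram-based estimate between two one-dimensional samples. One might hypothetically apply it to draws pooled across parallel chains at two different iteration counts. The bin count is an arbitrary choice.

```python
import numpy as np


def tv_distance_estimate(sample_a, sample_b, bins=20):
    """Histogram estimate of the total variation distance between two 1-D samples."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    # Shared bin edges over the pooled range so both histograms are comparable.
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()
```

Binning can only merge probability mass, so the estimate tends to understate the true distance. Sampling noise pushes it up, which is why two samples from the same distribution give a small positive value rather than exactly zero.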
Because distance skull brain different resistivities electroencephalographic EEG data collected point human scalp includes activity generated within large brain area This spatial smearing EEG data volume conduction involve significant time delays however suggesting Independent Component Analysis ICA algorithm Bell Sejnowski suitable performing blind source separation EEG data The ICA algorithm separates problem source identification source localization First results applying ICA algorithm EEG eventrelated potential ERP data collected sustained auditory detection task show ICA training insensitive different random seeds ICA may used segregate obvious artifactual EEG components line muscle noise eye movements sources ICA capable isolating overlapping EEG phenomena including alpha theta bursts spatiallyseparable ERP components separate ICA channels Nonstationarities EEG behavioral state tracked using ICA via changes amount residual correlation ICAfiltered output channels
The standard approach decision tree induction topdown greedy algorithm makes locally optimal irrevocable decisions node tree In paper study alternative approach algorithms use limited lookahead decide test use node We systematically compare using large number decision trees quality decision trees induced greedy approach trees induced using lookahead The main results experiments greedy approach produces trees accurate trees produced much expensive lookahead step ii decision tree induction exhibits pathology sense lookahead produce trees larger less accurate trees produced without
This thesis presents machine learning model capable extracting discrete classes continuous valued input features This done using neurally inspired novel competitive classifier CC feeds discrete classifications forward supervised machine learning model The supervised learning model uses discrete classifications perhaps information available solve problem The supervised learner generates feedback guide CC potentially useful classifications continuous valued input features Two supervised learning models combined CC creating ASOCSAFE IDAFE Both models simulated results analyzed Based results several areas future research proposed
We are frequently called upon to perform multiple tasks that compete for our attention and resources. Often we know the optimal solution to each task in isolation; in this paper, we describe how this knowledge can be exploited to efficiently find good solutions for doing the tasks in parallel. We formulate this problem as that of dynamically merging multiple Markov decision processes (MDPs) into a composite MDP, and present a new theoretically-sound dynamic programming algorithm for finding an optimal policy for the composite MDP. We analyze various aspects of our algorithm. Every day, we are faced with the problem of doing multiple tasks in parallel, each of which competes for our attention and resources. If we are running a job shop, we must decide which machines to allocate to which jobs, and in what order, so that no jobs miss their deadlines. If we are a mail delivery robot, we must find the intended recipients of the mail while simultaneously avoiding fixed obstacles (such as walls) and mobile obstacles (such as people), and still manage to keep ourselves sufficiently charged. Frequently we know how to perform each task in isolation; this paper considers how we can take the information about the individual tasks and combine it to efficiently find an optimal solution for doing the entire set of tasks in parallel. More importantly, we describe a theoretically-sound algorithm for merging dynamically, that is, for how a new task (such as the arrival of a new job at a job shop) can be assimilated online into the solution being found for the ongoing set of simultaneous tasks. We illustrate its use on a simple merging problem.
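We consider the problem of fitting an $n \times n$ distance matrix $D$ by a tree metric $T$. Let $\varepsilon$ be the distance to the closest tree metric, that is, $\varepsilon = \min_T \{\|T - D\|_\infty\}$. First we present an $O(n^2)$ algorithm for finding an additive tree $T$ such that $\|T - D\|_\infty \le 3\varepsilon$, giving the first algorithm for this problem with a performance guarantee. Second, we show that it is NP-hard to find a tree $T$ such that $\|T - D\|_\infty < \frac{9}{8}\varepsilon$.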
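The objective can be made concrete with a short Python sketch, assuming matrices are plain nested lists: `linf_distance` computes $\|T - D\|_\infty$, and `is_tree_metric` checks the classical four-point condition (for every four points, the two largest of the three pairwise sums are equal), which characterizes tree metrics among metrics. Both function names and the star-tree example are hypothetical, and the paper's $O(n^2)$ approximation algorithm is not reproduced here.

```python
from itertools import combinations

def linf_distance(T, D):
    """||T - D||_inf: the largest absolute entrywise difference."""
    n = len(D)
    return max(abs(T[i][j] - D[i][j]) for i in range(n) for j in range(n))

def is_tree_metric(T, tol=1e-9):
    """Four-point condition on every set of four distinct points.

    Assumes T is already a metric (symmetric, zero diagonal, triangle
    inequality); under that assumption the condition characterizes tree metrics.
    """
    for i, j, k, l in combinations(range(len(T)), 4):
        sums = sorted([T[i][j] + T[k][l], T[i][k] + T[j][l], T[i][l] + T[j][k]])
        if sums[2] - sums[1] > tol:  # the two largest sums must coincide
            return False
    return True

# Star tree: leaf i hangs off a central node by an edge of length w[i].
w = [1.0, 2.0, 3.0, 4.0]
T = [[0.0 if i == j else w[i] + w[j] for j in range(4)] for i in range(4)]
D = [row[:] for row in T]
D[0][1] = D[1][0] = 3.5  # perturb one distance by 0.5

print(linf_distance(T, D))   # 0.5
print(is_tree_metric(T))     # True
print(is_tree_metric(D))     # False
```

Here $\varepsilon \le 0.5$ for `D`, since the star tree `T` is a tree metric at that distance.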
Case-Based Planning (CBP) provides a way of scaling up domain-independent planning to solve large problems in complex domains. It replaces the detailed and lengthy search for a solution with the retrieval and adaptation of previous planning experiences. In general, CBP has been demonstrated to improve performance over generative (from-scratch) planning. However, the performance improvements it provides are dependent on adequate judgements as to problem similarity. In particular, although CBP may substantially reduce planning effort overall, it is subject to a mis-retrieval problem. The success of CBP depends on these retrieval errors being relatively rare. This paper describes the design and implementation of a replay framework for the case-based planner DerSNLP+EBL. DerSNLP+EBL extends current CBP methodology by incorporating explanation-based learning techniques that allow it to explain and learn from the retrieval failures it encounters. These techniques are used to refine judgements about case similarity in response to feedback when a wrong decision has been made. The same failure analysis is used in building the case library, through the addition of repairing cases. Large problems are split and stored as single-goal subproblems. Multi-goal problems are stored only when these smaller cases fail to be merged into a full solution. An empirical evaluation of this approach demonstrates the advantage of learning from experienced retrieval failure.
Modern processors improve instruction level parallelism by speculation. The outcome of data and control decisions is predicted, and the operations are speculatively executed and only committed if the original predictions were correct. There are a number of other ways that processor resources could be used, such as threading or eager execution. As the use of speculation increases, we believe more processors will need some form of speculation control to balance the benefits of speculation against other possible activities. Confidence estimation is one technique that can be exploited by architects for speculation control. In this paper, we introduce performance metrics to compare confidence estimation mechanisms, and argue that these metrics are appropriate for speculation control. We compare a number of confidence estimation mechanisms, focusing on mechanisms that have a small implementation cost and gain benefit by exploiting characteristics of branch predictors, such as the clustering of mispredicted branches. We compare the performance of the different confidence estimation methods using detailed pipeline simulations. Using these simulations, we show how to improve the confidence estimators, providing better insight for future investigations comparing and applying confidence estimators.
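The kind of comparison described above can be sketched in Python. The four quantities below (SENS, PVP, SPEC, PVN) are the standard diagnostic-test measures computed from a 2×2 table of correct/incorrect predictions against high/low confidence; treating them as the paper's exact metric set is an assumption. The estimator is a simple per-branch resetting counter, one commonly studied low-cost design; the names `ConfidenceCounts` and `evaluate_resetting_counters`, the threshold and the saturation value are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ConfidenceCounts:
    chc: int = 0  # correct prediction, high confidence
    ihc: int = 0  # incorrect prediction, high confidence
    clc: int = 0  # correct prediction, low confidence
    ilc: int = 0  # incorrect prediction, low confidence

    def sens(self):
        """Fraction of correct predictions that were marked high confidence."""
        return self.chc / (self.chc + self.clc)

    def pvp(self):
        """Probability that a high-confidence prediction is correct."""
        return self.chc / (self.chc + self.ihc)

    def spec(self):
        """Fraction of mispredictions that were marked low confidence."""
        return self.ilc / (self.ihc + self.ilc)

    def pvn(self):
        """Probability that a low-confidence prediction is a misprediction."""
        return self.ilc / (self.clc + self.ilc)


def evaluate_resetting_counters(trace, threshold=3, max_count=15):
    """trace: iterable of (branch_address, prediction_was_correct) pairs.

    Each branch has a counter that counts up on a correct prediction and
    resets to zero on a misprediction; a prediction is high confidence
    when the counter has reached `threshold`.
    """
    counters = {}
    counts = ConfidenceCounts()
    for pc, correct in trace:
        high = counters.get(pc, 0) >= threshold
        if high and correct:
            counts.chc += 1
        elif high:
            counts.ihc += 1
        elif correct:
            counts.clc += 1
        else:
            counts.ilc += 1
        counters[pc] = min(counters.get(pc, 0) + 1, max_count) if correct else 0
    return counts

trace = [(0x40, True)] * 5 + [(0x40, False)] + [(0x40, True)] * 5
c = evaluate_resetting_counters(trace)
print(c)                   # ConfidenceCounts(chc=4, ihc=1, clc=6, ilc=0)
print(c.sens(), c.pvp())   # 0.4 0.8
```

Raising the threshold trades SENS for PVP, which is the kind of balance a speculation-control policy has to choose.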
Relational learning algorithms are of special interest to members of the machine learning community; they offer practical methods for extending the representations used in algorithms that solve supervised learning tasks. Five approaches are currently being explored to address the issues involved in using relational representations. This paper surveys algorithms embodying these approaches, summarizes their empirical evaluations, highlights their commonalities, and suggests potential directions for future research.
The learning process in Boltzmann Machines is computationally very expensive: the computational complexity of the exact algorithm is exponential in the number of neurons. We present a new approximate learning algorithm for Boltzmann Machines, based on mean field theory and the linear response theorem. The computational complexity of the algorithm is cubic in the number of neurons. In the absence of hidden units, we show how the weights can be computed directly from the fixed point equation of the learning rules; thus, in this case we do not need to use a gradient descent procedure for the learning process. We show that the solutions of this method are close to the optimal solutions and give a significant improvement when correlations play a significant role. Finally, we apply the method to a pattern completion task and show that it performs well.
This paper introduces a methodology for solving combinatorial optimization problems through the application of reinforcement learning methods. The approach can be applied in cases where several similar instances of a combinatorial optimization problem must be solved. The key idea is to analyze a set of training problem instances and learn a search control policy for solving new problem instances. The search control policy has the twin goals of finding high-quality solutions and finding them quickly. Results of applying this methodology to a NASA scheduling problem show that the learned search control policy is much more effective than the best known non-learning search procedure, a method based on simulated annealing.
Markov decision processes (MDPs) with undiscounted rewards represent an important class of problems in decision and control. The goal of learning in these MDPs is to find a policy that yields the maximum expected return per unit time. In large state spaces, computing these averages directly is not feasible; instead, the agent must estimate them by stochastic exploration of the state space. In this case, longer exploration times enable more accurate estimates and more informed decision-making. The learning curve for an MDP measures how the agent's performance depends on the allowed exploration time, $T$. In this paper we analyze these learning curves for a simple control problem with undiscounted rewards. In particular, methods from statistical mechanics are used to calculate lower bounds on the agent's performance in the thermodynamic limit $T \to \infty$, $N \to \infty$ with $\alpha = T/N$ finite, where $T$ is the number of time steps allotted per policy evaluation and $N$ is the size of the state space. In this limit, we provide a lower bound on the return of policies that appear optimal based on imperfect statistics.
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing the number of source operands for all instructions. Full predicate support provides the most flexibility and the largest potential performance improvements. On the other end, partial predicated execution support, such as conditional moves, requires very little change to existing architectures. This paper presents a preliminary study to qualitatively and quantitatively address the benefit of full and partial predicated execution support. With our current compiler technology, we show that the compiler can use both partial and full predication to achieve speedup in large control-intensive programs. Some details of the code generation techniques are shown to provide insight into the benefit of going from partial to full predication. Preliminary experimental results are very encouraging: on average, partial predication improves performance over a multiple-issue processor with no predicate support, and full predication provides an additional improvement.
This paper studies self-directed learning, a variant of the on-line learning model in which the learner selects the presentation order for the instances. We give tight bounds on the complexity of self-directed learning for the concept classes of monomials, k-term DNF formulas, and orthogonal rectangles over a discrete domain. These results demonstrate that the number of mistakes under self-directed learning can be surprisingly small. We then prove that the model of self-directed learning is more powerful than the commonly used on-line and query learning models. Next we explore the relationship between the complexity of self-directed learning and the Vapnik-Chervonenkis dimension. Finally, we explore a relationship between Mitchell's version space algorithm and the existence of self-directed learning algorithms that make few mistakes. Supported in part by a GE Foundation Junior Faculty Grant and NSF Grant CCR. Part of this research was conducted while the author was at the MIT Laboratory for Computer Science and supported by NSF grant DCR and a grant from the Siemens Corporation. Net address: sg@cs.wustl.edu.
A lower-bound result on the power of a genetic algorithm. Abstract: This paper presents a lower-bound result on the computational power of a genetic algorithm in the context of combinatorial optimization. We describe a new genetic algorithm, the merged genetic algorithm, and prove that for the class of monotonic functions the algorithm finds the optimal solution, and does so with an exponential convergence rate. The analysis pertains to the ideal behavior of the algorithm, where the main task reduces to showing convergence of probability distributions over the search space of combinatorial structures to the optimal one. We take exponential convergence to be indicative of efficient solvability for the sample-bounded algorithm, although a sampling theory is needed to better relate the limit behavior to actual behavior. The paper concludes with a discussion of some immediate problems that lie ahead for the genetic algorithm.
In this paper we study a forecasting model based on a mixture of experts for predicting the French daily electric energy consumption. We split the task into two parts. Using a mixture of experts, a first model predicts the electricity demand from exogenous variables (such as temperature and degree of cloud cover) and can be viewed as a nonlinear regression model of mixtures of Gaussians. Using a single neural network, a second model predicts the evolution of the residual error of the first one, and can be viewed as a nonlinear autoregression model. We analyze the splitting of the input space generated by the mixture of experts model, and compare its performance to that of the models presently used.
Sutton's TD($\lambda$) method aims to provide a representation of the cost function in an absorbing Markov chain with transition costs. A simple example is given where the representation obtained depends on $\lambda$. For $\lambda = 1$ the representation is optimal with respect to a least squares error criterion, but as $\lambda$ decreases towards 0 the representation becomes progressively worse and, in some cases, very poor. The example suggests a need to understand better the circumstances under which TD(0) and Q-learning obtain satisfactory neural network-based compact representations of the cost function. A variation of TD(0) is also proposed, which performs better on the example.
Case-based reasoning involves reasoning from cases: specific pieces of experience, the reasoner's or another's, that can be used to solve problems. We use the term "graph-structured" for representations that are capable of expressing the relations between any two objects in a case, that allow the set of relations used to vary from case to case, and that allow the set of possible relations to be expanded as necessary to describe new cases. Such representations can be implemented as, for example, semantic networks or lists of concrete propositions in some logic. We believe that graph-structured representations offer significant advantages, and thus we are investigating ways to implement such representations efficiently. We make a "case-based argument" using examples from two systems, CHIRON and CAPER, to show how a graph-structured representation supports two different kinds of case-based planning in two different domains. We discuss the costs associated with graph-structured representations and describe an approach to reducing those costs, implemented in CAPER.
The advantage of using linear regression in the leaves of a regression tree is analysed in this paper. We examine how this modification affects the construction, pruning and interpretation of a regression tree. The modification is tested on artificial and real-life domains, and its impact on the classification error and on the stability of the induced trees is considered. The results show that the modification is beneficial, as it leads to smaller classification errors of the induced regression trees. The Bayesian approach to the estimation of class distributions is used in all experiments.
General Theory and Quantitative Results. Abstract: The human genotype represents ten billion bits of information, whereas the human brain contains a million times a billion synapses. So a differentiated brain structure is essentially due to self-organization. Such self-organization is relevant in areas ranging from medicine to the design of intelligent complex systems. Many brain structures emerge as a collective phenomenon of a microscopic neurosynaptic dynamics: a stochastic dynamics mimics the neuronal action potentials, while the synaptic dynamics is modeled by a local coupling dynamics of Hebb-rule type, that is, a synaptic efficiency increases after coincident spiking of the pre- and postsynaptic neuron. The microscopic dynamics is transformed into a collective dynamics reminiscent of hydrodynamics. The theory models empirical findings quantitatively. Topology-preserving neuronal maps, assumed by Descartes, suggested to arise by self-organization by Weiss, and reported as an empirical observation by Marshall, are shown to be neurosynaptically stable owing to ubiquitous, infinitesimal, short-range electrical or chemical leakage. In the visual cortex, neuronal stimulus orientation preference emerges; empirically measured orientation patterns are determined by the Poisson equation of electrostatics, and this Poisson equation of orientation pattern emergence is derived. Complex cognitive abilities emerge when the basic local synaptic changes are regulated by valuation, emergent valuation, attention, attention focus or the combination of subnetworks. Altogether, a general theory is presented for the emergence of functionality from synaptic growth in neurobiological systems. The theory provides a transformation to a collective dynamics and is used for the quantitative modeling of empirical data.
A Methodology for Evaluating Theory Revision Systems: Results. Abstract: Theory revision systems are learning systems whose goal is to make small changes to an original theory to account for new data. A measure of the distance between two theories is proposed. This measure corresponds to the minimum number of edit operations at the literal level required to transform one theory into another. By computing the distance between an original theory and a revised theory, the claim that a theory revision system makes few revisions to a theory may be quantitatively evaluated. We present data using both accuracy and the distance metric on Audrey II.
We present a novel data mining approach based on decomposition. In order to analyze a given dataset, the method decomposes it into a hierarchy of smaller and less complex datasets that can be analyzed independently. The method is experimentally evaluated on a real-world housing loans allocation dataset, showing that the decomposition can discover meaningful intermediate concepts, decompose a relatively complex dataset into datasets that are easy to analyze and comprehend, and derive a classifier of high classification accuracy. We also show that human interaction has a positive effect on both the comprehensibility and the classification accuracy.
Most empirical evaluations of machine learning algorithms are case studies: evaluations of multiple algorithms on multiple databases. Authors of case studies implicitly or explicitly hypothesize that the pattern of their results, which often suggests that one algorithm performs significantly better than others, is not limited to the small number of databases investigated, but instead holds for some general class of learning problems. However, these hypotheses are rarely supported with additional evidence, which leaves them suspect. This paper describes an empirical method for generalizing results from case studies, together with an example application. This method yields rules describing when some algorithms significantly outperform others on some dependent measures. Advantages of generalizing from case studies and limitations of this particular approach are also described.
Learning multiple descriptions for each class in the data has been shown to reduce generalization error, but the amount of error reduction varies greatly from domain to domain. This paper presents a novel empirical analysis that helps to understand this variation. Our hypothesis is that the amount of error reduction is linked to the degree to which the descriptions for a class make errors in a correlated manner. We present a precise and novel definition for this notion and use twenty-nine data sets to show that the amount of observed error reduction is negatively correlated with the degree to which the descriptions make errors in a correlated manner. We empirically show that it is possible to learn descriptions that make less correlated errors in domains in which many ties in the search evaluation measure (e.g., information gain) are experienced during learning. The paper also presents results that help to understand when and why multiple descriptions are a help (irrelevant attributes) and when they are not as much help (large amounts of class noise).
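One simple way to make "errors in a correlated manner" measurable is sketched below in Python: for a pair of learned models, count the fraction of test examples on which both predict the same wrong class, then average over all pairs in the ensemble. This is an assumed operationalization for illustration; the paper's precise definition may differ, and the function names and toy predictions are hypothetical.

```python
from itertools import combinations

def error_correlation(preds_a, preds_b, truth):
    """Fraction of examples on which two models make the same wrong prediction."""
    same_error = sum(1 for a, b, t in zip(preds_a, preds_b, truth) if a == b != t)
    return same_error / len(truth)

def mean_pairwise_error_correlation(models_preds, truth):
    """Average error correlation over all pairs of models in an ensemble."""
    pairs = list(combinations(models_preds, 2))
    return sum(error_correlation(a, b, truth) for a, b in pairs) / len(pairs)

truth = [0, 1, 2, 0, 1]
m1 = [0, 2, 2, 1, 1]   # wrong on examples 1 and 3
m2 = [0, 2, 2, 2, 1]   # wrong on examples 1 and 3 (same mistake on example 1)
m3 = [1, 1, 2, 0, 0]   # wrong on examples 0 and 4
print(error_correlation(m1, m2, truth))                   # 0.2
print(mean_pairwise_error_correlation([m1, m2, m3], truth))
```

Under the paper's hypothesis, an ensemble like `[m1, m3]`, whose members never share a mistake, should gain more from voting than `[m1, m2]`.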
This paper presents Plannett, a system that combines artificial neural networks (ANNs) to achieve expert-level accuracy on the difficult scientific task of recognizing volcanos in radar images of the surface of the planet Venus. Plannett uses ANNs that vary along two dimensions: the set of input features used to train them and the number of hidden units. The ANNs are combined simply by averaging their output activations. When Plannett is used as the classification module of a three-stage image analysis system called JARtool, the end-to-end accuracy (sensitivity and specificity) is as good as that of a human planetary geologist on a four-image test suite. JARtool-Plannett also achieves the best algorithmic accuracy on these images to date.
Planning and learning at multiple levels of temporal abstraction is a key problem for artificial intelligence. In this paper we summarize an approach to this problem based on the mathematical framework of Markov decision processes and reinforcement learning. Conventional model-based reinforcement learning uses primitive actions that last one time step and that can be modeled independently of the learning agent. These can be generalized to macro actions, multi-step actions specified by an arbitrary policy and a way of completing. Macro actions generalize the classical notion of a macro operator in that they are closed loop, uncertain, and of variable duration. Macro actions are needed to represent common-sense higher-level actions such as going to lunch, grasping an object, or traveling to a distant city. This paper generalizes prior work on temporally abstract models (Sutton) and extends it from the prediction setting to include actions, control, and planning. We define a semantics of models of macro actions that guarantees the validity of planning using such models. This paper presents new results in the theory of planning with macro actions and illustrates its potential advantages in a gridworld task.
This paper reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely-used statistical tests are shown to have a high probability of Type I error in certain situations and should never be used: (a) a test for the difference of two proportions and (b) a paired-differences t test based on taking several random train/test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits a somewhat elevated probability of Type I error. A fourth test, McNemar's test, is shown to have low Type I error. The fifth test is a new test, 5x2cv, based on five iterations of 2-fold cross-validation. Experiments show that this test also has acceptable Type I error. The paper also measures the power (ability to detect algorithm differences when they do exist) of these tests. The cross-validated t test is the most powerful. The 5x2cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable Type I error. For algorithms that can be executed ten times, the 5x2cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
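The two recommended tests can be sketched in a few lines of Python. The formulas follow the usual statement of these tests: McNemar's statistic with continuity correction, compared against a chi-square distribution with one degree of freedom; and the 5x2cv t statistic, whose numerator is the error-rate difference on the first fold of the first replication and whose denominator averages the five per-replication variance estimates, compared against a t distribution with five degrees of freedom. The function names, constant names and example counts are hypothetical.

```python
import math

CHI2_1DF_CRITICAL_05 = 3.841   # chi-square, 1 degree of freedom, alpha = 0.05
T_5DF_CRITICAL_05 = 2.571      # two-sided t, 5 degrees of freedom, alpha = 0.05

def mcnemar_statistic(n01, n10):
    """McNemar's statistic with continuity correction.

    n01: test examples misclassified by algorithm A but not by B;
    n10: test examples misclassified by B but not by A.
    """
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)

def five_by_two_cv_t(diffs):
    """5x2cv paired t statistic.

    diffs: five pairs (p1, p2), the difference in error rate between the two
    algorithms on each fold of each replication of 2-fold cross-validation.
    """
    if len(diffs) != 5:
        raise ValueError("expected five replications")
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    return diffs[0][0] / math.sqrt(variance_sum / 5)

stat = mcnemar_statistic(10, 4)
print(round(stat, 3), stat > CHI2_1DF_CRITICAL_05)      # 1.786 False
t = five_by_two_cv_t([(0.10, 0.00)] + [(0.05, 0.05)] * 4)
print(round(t, 3), abs(t) > T_5DF_CRITICAL_05)          # 3.162 True
```

McNemar's test needs a single training run per algorithm, whereas 5x2cv needs ten, which matches the cost-based recommendation above.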
Many classification algorithms are "passive", in that they assign a class label to each instance based only on the description given, even if that description is incomplete. In contrast, an active classifier can, at some cost, obtain the values of missing attributes before deciding upon a class label. The expected utility of using an active classifier depends on both the cost required to obtain the values of additional attributes and the penalty incurred if it outputs the wrong classification. This paper considers the problem of learning near-optimal active classifiers, using a variant of the probably-approximately-correct (PAC) model. After defining the framework, which is perhaps the main contribution of this paper, we describe a situation where this task can be achieved efficiently, but then show that the task is often intractable.
Probabilistic inference algorithms for finding the most probable explanation, the maximum a-posteriori hypothesis, and the maximum expected utility, and for updating belief, are reformulated as an elimination-type algorithm called bucket elimination. This emphasizes the principle common to many of the algorithms appearing in the literature and clarifies their relationship to nonserial dynamic programming algorithms. We also present a general way of combining conditioning and elimination within this framework. Bounds on complexity are given for all the algorithms as a function of the problem's structure.
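The shared elimination principle can be shown with a compact Python sketch in the spirit of bucket elimination. The same loop computes marginals when the elimination operator is a sum (belief updating) and max-marginals when it is `max` (the core of most-probable-explanation queries). The dictionary-based factor representation, the function names and the two-variable network are hypothetical, and the bucket bookkeeping is simplified: each variable's bucket collects every remaining factor that mentions it at the moment it is processed.

```python
from functools import reduce
from itertools import product

# A factor is (variables, table): variables is a tuple of names and table maps
# each tuple of values (in the same order) to a number.

def multiply(f, g, domains):
    fv, ft = f
    gv, gt = g
    variables = tuple(dict.fromkeys(fv + gv))  # ordered union of scopes
    table = {}
    for values in product(*(range(domains[v]) for v in variables)):
        assignment = dict(zip(variables, values))
        table[values] = (ft[tuple(assignment[v] for v in fv)]
                         * gt[tuple(assignment[v] for v in gv)])
    return variables, table

def eliminate(f, var, op):
    """Remove `var` from factor f by combining its entries with op (sum or max)."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for values, p in ft.items():
        key = values[:i] + values[i + 1:]
        table[key] = op(table[key], p) if key in table else p
    return fv[:i] + fv[i + 1:], table

def bucket_elimination(factors, order, domains, op=lambda a, b: a + b):
    """Process one bucket per variable in `order`, then multiply what remains."""
    factors = list(factors)
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if bucket:
            combined = reduce(lambda f, g: multiply(f, g, domains), bucket)
            factors.append(eliminate(combined, var, op))
    return reduce(lambda f, g: multiply(f, g, domains), factors)

domains = {"A": 2, "B": 2}
p_a = (("A",), {(0,): 0.6, (1,): 0.4})
p_b_given_a = (("B", "A"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8})

print(bucket_elimination([p_a, p_b_given_a], ["A"], domains))          # P(B)
print(bucket_elimination([p_a, p_b_given_a], ["A"], domains, op=max))  # max over A
```

The size of the largest factor built inside a bucket drives the cost, which is why the complexity bounds depend on the problem's structure and the elimination order.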
This article appears in Sociological Methodology, edited by Peter V. Marsden (Cambridge, Mass.: Blackwells). Adrian E. Raftery is Professor of Statistics and Sociology, Department of Sociology, DK, University of Washington, Seattle, WA. This research was supported by NIH grant RHD. I would like to thank Robert Hauser, Michael Hout, Steven Lewis, Scott Long, Diane Lye, Peter Marsden, Bruce Western, Yu Xie and two anonymous reviewers for detailed comments on an earlier version. I am also grateful to Clem Brooks, Sir David Cox, Tom DiPrete, John Goldthorpe, David Grusky, Jennifer Hoeting, Robert Kass, David Madigan, Michael Sobel and Chris Volinsky for helpful discussions and correspondence.
In this paper we propose a family of algorithms that combine tree-clustering with conditioning to trade space for time. Such algorithms are useful for reasoning in probabilistic and deterministic networks, as well as for accomplishing optimization tasks. By analyzing the problem structure, it is possible to select from this spectrum the algorithm that best meets a given time-space specification.
In this paper, we propose a new approach to probabilistic inference on belief networks, global conditioning, which is a simple generalization of Pearl's method of loop-cutset conditioning. We show that global conditioning, as well as loop-cutset conditioning, can be thought of as a special case of the method of Lauritzen and Spiegelhalter as refined by Jensen et al. Nonetheless, this approach provides new opportunities for parallel processing and, in the case of sequential processing, a tradeoff of time for memory. We also show how a hybrid method (Suermondt and others) combining loop-cutset conditioning with Jensen's method can be viewed within our framework. By exploring the relationships between these methods, we develop a unifying framework in which the advantages of each approach can be combined successfully.
The dynamics and collective properties of feedback networks of spiking neurons are investigated, with special emphasis on the potential computational role of subthreshold oscillations. It is shown that model systems of integrate-and-fire neurons can function as associative memories on two distinct levels. On the first level, binary patterns are represented by the spike activity (to fire or not to fire). On the second level, analog patterns are encoded in the relative firing times between individual spikes, or between spikes and an underlying subthreshold oscillation. Both coding schemes may coexist within the same network. The results suggest that cortical neurons may perform a broad spectrum of associative computations far beyond the scope of the traditional firing-rate picture.
This paper considers a new method for maintaining diversity by creating subpopulations in a standard generational evolutionary algorithm. Unlike other methods, it replaces the concept of distance between individuals with tag bits that identify the subpopulation to which an individual belongs. Two variations of the method are presented, illustrating the feasibility of this approach.
Ideally, pattern recognition machines provide constant output when the inputs are transformed under a group G of desired invariances. These invariances can be achieved by enhancing the training data to include examples of inputs transformed by elements of G, while leaving the corresponding targets unchanged. Alternatively, the cost function for training can include a regularization term that penalizes changes in the output when the input is transformed under the group. This paper relates the two approaches, showing precisely the sense in which the regularized cost function approximates the result of adding transformed (or distorted) examples to the training data. The cost function for the enhanced training set is equivalent to the sum of the original cost function plus a regularizer. For unbiased models, the regularizer reduces to the intuitively obvious choice: a term that penalizes changes in the output when the inputs are transformed under the group. For infinitesimal transformations, the coefficient of the regularization term reduces to the variance of the distortions introduced into the training data. This correspondence provides a simple bridge between the two approaches.
A new method is proposed for exploiting causal independencies in exact Bayesian network inference. A Bayesian network can be viewed as representing a factorization of a joint probability into the multiplication of a set of conditional probabilities. We present a notion of causal independence that enables one to further factorize the conditional probabilities into a combination of even smaller factors, and consequently to obtain a finer-grain factorization of the joint probability. The new formulation of causal independence lets us specify the conditional probability of a variable given its parents in terms of an associative and commutative operator, such as "sum" or "max", on the contribution of each parent. We start with a simple algorithm, VE, for Bayesian network inference that, given evidence and a query variable, uses the factorization to find the posterior distribution of the query. We show how this algorithm can be extended to exploit causal independence. Empirical studies, based on the CPCS networks for medical diagnosis, show that this method is more efficient than previous methods and allows for inference in larger networks than previous algorithms.
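The factorization idea can be illustrated with a small Python sketch: if the effect is e = c₁ ∗ c₂ ∗ … ∗ cₖ for an associative and commutative operator ∗, and each contribution cᵢ depends only on parent i, then the conditional distribution of e is obtained by folding the per-parent contribution distributions together one at a time instead of storing a full table over all parents. Noisy-OR (∗ = max over {0, 1}) and a noisy adder (∗ = addition) are used as familiar instances. The function names and probabilities are hypothetical, and this does not reproduce the paper's extension of VE.

```python
import operator
from functools import reduce

def combine(dist_a, dist_b, op):
    """Distribution of op(a, b) for independent a ~ dist_a and b ~ dist_b.

    Distributions are dicts mapping a value to its probability.
    """
    out = {}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            v = op(a, b)
            out[v] = out.get(v, 0.0) + pa * pb
    return out

def causally_independent_cpt(contributions, op):
    """P(e | parents) when e = c_1 op c_2 op ... op c_k and each contribution
    c_i depends only on parent i (op must be associative and commutative)."""
    return reduce(lambda d1, d2: combine(d1, d2, op), contributions)

# Noisy-OR: each active parent independently turns e on with its own probability.
print(causally_independent_cpt([{1: 0.8, 0: 0.2}, {1: 0.6, 0: 0.4}], max))
# Noisy adder: e counts the contributions that fired.
print(causally_independent_cpt([{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}], operator.add))
```

Because the operator is associative and commutative, the contributions can be combined in any order, which is what lets an elimination algorithm interleave them with other factors.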
Our goal is to develop a hybrid cognitive model of how humans acquire skills on complex cognitive tasks. We are pursuing this goal by designing hybrid computational architectures for the NRL Navigation task, which requires competent sensorimotor coordination. In this paper, we describe the results of directly fitting human execution data on this task. We next present, and then empirically compare with human learning on the task, two methods for modeling control knowledge acquisition: reinforcement learning and a novel variant of action models. The paper concludes with an experimental demonstration of the impact of background knowledge on system performance. Our results indicate that the performance of our action models approach more closely approximates the rate of human learning on this task than does reinforcement learning.
The statistical query learning model can be viewed as a tool for creating (or demonstrating the existence of) noise-tolerant learning algorithms in the PAC model. The complexity of a statistical query algorithm, in conjunction with the complexity of simulating SQ algorithms in the PAC model with noise, determines the complexity of the noise-tolerant PAC algorithms produced. Although roughly optimal upper bounds have been shown for the complexity of statistical query learning, the corresponding noise-tolerant PAC algorithms are not optimal due to inefficient simulations. In this paper we provide both improved simulations and a new variant of the statistical query model in order to overcome these inefficiencies. We improve the time complexity of the classification noise simulation of statistical query algorithms. Our new simulation has a roughly optimal dependence on the noise rate. We also derive a simpler proof that statistical queries can be simulated in the presence of classification noise. This proof makes fewer assumptions on the queries themselves and therefore allows one to simulate more general types of queries. We also define a new variant of the statistical query model based on relative error, and we show that this variant is more natural and strictly more powerful than the standard additive error model. We demonstrate efficient PAC simulations for algorithms in this new model and give general upper bounds on both learning with relative error statistical queries and PAC simulation. We show that any statistical query algorithm can be simulated in the PAC model with malicious errors in such a way that the resultant PAC algorithm has a roughly optimal tolerable malicious error rate and sample complexity. Finally, we generalize the types of queries allowed in the statistical query model. We discuss the advantages of allowing these generalized queries and show that our results on improved simulations also hold for these queries. This paper is available from the Center for Research in Computing Technology, Division of Applied Sciences, Harvard University, as technical report TR.
This paper outlines some problems that may occur with Reduced Error Pruning in relational learning algorithms, most notably efficiency. Thereafter a new method, Incremental Reduced Error Pruning, is proposed that attempts to address these problems. Experiments show that in many noisy domains this method is much more efficient than alternative algorithms, along with a slight gain in accuracy. However, the experiments show as well that the use of the algorithm cannot be recommended for domains that require a very specific concept description.
One kind of prosodic structure that apparently underlies both music and language is meter. Yet detailed measurements of music and speech show that the nested periodicities that define metrical structure can be noisy in some sense. What kind of system could produce or perceive such variable metrical timing? And what would it take to be able to store particular metrical patterns in the long-term memory of the same system? We have developed a network of coupled oscillators that both produces and perceives metrical patterns of pulses. In addition, beginning from an initial state with no biases, it learns to prefer particular beat patterns, like those of waltzes, over other beat patterns. Models of this general class could learn to entrain to musical patterns. And given a way to process speech to extract appropriate pulses, the model should be applicable to the metrical structure of speech as well. Is language metrical? Meter refers to particular sorts of patterns in time and to an abstract description of such patterns, potentially a cognitive representation of them. In the simplest cases there are two hierarchical levels of equally spaced events, and the periods characterizing the levels are integral multiples of each other. The hierarchy is implied in standard Western musical notation, where different levels are indicated by kinds of notes (quarter notes, half notes, etc.) and by the bars separating measures. For example, in a basic waltz-time meter, individual beats, all with the same spacing, are grouped into sets of three, with every third one receiving a stronger accent. In such a meter there is a hierarchy consisting of a faster periodic cycle (at the beat level) and a slower one (at the measure level), with the onset (zero phase angle) of the slow cycle coinciding with the zero phase angle of every third beat. Metrical systems like this seem to underlie most forms of music around the world, and are often said to underlie human speech as well (Jones; Martin). However, there is an awkward difficulty: the definition employs the notion of an integer, while data on both music and speech show clearly that the perfect temporal ratios predicted by the definition are not observed in performance. In music performance, various kinds of systematic temporal deviations from the timing specified in musical notation are known.
We propose a model of abduction based on the revision of the epistemic state of an agent. Explanations must be sufficient to induce belief in the sentence to be explained (for instance, some observation), and must ensure its consistency with other beliefs, in a manner that adequately accounts for factual and hypothetical sentences. Our model will generate explanations that nonmonotonically predict an observation, thus generalizing current accounts, which require some deductive relationship between explanation and observation. It also provides a natural preference ordering on explanations, defined in terms of normality or plausibility. To illustrate the generality of our approach, we reconstruct two of the key paradigms for model-based diagnosis, abductive and consistency-based diagnosis, within our framework. This reconstruction provides an alternative semantics for both and extends these systems to accommodate our predictive explanations and semantic preferences on explanations. It also illustrates how more general information can be incorporated in a principled manner. Some parts of this paper appeared in preliminary form as "Abduction as Belief Revision: A Model of Preferred Explanations," Proc. of the Eleventh National Conf. on Artificial Intelligence (AAAI), Washington, DC.
SICS Research Report. Abstract: The optimal probability of activation and the corresponding performance are studied for three designs of Sparse Distributed Memory, namely Kanerva's original design, Jaeckel's selected-coordinates design, and Karlsson's modification of Jaeckel's design. We assume that the hard locations (in Karlsson's case, the masks), the storage addresses and the stored data are randomly chosen, and we consider different levels of random noise in the reading address.
SICS Research Report. Abstract: We consider a sparse distributed memory with randomly chosen hard locations, in which an unknown number $T$ of random data vectors have been stored. A method is given to estimate $T$ from the content of the memory with high accuracy. In fact, our estimate is unbiased, and its coefficient of variation is roughly inversely proportional to $\sqrt{MU}$, where $M$ is the number of hard locations in the memory and $U$ is the length of the data, so the accuracy can be made arbitrarily high by making the memory big enough. A consequence of this is that good reading methods can be used without the need for any special extra location to be introduced.
We describe a ranked-model semantics for if-then rules admitting exceptions, which provides a coherent framework for many facets of evidential and causal reasoning. Rule priorities are automatically extracted from the knowledge base to facilitate the construction and retraction of plausible beliefs. To represent causation, the formalism incorporates the principle of Markov shielding, which imposes a stratified set of independence constraints on rankings of interpretations. We show how this formalism resolves some classical problems associated with specificity, prediction and abduction, and how it offers a natural way of unifying belief revision, belief update, and reasoning about actions.
We build up the mathematical connection between the Expectation-Maximization (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite Gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix $P$, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of $P$ and provide new results analyzing the effect that $P$ has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of Gaussian mixture models.
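A one-dimensional NumPy sketch makes the EM-gradient connection tangible for the mean parameters only: with responsibilities $r_{nj}$ and $n_j = \sum_n r_{nj}$, the EM update of each mean equals $\mu_j + (\sigma_j^2 / n_j)\,\partial \ell / \partial \mu_j$, a simple diagonal instance of "gradient times a projection matrix". The full matrix $P$ from the paper (covering mixing weights and covariances) is not reproduced. The quantile initialization, iteration count, function names and synthetic data are assumptions for illustration.

```python
import numpy as np

def responsibilities(x, weights, means, variances):
    """Posterior component probabilities r[n, j] and the mixture density per point."""
    dens = (weights * np.exp(-(x[:, None] - means) ** 2 / (2 * variances))
            / np.sqrt(2 * np.pi * variances))
    total = dens.sum(axis=1)
    return dens / total[:, None], total

def em_gmm_1d(x, k=2, n_iter=100):
    weights = np.full(k, 1.0 / k)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)   # deterministic spread-out start
    variances = np.full(k, x.var())
    log_liks = []
    for _ in range(n_iter):
        r, total = responsibilities(x, weights, means, variances)   # E-step
        log_liks.append(float(np.log(total).sum()))
        nk = r.sum(axis=0)                                          # M-step
        weights = nk / len(x)
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances, log_liks

def em_mean_step_vs_gradient(x, weights, means, variances):
    """Return (EM update of the means, old means + (var_j / n_j) * dL/dmean_j)."""
    r, _ = responsibilities(x, weights, means, variances)
    nk = r.sum(axis=0)
    em_means = (r * x[:, None]).sum(axis=0) / nk
    grad = (r * (x[:, None] - means)).sum(axis=0) / variances
    return em_means, means + variances / nk * grad

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
weights, means, variances, log_liks = em_gmm_1d(x)
print(np.round(np.sort(means), 2))
```

The recorded log-likelihoods should never decrease, and `em_mean_step_vs_gradient` returns two equal arrays for any parameter values, which is the identity stated above.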
A first order regression algorithm capable of handling real-valued (continuous) variables is introduced, and some of its applications are presented. Regressional learning assumes a real-valued class and discrete or real-valued variables. The algorithm combines regressional learning with standard ILP concepts, such as first order concept description and background knowledge. A clause is generated by successively refining the initial clause, adding to the clause body literals of the form $A = v$ for discrete attributes, $A \le v$ and $A \ge v$ for real-valued attributes, and background knowledge literals. The algorithm employs a covering approach (beam search), a heuristic impurity function, and stopping criteria based on local improvement, minimum number of examples, maximum clause length, minimum local improvement, minimum description length, allowed error, and variable depth. An outline of the algorithm and the results of the system's application in some artificial and real-world domains are presented. The real-world domains comprise the modelling of water behavior in a surge tank, the modelling of workpiece roughness in a steel grinding process, and the modelling of the operator's behavior in the process of electrical discharge machining. Special emphasis is given to the evaluation of the obtained models by domain experts and to their comments on aspects of the practical use of the induced knowledge. The results obtained during the knowledge acquisition process show several important guidelines for knowledge acquisition, concerning mainly the process of interaction with domain experts and exposing, primarily, the importance of the comprehensibility of the induced knowledge.
We address the problem of measuring the degree of hemispheric organization, and the asymmetry of organization, in a computational model of a bihemispheric cerebral cortex. A theoretical framework for such measures is developed and used to produce algorithms for measuring the degree of organization, symmetry, and lateralization in topographic map formation. The performance of the resulting measures is tested on several topographic maps obtained by self-organization of an initially random network, and the results are compared with subjective assessments made by humans. It is found that the closest agreement with the human assessments is obtained by using organization measures based on sigmoid-type error averaging. Measures are developed which correct for large constant displacements as well as for curving of the hemispheric topographic maps.
Learning structure temporallyextended sequences difficult computational problem fraction relevant information available instant Although variants back propagation principle used find structure sequences practice sufficiently powerful discover arbitrary contingencies especially spanning long temporal intervals involving high order statistics For example designing connectionist network music composition encountered problem net able learn musical structure occurs locally timeeg relations among notes within musical phrasebut structure occurs longer time periodseg relations among phrases To address problem require means constructing reduced description sequence makes global aspects explicit readily detectable I propose achieve using hidden units operate different time constants Simulation experiments indicate slower timescale hidden units able pick global structure structure simply learned standard Many patterns world intrinsically temporal eg speech music unfolding events Recurrent neural net architectures devised accommodate timevarying sequences For example architecture shown Figure map sequence inputs sequence outputs Learning structure temporallyextended sequences difficult computational problem input pattern may contain taskrelevant information instant Thus back propagation
An incremental higherorder nonrecurrent neuralnetwork combines two properties found useful sequence learning neuralnetworks higherorder connections incremental introduction new units The incremental higherorder neuralnetwork adds higher orders needed adding new units dynamically modify connection weights The new units modify weights next timestep information previous step Since theoretically unlimited number units added network information arbitrarily distant past brought bear prediction Temporal tasks thereby learned without use feedback contrast recurrent neuralnetworks Because no recurrent connections training simple fast Experiments demonstrated speedups two orders magnitude recurrent networks
In complex models like hidden Markov chains convergence MCMC algorithms used approximate posterior distribution Bayes estimates parameters interest must controlled robust manner We propose paper series online controls rely classical nonparametric tests evaluate independence startup distribution stability Markov chain asymptotic normality These tests lead graphical control spreadsheets presented setup normal mixture hidden Markov chains compare full Gibbs sampler aggregated Gibbs sampler based forwardbackward formulae
Three different methods investigated determine ability detect classify various categories diffuse liver disease A statistical method ie discriminant analysis supervised neural network called backpropagation nonsupervised selforganizing feature map examined The investigation performed basis previously selected set acoustic image texture parameters The limited number patients successfully extended generating additional independent data identical statistical properties The generated data used training test sets The final test made original patient data validation set It concluded neural networks attractive alternative traditional statistical techniques dealing medical detection classification tasks Moreover use generated data training networks discriminant classifier shown justified profitable
Nonlinear extensions oneunit multiunit Principal Component Analysis PCA neural networks introduced earlier authors reviewed The networks nonlinear Hebbian learning rules related signal expansions like Projection Pursuit PP Independent Component Analysis ICA Separation results mixtures real world signals images given
Many hormones physiological processes vary circadian pattern Although sinecosine function used model patterns functional form appropriate asymmetry peak nadir phases In paper describe semiparametric periodic spline function fit circadian rhythms The model includes phase amplitude time magnitude peak nadir estimated We also describe tests fit components model Data experiment study immunological responses humans used demonstrate methods
We present highlevel decompositionbased algorithms largescale blockangular optimization problems containing integer variables demonstrate effectiveness solution largescale graph partitioning problems These algorithms combine subproblemcoordination paradigm lower bounds pricedirective decomposition methods knapsack genetic approaches utilization building blocks partial solutions Even graph partitioning problems requiring billions variables standard formulation approach produces highquality solutions measured deviations easily computed lower bound substantially outperforms widelyused graph partitioning techniques based heuristics spectral methods
Maps regional morbidity mortality rates useful tools determining spatial patterns disease Combined sociodemographic census information also permit assessment environmental justice ie whether certain subgroups suffer disproportionately certain diseases adverse effects harmful environmental exposures Bayes empirical Bayes methods proven useful smoothing crude maps disease risk eliminating instability estimates lowpopulation areas maintaining geographic resolution In paper extend existing hierarchical spatial models account temporal effects spatiotemporal interactions Fitting resulting highlyparametrized models requires careful implementation Markov chain Monte Carlo MCMC methods well novel techniques model evaluation selection We illustrate approach using dataset countyspecific lung cancer rates state Ohio period
A novel unsupervised neural network dimensionality reduction seeks directions emphasizing multimodality presented connection exploratory projection pursuit methods discussed This leads new statistical insight synaptic modification equations governing learning Bienenstock Cooper Munro BCM neurons The importance dimensionality reduction principle based solely distinguishing features demonstrated using phoneme recognition experiment The extracted features compared features extracted using backpropagation network
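The abstract ties BCM synaptic modification to projection pursuit but does not state the rule itself. Below is a minimal sketch of one update step for a single linear neuron. It assumes the commonly cited form with modification function phi(y, theta) = y(y - theta) and a sliding threshold theta that tracks a running average of y squared. The paper's exact variant may differ, and `bcm_step`, `eta`, and `tau` are hypothetical names.

```python
def bcm_step(w, x, theta, eta=0.01, tau=0.1):
    """One BCM-style update for a linear neuron y = w . x (assumed textbook form).

    Weights for active inputs grow when y exceeds the modification threshold
    theta and shrink when y falls below it; theta is an exponential moving
    average of y**2 (the "sliding" threshold).
    """
    y = sum(wi * xi for wi, xi in zip(w, x))
    phi = y * (y - theta)                      # modification function phi(y, theta)
    w = [wi + eta * phi * xi for wi, xi in zip(w, x)]
    theta = theta + tau * (y * y - theta)      # move threshold toward y**2
    return w, theta
```

Applied repeatedly over a stream of inputs, the rising threshold is what keeps the weights from growing without bound.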
This paper reprinted Computational Learning Theory Natural Learning Systems vol T Petsche S Hanson J Shavlik eds Copyrighted MIT Press Abstract The ability inductive learning system find good solution given problem dependent upon representation used features problem A number factors including trainingset size ability learning algorithm perform constructive induction mediate effect input representation accuracy learned concept description We present experiments evaluate effect input representation generalization performance realworld problem finding genes DNA Our experiments demonstrate two different input representations task result significantly different generalization performance neural networks decision trees neural symbolic methods constructive induction fail bridge gap two representations We believe realworld domain provides interesting challenge problem machine learning subfield constructive induction relationship two representations well known conceptually representational shift involved constructing better representation imposing
In paper emphasize role selection evolutionary algorithms We briefly review common selection schemes fields Genetic Algorithms Evolution Strategies Genetic Programming However not classify selection schemes according group evolutionary algorithm belong rather distinguish parent selection schemes global competition replacement schemes local competition replacement schemes This paper not intended fully review analyse presented selection schemes rather serve short reference standard advanced selection schemes
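The parent-selection side of this distinction can be illustrated with k-way tournament selection, one widely used parent selection scheme. The sketch below shows only the general idea and is not the paper's own taxonomy. The function names and the convention that higher fitness is better are assumptions.

```python
import random

def tournament_select(population, fitness, k, rng=random):
    """Pick one parent by k-way tournament: sample k distinct individuals, keep the fittest.

    fitness is a callable (higher is better); k = 1 degenerates to uniform random choice.
    """
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)

def select_parents(population, fitness, n, k, seed=None):
    """Draw n parents independently with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [tournament_select(population, fitness, k, rng) for _ in range(n)]
```

Raising `k` increases selection pressure: with `k` equal to the population size, the best individual always wins.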
Selfsupervised backpropagation unsupervised learning procedure feedforward networks desired output vector identical input vector For backpropagation able use powerful simulators running parallel machines Topologypreserving maps hand developed variant competitive learning procedure However degenerate case selfsupervised backpropagation version competitive learning A simple extension cost function backpropagation leads competitive version selfsupervised backpropagation used produce topographic maps We demonstrate approach applied Traveling Salesman Problem TSP The algorithm implemented using backpropagation simulator CLONES parallel machine RAP
This paper describes evolving computational model perception production simple rhythmic patterns The model consists network oscillators different resting frequencies couple input patterns Oscillators whose frequencies match periodicities input tend become activated Metrical structure represented explicitly network form clusters oscillators whose frequencies phase angles constrained maintain harmonic relationships characterize meter Rests rhythmic patterns represented explicit rest oscillators network become activated expected beat pattern fails appear The model makes predictions relative difficulty The nested periodicity defines musical probably also linguistic meter appears fundamental way people perceive produce patterns time Meter however sufficient describe patterns interesting memorable deviate metrical hierarchy The simplest deviations rests gaps one levels hierarchy would normally beat When beats removed regular intervals match period level metrical hierarchy call simple rhythmic pattern Figure shows example simple rhythmic pattern Below grid representation meter behind pattern patterns effect deviations periodicity input
In paper generalize several results uniform approximation orders radial basis functions Buhmann Dyn Levin Dyn Ron L_p approximation orders These results apply particular approximants spaces spanned translates radial basis functions scattered centres Examples results apply include quasiinterpolation leastsquares approximation radial function spaces
The paper studies L IR norm approximations space spanned discrete set translates basis function Attention restricted functions whose Fourier transform smooth IR n singularity origin Examples basis functions thinplate splines multiquadrics well types radial basis functions employed Approximation Theory The approximation problem wellunderstood case set points ffi used translating forms lattice IR many optimal quasioptimal approximation schemes already found literature In contrast mostly specific results known set ffi scattered points The main objective paper provide general tool extending approximation schemes use integer translates basis function nonuniform case We introduce single relatively simple conversion method preserves approximation orders provided large number schemes presently literature precisely almost stationary schemes In anticipation future introduction new schemes uniform grids effort made impose mild conditions function still allow unified error analysis hold In course discussion recent results BuDL scattered center approximation reproduced improved upon
An upper bound L_p approximation power p provided principal shiftinvariant spaces derived mild assumptions generator It applies stationary nonstationary ladders shown apply spaces generated exponential box splines polyharmonic splines multiquadrics Gauss kernel
In speeduplearning problems full descriptions operators always known explanationbased learning EBL reinforcement learning RL applied This paper shows methods involve fundamentally process propagating information backward goal toward starting state RL performs propagation statebystate basis EBL computes weakest preconditions operators hence performs propagation regionbyregion basis Based observation RL form asynchronous dynamic programming paper shows develop dynamic programming version EBL call ExplanationBased Reinforcement Learning EBRL The paper compares batch online versions EBRL batch online versions RL standard EBL The results show EBRL combines strengths EBL fast learning ability scale large state spaces strengths RL learning optimal policies Results shown chess endgames synthetic maze tasks
In paper present extensions kmeans algorithm vector quantization permit efficient use image segmentation pattern classification tasks It shown introducing state variables correspond certain statistics dynamic behavior algorithm possible find representative centers lower dimensional manifolds define boundaries classes clouds multidimensional multiclass data permits one example find class boundaries directly sparse data eg image segmentation tasks efficiently place centers pattern classification eg local Gaussian classifiers The state variables used define algorithms determining adaptively optimal number centers clouds data spacevarying density Some examples application extensions also given This report describes research done within CIMAT Guanajuato Mexico Center Biological Computational Learning Department Brain Cognitive Sciences Artificial Intelligence Laboratory This research sponsored grants Office Naval Research contracts NJ NJ grant National Science Foundation contract ASC grant National Institutes Health contract NIH SRR Additional support provided North Atlantic Treaty Organization ATR Audio Visual Perception Research Laboratories Mitsubishi Electric Corporation Sumitomo Metal Industries Siemens AG Support AI Laboratorys artificial intelligence research provided ONR contract NJ JL Marroquin supported part grant Consejo Nacional de Ciencia Tecnologia Mexico
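The extensions above start from ordinary k-means. The sketch below is a baseline only: plain Lloyd-style k-means in pure Python. It does not include the paper's state variables or its adaptive choice of the number of centres. `kmeans` and its parameters are hypothetical.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length float tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # initialise with k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centre (squared Euclidean distance)
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # recompute each centre as its cluster mean; keep the old centre if the cluster is empty
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:             # converged: assignments no longer change
            break
        centers = new_centers
    return centers
```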
The limitations using selforganizing maps SOM either clusteringvector quantization VQ multidimensional scaling MDS discussed reviewing recent empirical findings relevant theory SOMs remaining ability VQ MDS time challenged new combined technique online Kmeans clustering plus Sammon mapping cluster centroids SOM shown perform significantly worse terms quantization error recovering structure clusters preserving topology comprehensive empirical study using series multivariate normal clustering problems
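The combined technique maps k-means centroids with Sammon mapping, which searches for a low-dimensional layout minimizing Sammon's stress. The sketch below computes only that objective, not the iterative optimization. It assumes no two original points coincide, and `sammon_stress` is a hypothetical name.

```python
import math
from itertools import combinations

def sammon_stress(original, projected):
    """Sammon's stress E = (1 / sum d*) * sum (d* - d)^2 / d* over all point pairs.

    d* are pairwise distances in the original space, d those in the projection.
    Assumes every original pairwise distance is nonzero.
    """
    num = 0.0
    total = 0.0
    for i, j in combinations(range(len(original)), 2):
        d_star = math.dist(original[i], original[j])
        d = math.dist(projected[i], projected[j])
        total += d_star
        num += (d_star - d) ** 2 / d_star
    return num / total
```

A perfect distance-preserving layout gives zero stress. Dividing by `d_star` weights errors on small distances more heavily, which is why Sammon mapping tends to preserve local structure.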
While exploring find better solutions agent performing online reinforcement learning RL perform worse acceptable In cases exploration might unsafe even catastrophic results often modeled terms reaching failure states agents environment This paper presents method uses domain knowledge reduce number failures exploration This method formulates set actions RL agent composes control policy ensure exploration conducted policy space excludes unacceptable policies The resulting action set abstract relationship task solved common many applications RL Although cost added safety learning may result suboptimal solution argue appropriate tradeoff many problems We illustrate method domain motion planning
In learning problems connectionist network trained finite sized training set better generalization performance often obtained unneeded weights network eliminated One source unneeded weights comes inclusion input variables provide little information output variables We propose method identifying eliminating input variables The method first determines relationship input output variables using nonparametric density estimation measures relevance input variables using information theoretic concept mutual information We present results method simple toy problem nonlinear time series
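The method scores each input variable by its mutual information with the output, estimated via nonparametric density estimation. The sketch below is a simplified stand-in that replaces density estimation with plain frequency counts, so it only suits discrete or pre-binned variables. The function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += c / n * math.log(c * n / (px[x] * py[y]))
    return mi
```

Inputs whose estimated mutual information with the output is close to zero would be candidates for elimination before training.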
In work applying genetic algorithms populations neural networks real distinction genotype phenotype In nature information contained genotype mapping genetic information phenotype usually much complex The genotypes many organisms exhibit diploidy ie include two copies gene two copies identical sequences therefore functional difference products usually proteins expressed phenotypic feature termed dominant one one recessive expressed In paper review literature use diploidy dominance operators genetic algorithms present new results obtained simulations changing environments finally discuss results simulations parallel biological findings
The Multiscalar architecture advocates distributed processor organization tasklevel speculation exploit high degrees instruction level parallelism ILP sequential programs without impeding improvements clock speeds The main goal paper understand key implications architectural features distributed processor organization tasklevel speculation compiler task selection point view performance We identify fundamental performance issues control flow speculation data communication data dependence speculation load imbalance task overhead We show issues intimately related key characteristics tasks task size intertask control flow intertask data dependence We describe compiler heuristics select tasks favorable characteristics We report experimental results show heuristics successful boosting overall performance establishing larger ILP windows
Technical Report January Abstract This paper introduces Introspection Approach method learning agent employing reinforcement learning decide ask training agent instruction When using approach find number trainers responses produced significantly faster learners learner ask aid randomly Guidance received via approach informative random guidance Thus reduce interaction training agent learning agent without reducing speed learner develops policy In fact intelligent learner asks help even increase learning speed level trainer interaction
We present method feature construction selection finds minimal set conjunctive features appropriate perform classification task For problems bias appropriate method outperforms constructive induction algorithms able achieve higher classification accuracy The application method search minimal multilevel boolean expressions presented analyzed help examples
In paper discuss Bayesian approach finding latent classes data In approach use finite mixture models describe underlying structure data demonstrate possibility use full joint probability models raises interesting new prospects exploratory data analysis The concepts methods discussed illustrated case study using data set recent educational study The Bayesian classification approach described implemented presents appealing addition standard toolbox exploratory data analysis educational data
We present two additions hierarchical mixture experts HME architecture We view HME tree structured classifier Firstly applying likelihood splitting criteria expert HME grow tree adaptively training Secondly considering probable path tree may prune branches away either temporarily permanently become redundant We demonstrate results growing pruning algorithms show significant speed ups efficient use parameters conventional algorithms discriminating two interlocking spirals classifying bit parity patterns
Systems interacting realworld data must address issues raised possible presence errors observations makes In paper first present framework discussing imperfect data resulting problems may cause We distinguish two categories errors data random errors noise systematic errors examine relationship task describing observations way also useful helping future problemsolving learning tasks Secondly proceed examine techniques currently used AI research recognising errors
Genetic Programming applied task evolving general iterative sorting algorithms A connection size generality discovered Adding inverse size fitness measure along correctness decreases size resulting evolved algorithms also dramatically increases generality thus effectiveness evolution process In addition variety differing problem formulations investigated relative probability success reported An example evolved sort problem formulation presented initial attempt made understand variations difficulty resulting differing problem formulations
Irrelevant redundant features may reduce predictive accuracy comprehensibility induced concepts Most common Machine Learning approaches selecting good subset relevant features rely crossvalidation As alternative present application particular Minimum Description Length MDL measure task feature subset selection Using MDL principle allows taking account available data The new measure informationtheoretically plausible yet still simple therefore efficiently computable We show empirically new method judging value feature subsets efficient performs least well methods based crossvalidation Domains large number training examples large number possible features yield biggest gains efficiency Thus new approach seems scale better large learning problems previous methods
rules Rivest Inductive algorithms AQ CN learn decision lists incrementally one rule time Such algorithms face rule overlap problem classification accuracy decision list depends overlap learned rules Thus even though rules learned isolation evaluated concert Existing algorithms solve problem adopting greedy iterative structure Once rule learned training examples match rule removed training set We propose novel solution problem composing decision lists homogeneous rules rules whose classification accuracy change position decision list We prove problem finding maximally accurate decision list reduced problem finding maximally accurate homogeneous rules We report performance algorithm data sets UCI repository MONKs problems
Methods build function approximators example data gained considerable interest past Especially methodologies build models allow interpretation attracted attention Most existing algorithms however either complicated use infeasible highdimensional problems This article presents efficient easy use algorithm construct fuzzy graphs example data The resulting fuzzy graphs based locally independent fuzzy rules operate solely selected important attributes This enables application fuzzy graphs also problems high dimensional spaces Using illustrative examples real world data set demonstrated resulting fuzzy graphs offer quick insights structure example data underlying model
We describe methodology enabling intelligent teaching system make high level strategy decisions basis low level student modeling information This framework less costly construct superior hand coding teaching strategies responsive learners needs In order accomplish reinforcement learning used learn associate superior teaching actions certain states students knowledge Reinforcement learning RL shown flexible handling noisy data need expert domain knowledge A drawback RL often needs significant number trials learning We propose offline learning methodology using sample data simulated students small amounts expert knowledge bypass problem
Any intelligent system whether human robotic must capable dealing patterns time Temporal pattern processing achieved system shortterm memory capacity STM different representations maintained time In work propose neural model wherein STM realized leaky integrators selforganizing system The model exhibits compositionality ability extract construct progressively complex structured associations hierarchical manner starting basic primitive temporal elements An important feature proposed model use temporal correlations express dynamic bindings
In tasks requiring sustained attention human alertness varies minute time scale This serious consequences occupations ranging air traffic control monitoring nuclear power plants Changes electroencephalographic EEG power spectrum accompany fluctuations level alertness assessed measuring simultaneous changes EEG performance auditory monitoring task By combining power spectrum estimation principal component analysis artificial neural networks show continuous accurate noninvasive near realtime estimation operators global level alertness feasible using EEG measures recorded two central scalp sites This demonstration could lead practical system noninvasive monitoring cognitive state human operators attentioncritical settings
This paper presents exact solutions convergent approximations inferences Bayesian networks associated finitely generated convex sets distributions Robust Bayesian inference calculation bounds posterior values given perturbations probabilistic model The paper presents exact inference algorithms analyzes circumstances exact inference becomes intractable Two classes algorithms numeric approximations developed transformations original model The first transformation reduces robust inference problem estimation probabilistic parameters Bayesian network The second transformation uses Lavines bracketing algorithm generate sequence maximization problems Bayesian network The analysis extended contaminated lower density bounded belief function subsigma density bounded total variation density ratio classes distributions Copyright Carnegie Mellon University
One fundamental problems learning identifying members two different classes For example diagnose cancer one must learn discriminate benign malignant tumors Through examination tumors previously determined diagnosis one learns function distinguishing benign malignant tumors Then acquired knowledge used diagnose new tumors The perceptron simple biologically inspired model twoclass learning problem The perceptron trained constructed using examples two classes Then perceptron used classify new examples We describe geometrically perceptron capable learning Using duality develop framework investigating different methods training perceptron Depending define best perceptron different minimization problems developed training perceptron The effectiveness methods evaluated empirically four practical applications breast cancer diagnosis detection heart disease political voting habits sonar recognition This paper assumes no prior knowledge machine learning pattern recognition
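For readers new to the model, the classic perceptron error-correction rule is sketched below. It is the textbook training procedure, not one of the duality-based minimization formulations the paper develops. The sketch assumes labels of +1/-1, and the names are hypothetical.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Classic perceptron rule. samples: list of feature tuples; labels: +1 or -1.

    Returns (weights, bias). Stops early once a full pass makes no mistakes.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:            # misclassified or on the decision boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:                      # every sample strictly on its correct side
            break
    return w, b

def predict(w, b, x):
    """Classify x as +1 or -1 by the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

On linearly separable data such as logical AND, this loop reaches a separating hyperplane after a finite number of updates. On non-separable data it never settles, which is one motivation for the optimization-based formulations.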
I define latent variable model form neural network target outputs specified inputs unspecified Although inputs missing still possible train model placing simple probability distribution unknown inputs maximizing probability data given parameters The model discover description data terms underlying latent variable space lower dimensionality I present preliminary results application models protein data
This paper studies aspects two categories usually differ like relevance generalization role loss function presents unifying formalism types information identified answers generalized questions shows kind generalized information necessary enable learning aims put usual training data prior information equal footing discussing possibilities variants measurement control generalized questions including examples smoothness symmetries reviews shortly measurement linguistic concepts based fuzzy priors principles combine preprocessors uses Bayesian decision theoretic framework contrasting parallel inverse decision problems proposes problems nonapproximation aspects Bayesian two step approximation consisting posterior maximization subsequent risk minimization analyses empirical risk minimization aspect nonlocal information compares Bayesian two step approximation empirical risk minimization including interpretations Occams razor formulates examples stationarity conditions maximum posterior approximation nonlocal nonconvex priors leading inhomogeneous nonlinear equations similar example equations scattering theory physics In summary paper focuses dependencies answers different questions Because training examples alone dependencies enable generalization emphasizes need empirical measurement control explicit treatment theory This report describes research done within Center Biological Computational Learning Department Brain Cognitive Sciences Massachusetts Institute Technology This research sponsored grant National Science Foundation contract ASC grant ONRARPA contract NJ The author supported Postdoctoral Fellowship Le Deutsche Forschungsgemeinschaft NSFCISE Postdoctoral Fellowship
We present alternative cellular encoding technique Gruau evolving graph network structures via genetic programming The new technique called edge encoding uses edge operators rather node operators cellular encoding While cellular encoding edge encoding produce possible graphs two encodings bias genetic search process different ways may therefore useful different set problems The problems techniques may used think edge encoding may particularly useful include evolution recurrent neural networks finite automata graphbased queries symbolic knowledge bases In preliminary report present technical description edge encoding initial comparison cellular encoding Experimental investigation relative merits encoding schemes currently progress
We present technique evaluating classifications geometric comparison rule sets Rules represented objects ndimensional hyperspace The similarity classes computed overlap geometric class descriptions The system produces correlation matrix indicates degree similarity pair classes The technique applied classifications generated different algorithms different numbers classes different attribute sets Experimental results case study medical domain included
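The abstract does not define its overlap measure exactly. The sketch below makes several assumptions: each rule is an axis-parallel box in n-dimensional space, rules within one class do not overlap, and similarity is the overlap volume divided by the union volume (a Jaccard-style ratio). All names are hypothetical.

```python
def box_volume(box):
    """box: list of (low, high) intervals, one per dimension; empty extents count as 0."""
    vol = 1.0
    for lo, hi in box:
        vol *= max(0.0, hi - lo)
    return vol

def box_intersection(a, b):
    """Componentwise intersection of two boxes (may be empty)."""
    return [(max(lo1, lo2), min(hi1, hi2)) for (lo1, hi1), (lo2, hi2) in zip(a, b)]

def class_similarity(rules_a, rules_b):
    """Overlap / union volume of two classes, each a list of boxes.

    Assumes the boxes within one class are disjoint, so volumes can be summed.
    """
    overlap = sum(box_volume(box_intersection(ra, rb)) for ra in rules_a for rb in rules_b)
    vol_a = sum(box_volume(r) for r in rules_a)
    vol_b = sum(box_volume(r) for r in rules_b)
    union = vol_a + vol_b - overlap
    return overlap / union if union > 0 else 0.0

def similarity_matrix(classes):
    """Pairwise similarity of all classes, analogous to the correlation matrix described above."""
    return [[class_similarity(a, b) for b in classes] for a in classes]
```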
As natural resources become less abundant naturally become interested adept utilisation waste materials In bringing bear ploy key importance learning I argue paper In Truth Trash model learning viewed process uses environmental feedback assemble fortuitous sensory predispositions sensory trash useful information vehicles ie truthful indicators salient phenomena The main aim show computer implementation model used enhance learning strategic abilities simulated football playing mobot
This paper concerns probabilistic evaluation effects actions presence unmeasured variables We show identification causal effect singleton variable X set variables Y accomplished systematically time polynomial number variables graph When causal effect identifiable closedform expression obtained probability action achieve specified goal set goals
VISOR large connectionist system shows visual schemas learned represented used mechanisms natural neural networks Processing VISOR based cooperation competition parallel bottomup topdown activation schema representations Simulations show VISOR robust noise variations inputs parameters It indicate confidence analysis pay attention important minor differences use context recognize ambiguous objects Experiments also suggest representation learning stable behavior consistent human processes priming perceptual reversal circular reaction learning The schema mechanisms VISOR serve starting point building robust highlevel vision systems perhaps schemabased motor control natural language processing systems well
We present framework characterizing Bayesian classification methods This framework thought spectrum allowable dependence given probabilistic model Naive Bayes algorithm restrictive end learning full Bayesian networks general extreme While much work carried along two ends spectrum surprising little done along middle We analyze assumptions made one moves along spectrum show tradeoffs model accuracy learning speed become critical consider variety data mining domains We present general induction algorithm allows traversal spectrum depending available computational power carrying induction show application number domains different properties
Traits acquired members evolving population lifetime adaptive processes learning become genetically specified later generations Thus change level learning population evolutionary time This paper explores idea well benefits gained learning may also costs paid ability learn It costs supply selection pressure genetic assimilation acquired traits Two models presented attempt illustrate assertion The first uses Kauffmans NK fitness landscapes show effect explicit implicit costs assimilation learnt traits A characteristic hump observed graph level plasticity population showing learning first selected evolution progresses The second model practical example neural network controllers evolved small mobile robot Results experiment also show hump
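The first model builds on Kauffman's NK fitness landscapes. The sketch below covers only the landscape, not the learning or assimilation model. It follows the usual construction: each of N loci contributes a random value that depends on its own allele and on K other loci, and fitness is the mean contribution. Random neighbour choice and uniform table entries are assumptions, and the names are hypothetical.

```python
import random

def make_nk_landscape(n, k, seed=0):
    """Return a random NK fitness function over 0/1 genomes of length n (0 <= k < n)."""
    rng = random.Random(seed)
    # each locus interacts with itself plus k other randomly chosen loci
    neighbours = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # one lookup table per locus, indexed by the (k+1)-bit pattern of its interacting loci
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(genome):
        total = 0.0
        for i in range(n):
            index = 0
            for locus in neighbours[i]:
                index = (index << 1) | genome[locus]
            total += tables[i][index]
        return total / n                       # mean contribution, so fitness lies in [0, 1)
    return fitness
```

With `k = 0` the landscape is additive and smooth. Larger `k` makes it increasingly rugged, which is the knob such experiments typically vary.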
The evolution population guided phenotypic traits acquired members population lifetime This phenomenon known Baldwin Effect speed evolutionary process traits initially acquired become genetically specified later generations This paper presents conditions genetic assimilation take place As well benefits lifetime adaptation give population may cost paid adaptive ability It evolutionary tradeoff costs benefits provides selection pressure acquired traits become genetically specified It also noted genotypic space evolution operates phenotypic space adaptive processes learning operate general different nature To guarantee acquired characteristic become genetically specified spaces must property neighbourhood correlation means small distance two individuals phenotypic space implies small distance two individuals genotypic space
Decision Trees widely used classificationregression tasks They relatively much faster build compared Neural Networks understandable humans In normal decision trees based input vector one branch followed In Probabilistic OPtion trees based input vector follow subtrees probability These probabilities learned system Probabilistic decisions likely useful boundary classes submerge noise input data In addition provide us confidence measure We allow option nodes trees Again instead uniform voting learn weightage every subtree
The fundamental backpropagation BP algorithm training artificial neural networks cast deterministic nonmonotone perturbed gradient method Under certain natural assumptions series learning rates diverging series squares converging established every accumulation point online BP iterates stationary point BP error function The results presented cover serial parallel online BP modified BP momentum term BP weight decay
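The learning-rate condition in this abstract (rates whose series diverges while the series of their squares converges) can be seen in a small experiment. The sketch below is illustrative only. It assumes a much simpler setting than the paper: online gradient descent on a single linear unit with squared error, which is the one-layer special case of online BP. It also uses a hypothetical schedule η_t = eta0 / (1 + t/tau), which satisfies both series conditions.

```python
def learning_rate(t, eta0=0.1, tau=100.0):
    # eta_t = eta0 / (1 + t/tau) behaves like (eta0*tau)/t for large t:
    # the sum of eta_t diverges, while the sum of eta_t**2 converges
    return eta0 / (1.0 + t / tau)

def online_gd(samples, epochs=1000):
    """Online (per-example) gradient descent on 0.5 * (w*x + b - y)**2."""
    w, b = 0.0, 0.0
    t = 0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y   # derivative of the loss w.r.t. the prediction
            eta = learning_rate(t)
            w -= eta * err * x
            b -= eta * err
            t += 1
    return w, b

# Noise-free targets from the line y = 2x + 1
samples = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = online_gd(samples)
print(round(w, 3), round(b, 3))
```

The targets here are exactly linear, so every per-example gradient vanishes at the optimum and the iterates settle at (2, 1). With noisy targets, a decaying schedule is what lets the iterates settle instead of continuing to oscillate.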
Recurrent neural networks trained behave like deterministic finitestate automata DFAs show deteriorating performance tested long strings This deteriorating performance attributed instability internal representation learned DFA states The use sigmoidal discriminant function together recurrent structure contribute instability We prove simple algorithm construct secondorder recurrent neural networks sparse interconnection topology sigmoidal discriminant function internal DFA state representations stable ie constructed network correctly classifies strings arbitrary length The algorithm based encoding strengths weights directly neural network We derive relationship weight strength number DFA states robust string classification For DFA n states m input alphabet symbols constructive algorithm generates programmed neural network $O(n)$ neurons $O(mn)$ weights We compare algorithm methods proposed literature
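The weight-programming idea can be made concrete with a minimal sketch. It assumes one simple ±H scheme, not necessarily the paper's exact construction:

- one state neuron per DFA state and a one-hot input symbol;
- weight +H from (current state j, symbol k) to the successor state;
- weight −H back onto neuron j when the transition leaves j;
- a bias of −H/2 on every state neuron.

The parity DFA and H = 10 are illustrative choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def program_weights(delta, n_states, n_symbols, H):
    """W[i][j][k]: weight from state neuron j under input symbol k to state neuron i."""
    W = [[[0.0] * n_symbols for _ in range(n_states)] for _ in range(n_states)]
    for j in range(n_states):
        for k in range(n_symbols):
            i = delta[j][k]
            W[i][j][k] = H          # excite the successor state
            if i != j:
                W[j][j][k] = -H     # inhibit the current state when the transition leaves it
    return W

def run(W, string, n_states, start, H):
    S = [0.0] * n_states
    S[start] = 1.0
    bias = -H / 2.0
    for k in string:
        # second-order update: S_i <- g(bias + sum_j W[i][j][k] * S_j) for current symbol k
        S = [sigmoid(bias + sum(W[i][j][k] * S[j] for j in range(n_states)))
             for i in range(n_states)]
    return S

# Parity DFA over {0, 1}: state 0 = even number of 1s (accepting), state 1 = odd
DELTA = [[0, 1], [1, 0]]
H = 10.0
W = program_weights(DELTA, 2, 2, H)

def accepts(string):
    return run(W, string, 2, 0, H)[0] > 0.5

print(accepts([1, 0, 1]), accepts([1] * 999))
```

With a large enough H, each state neuron stays close to 0 or 1 after every step, so the classification does not drift on long strings. This mirrors the stability argument in the abstract. The sketch does not derive the exact bound on H for a given DFA.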
Report SYCON Recent Results Lyapunovtheoretic Techniques Nonlinear Stability ABSTRACT This paper presents Converse Lyapunov Function Theorem motivated robust control analysis design Our result based upon generalizes various aspects wellknown classical theorems In unified natural manner includes arbitrary bounded disturbances acting system deals global asymptotic stability results smooth infinitely differentiable Lyapunov functions applies stability respect necessarily compact invariant sets As corollary obtained Converse Theorem show wellknown Lyapunov sufficient condition inputtostate stability also necessary settling positively open question raised several authors past years
Technical Report CSTR UMIACSTR University Maryland College Park MD Abstract The extraction symbolic knowledge trained neural networks direct encoding partial knowledge networks prior training important issues They allow exchange information symbolic connectionist knowledge representations The focus paper quality rules extracted recurrent neural networks Discretetime recurrent neural networks trained correctly classify strings regular language Rules defining learned grammar extracted networks form deterministic finitestate automata DFAs applying clustering algorithms output space recurrent state neurons Our algorithm extract different finitestate automata consistent training set network We compare generalization performances different models trained network introduce heuristic permits us choose among consistent DFAs model best approximates learned regular grammar
Jobshop scheduling important task manufacturing industries We interested particular task scheduling payload processing NASAs space shuttle program This paper summarizes previous work formulating task solution reinforcement learning algorithm TD(λ) A shortcoming previous work reliance handengineered input features This paper shows extend timedelay neural network TDNN architecture apply irregularlength schedules Experimental tests show TDNN-TD(λ) network match performance previous handengineered system The tests also show neural network approaches significantly outperform best previous nonlearning solution problem terms quality resulting schedules number search steps required construct
Report SYCON ABSTRACT This paper deals simulation Turing machines neural networks Such networks made interconnections synchronously evolving processors updates state according sigmoidal linear combination previous states units The main result states one may simulate Turing machines nets linear time In particular possible give net made processors computes universal partialrecursive function This update Report SYCON new results include simulation linear time binarytape machines opposed unary alphabets used previous version
In paper develop new LRTAbased algorithms variety tasks analyze complexity The LRTA algorithm realtime search algorithm developed Korf It used reach stationary moving goal state identify shortest paths given start state stationary goal state algorithm reset start state reaches goal state Our algorithms search horizon one require internal memory must able store information states For example bidirectional LRTS algorithm determines optimal universal plans ie finds optimal paths states set stationary goal states even reset actions available We show tasks studied paper solved LRTAbased algorithms On action executions state spaces size n
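The core LRTA* step the abstract builds on can be sketched in a few lines: look one step ahead, raise the heuristic value of the current state, and move to the best neighbour. This is Korf-style LRTA* on a hypothetical 4-connected grid with unit move costs and a zero initial heuristic. It does not implement the paper's new variants, such as the bidirectional algorithm.

```python
from collections import deque

def grid_graph(width, height):
    """4-connected grid; every move costs 1."""
    graph = {}
    for x in range(width):
        for y in range(height):
            graph[(x, y)] = [(x + dx, y + dy)
                             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= x + dx < width and 0 <= y + dy < height]
    return graph

def lrta_trial(graph, h, start, goal):
    """One LRTA* trial with one-step lookahead; updates h in place and returns the path."""
    s, path = start, [start]
    while s != goal:
        best = min(graph[s], key=lambda t: 1 + h[t])
        h[s] = max(h[s], 1 + h[best])   # learning step: raise the estimate for s
        s = best
        path.append(s)
    return path

def bfs_distance(graph, start, goal):
    dist, queue = {start: 0}, deque([start])
    while queue:
        s = queue.popleft()
        for t in graph[s]:
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist[goal]

graph = grid_graph(4, 4)
h = {s: 0 for s in graph}   # the zero heuristic is admissible
start, goal = (0, 0), (3, 3)
for _ in range(200):
    path = lrta_trial(graph, h, start, goal)
print(len(path) - 1, bfs_distance(graph, start, goal))
```

Repeated trials from the same start raise the heuristic while keeping it admissible, so the trial path eventually becomes a shortest path. The 200-trial budget is a generous assumption for this small grid.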
In paper consider problem approximating function belonging function space linear combination n translates given function G Using lemma Jones Barron show possible define function spaces functions G rate convergence zero error $O(1/\sqrt{n})$ number dimensions The apparent avoidance curse dimensionality due fact function spaces constrained dimension increases Examples include spaces Sobolev type number weak derivatives required larger number dimensions We give results approximation L norm L norm The interesting feature results thanks constructive nature Jones Barrons lemma iterative procedure defined achieve rate This paper describes research done within Center Biological Information Processing Department Brain Cognitive Sciences Artificial Intelligence Laboratory Department Mathematics University Trento Italy Gabriele Anzellotti Department Mathematics University Trento Italy This research sponsored grant Office Naval Research ONR Cognitive Neural Sciences Division Artificial Intelligence Center Hughes Aircraft Corporation S Support A I Laboratorys artificial intelligence research provided Advanced Research Projects Agency Department Defense Army contract DACAC part ONR contract NK © Massachusetts Institute Technology
University Wisconsin Computer Sciences Technical Report September Abstract In explanationbased learning specific problems solution generalized form later used solve conceptually similar problems Most research explanationbased learning involves relaxing constraints variables explanation specific example rather generalizing graphical structure explanation However precludes acquisition concepts iterative recursive process implicitly represented explanation fixed number applications This paper presents algorithm generalizes explanation structures reports empirical results demonstrate value acquiring recursive iterative concepts The BAGGER algorithm learns recursive iterative concepts integrates results multiple examples extracts useful subconcepts generalization On problems learning recursive rule appropriate system produces result standard explanationbased methods Applying learned recursive rules requires minor extension PROLOGlike problem solver namely ability explicitly call specific rule Empirical studies demonstrate generalizing structure explanations helps avoid recently reported negative effects learning
We consider Gibbs sampler applied uniform distribution bounded region $R \subseteq \mathbb{R}^d$ We show convergence properties Gibbs sampler depend greatly smoothness boundary R Indeed sufficiently smooth boundaries sampler uniformly ergodic jagged boundaries sampler could fail even geometrically ergodic
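A Gibbs sampler for a uniform distribution on a region alternates draws from the coordinate conditionals. Each conditional is uniform on the line segment through the current point that lies inside the region. The sketch below assumes the unit disk, whose boundary is smooth. It therefore illustrates the well-behaved case in the abstract rather than the jagged-boundary counterexamples.

```python
import math
import random

def gibbs_uniform_disk(n_samples, seed=0):
    """Two-coordinate Gibbs sampler targeting the uniform distribution on the unit disk."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0   # any starting point inside the region
    samples = []
    for _ in range(n_samples):
        # x | y is uniform on the horizontal chord |x| <= sqrt(1 - y^2)
        half = math.sqrt(1.0 - y * y)
        x = rng.uniform(-half, half)
        # y | x is uniform on the vertical chord |y| <= sqrt(1 - x^2)
        half = math.sqrt(1.0 - x * x)
        y = rng.uniform(-half, half)
        samples.append((x, y))
    return samples

samples = gibbs_uniform_disk(20000)
mean_r2 = sum(x * x + y * y for x, y in samples) / len(samples)
print(round(mean_r2, 2))   # for the uniform disk the exact value of E[x^2 + y^2] is 0.5
```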
In learning examples often useful expand attributevector representation intermediate concepts The usual advantage structuring learning problem makes learning easier improves comprehensibility induced descriptions In paper develop technique discovering useful intermediate concepts class attributes realvalued The technique based decomposition method originally developed design switching circuits recently extended handle incompletely specified multivalued functions It also applied machine learning tasks In paper introduce modifications needed decompose real functions present symbolic form The method evaluated number test functions The results show method correctly decomposes fairly complex functions The decomposition hierarchy depend given repertoire basic functions background knowledge
Uncertainty sampling methods iteratively request class labels training instances whose classes uncertain despite previous labeled instances These methods greatly reduce number instances expert need label One problem approach classifier best suited application may expensive train use selection instances We test use one classifier highly efficient probabilistic one select examples training another C rule induction program Despite chosen heterogeneous approach uncertainty samples yielded classifiers lower error rates random samples ten times larger
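The selection rule behind uncertainty sampling fits in a few lines. This sketch assumes a binary task and a probabilistic selector that returns P(positive | x). The logistic `selector_proba` is a hypothetical stand-in for the efficient probabilistic classifier in the abstract. In the heterogeneous setup, the selected instances would be labeled and then passed to a separate rule induction program.

```python
import math

def selector_proba(x):
    # hypothetical selector: logistic model with its decision boundary at x = 2
    return 1.0 / (1.0 + math.exp(-(x - 2.0)))

def select_uncertain(pool, predict_proba, k=1):
    """Indices of the k unlabeled instances whose P(positive) is closest to 0.5."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(predict_proba(pool[i]) - 0.5))
    return ranked[:k]

pool = [0.0, 1.5, 4.0, 2.2]
print(select_uncertain(pool, selector_proba, k=2))
```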
Certain causal models involving unmeasured variables induce independence constraints among observed variables imply nevertheless inequality constraints observed distribution This paper derives general formula inequality constraints induced instrumental variables exogenous variables directly affect variables With help formula possible test whether model involving instrumental variables may account data conversely whether given variable deemed instrumental
Bayesian confidence intervals smoothing spline often used distinguish two curves In paper provide asymptotic formula sample size calculations based Bayesian confidence intervals Approximations simulations special functions indicate asymptotic formula reasonably accurate Key Words Bayesian confidence intervals sample size smoothing spline Address Department Statistics Applied Probability University California Santa Barbara CA Tel Fax Email yuedongpstatucsbedu Supported National Institute Health Grants R EY P DK P HD
We describe several improvements Freund Schapires AdaBoost boosting algorithm particularly setting hypotheses may assign confidences predictions We give simplified analysis AdaBoost setting show analysis used find improved parameter settings well refined criterion training weak hypotheses We give specific method assigning confidences predictions decision trees method closely related one used Quinlan This method also suggests technique growing decision trees turns identical one proposed Kearns Mansour We focus next apply new boosting algorithms multiclass classification problems particularly multilabel case example may belong one class We give two boosting methods problem One leads new method handling singlelabel case simpler effective techniques suggested Freund Schapire Finally give experimental results comparing algorithms discussed paper
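Some weak hypotheses partition the instance space into disjoint blocks, for example the leaves of a decision tree. For these, a common way to assign confidences is to predict c_j = ½ ln(W₊/W₋) in block j. Here W₊ and W₋ are the total boosting weights of the positive and negative examples that fall in that block. With this choice, the normalization factor Z equals 2 Σ_j √(W₊ W₋). The sketch below computes both quantities under that assumption. The smoothing constant `eps` is a hypothetical guard against empty blocks.

```python
import math

def confidence_rated_partition(blocks, eps=1e-12):
    """Per-block confidence-rated predictions and the normalization factor Z.

    blocks: list of (w_plus, w_minus), the total boosting weight of positive and
    negative training examples that fall into each block of the partition.
    """
    preds = [0.5 * math.log((wp + eps) / (wm + eps)) for wp, wm in blocks]
    # Z = sum_i w_i * exp(-y_i * h(x_i)), grouped by block
    z = sum(wp * math.exp(-c) + wm * math.exp(c) for (wp, wm), c in zip(blocks, preds))
    return preds, z

blocks = [(0.3, 0.1), (0.1, 0.5)]   # hypothetical example weights summing to 1
preds, z = confidence_rated_partition(blocks)
print([round(c, 3) for c in preds], round(z, 4))
```

In the confidence-rated analysis, training error is bounded by the product of the per-round Z values. A weak hypothesis with Z below 1 therefore tightens that bound.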
Evolutionary Algorithms direct random search algorithms imitate principles natural evolution method solve adaptation learning tasks general As several features common observed genetic phenotypic level living species In paper algorithms capability adaptation learning wider sense demonstrated focused Genetic Algorithms illustrate learning process population level first level learning Evolution Strategies demonstrate learning process metalevel strategy parameters second level learning
We examine novel addition known methods learning Bayesian networks data improves quality learned networks Our approach explicitly represents learns local structure conditional probability distributions CPDs quantify networks This increases space possible models enabling representation CPDs variable number parameters The resulting learning procedure induces models better emulate interactions present data We describe theoretical foundations practical aspects learning local structures provide empirical evaluation proposed learning procedure This evaluation indicates learning curves characterizing procedure converge faster number training instances standard procedure ignores local structure CPDs Our results also show networks learned local structures tend complex terms arcs yet require fewer parameters
This paper contains method bound test errors voting committees members chosen pool trained classifiers There many prospective committees validating directly achieve useful error bounds Because fewer classifiers prospective committees better validate classifiers individually use linear programming infer committee error bounds We test method using credit card data Also extend method infer bounds classifiers general
An accurate simulation heating coil used compare performance PI controller neural network trained predict steadystate output PI controller neural network trained minimize nstep ahead error coil output set point reinforcement learning agent trained minimize sum squared error time Although PI controller works well task neural networks result improved performance
The CN algorithm induces ordered list classification rules examples using entropy search heuristic In short paper describe two improvements algorithm Firstly present use Laplacian error estimate alternative evaluation function secondly show unordered well ordered rules generated We experimentally demonstrate significantly improved performances resulting changes thus enhancing usefulness CN inductive tool Comparisons Quinlans C also made
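The Laplacian error estimate is usually stated as an accuracy estimate, (n_c + 1)/(n_tot + k). A rule covers n_tot examples, n_c of which belong to its predicted class, and k is the number of classes. The minimal sketch below assumes that formulation and shows why the estimate penalizes rules that cover very few examples.

```python
def laplace_accuracy(n_class, n_covered, n_classes):
    """Laplace accuracy estimate of a rule: (n_c + 1) / (n_tot + k).

    n_class:   covered examples belonging to the rule's predicted class (n_c)
    n_covered: all examples the rule covers (n_tot)
    n_classes: number of classes in the domain (k)
    """
    return (n_class + 1) / (n_covered + n_classes)

# Two-class domain: rule A is perfect on 2 examples, rule B is 90% correct on 20.
rule_a = laplace_accuracy(2, 2, 2)     # raw accuracy 1.0
rule_b = laplace_accuracy(18, 20, 2)   # raw accuracy 0.9
print(rule_a, round(rule_b, 3))
```

Raw accuracy would prefer rule A. The Laplace estimate prefers rule B, which is supported by more examples.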
Neural computation also called connectionism parallel distributed processing neural network modeling brainstyle computation grown rapidly last decade Despite explosion ultimately impressive applications dire need concise introduction theoretical perspective analyzing strengths weaknesses connectionist approaches establishing links disciplines statistics control theory The Introduction Theory Neural Computation Hertz Krogh Palmer subsequently referred HKP written perspective physics home discipline authors The book fulfills mission introduction neural network novices provided background calculus linear algebra statistics It covers number models often viewed disjoint Critical analyses fruitful comparisons models
Controlflow misprediction penalties major impediment high performance wideissue superscalar processors In paper present Selective Eager Execution SEE execution model overcome misspeculation penalties executing paths diffident branches We present microarchitecture PolyPath processor extension aggressive superscalar outoforder architecture The PolyPath architecture uses novel instruction tagging register renaming mechanism execute instructions multiple paths simultaneously processor pipeline retaining maximum resource availability singlepath code sequences Results executiondriven pipelinelevel simulations show SEE improve performance much go benchmark average SPECint compared normal superscalar outoforder speculative execution monopath processor Moreover architectural model elegant practical implement using small amount additional state control logic
This paper describes competitive tree learning algorithm derived first principles The algorithm approximates Bayesian decision theoretic solution learning task Comparative experiments algorithm several mature AI statistical families tree learning algorithms currently use show derived Bayesian algorithm consistently good better although sometimes computational cost Using strategy design algorithms many supervised model learning tasks given probabilistic representation kind knowledge learned As illustration second learning algorithm derived learning Bayesian networks data Implications incremental learning use multiple models also discussed
We address problem finding subset features allows supervised induction algorithm induce small highaccuracy concepts We examine notions relevance irrelevance show definitions used machine learning literature adequately partition features useful categories relevance We present definitions irrelevance two degrees relevance These definitions improve understanding behavior previous subset selection algorithms help define subset features sought The features selected depend features target concept also induction algorithm We describe method feature subset selection using crossvalidation applicable induction algorithm discuss experiments conducted ID C artificial real datasets
We describe machine learning method predicting value realvalued function given values multiple input variables The method induces solutions samples form ordered disjunctive normal form DNF decision rules A central objective method representation induction compact easily interpretable solutions This rulebased decision model extended search efficiently similar cases prior approximating function values Experimental results realworld data demonstrate new techniques competitive existing machine learning statistical methods sometimes yield superior regression performance
This work presents hybrid branch predictor scheme uses limited form dual path execution along dynamic branch prediction improve execution times The ability execute paths conditional branch enables branch penalty minimized however relying exclusively dual path execution infeasible due instruction fetch rates far exceed capability pipeline retire single branch others must processed By using confidence information available dynamic branch prediction state tables limited form dual path execution becomes feasible This reduces burden branch predictor allowing predictions low confidence avoided In study present new approach gather branch prediction confidence little overhead use confidence mechanism determine whether dual path execution branch prediction used Comparing hybrid predictor model dynamic branch predictor shows dramatic decrease misprediction rate translates reduction runtime These results imply dual path execution often thought excessively resource consuming method may worthy approach restricted appropriate predicting set
This paper presents Threaded MultiPath Execution TME exploits existing hardware Simultaneous Multithreading SMT processor speculatively execute multiple paths execution When fewer threads SMT processor hardware contexts threaded multipath execution uses spare contexts fetch execute code along less likely path hardtopredict branches This paper describes hardware mechanisms needed enable SMT processor efficiently spawn speculative threads threaded multipath execution The Mapping Synchronization Bus described enables spawning multiple paths Policies examined deciding branches fork managing competition primary alternate path threads critical resources Our results show TME increases single program performance SMT eight thread contexts average depending misprediction penalty programs high misprediction rate
In paper review research machine learning relation computational models human learning We focus initially concept induction examining five main approaches problem consider complex issue learning sequential behaviors After compare rhetoric sometimes appears machine learning psychological literature growing evidence different theoretical paradigms typically produce similar results In response suggest concrete computational models currently dominate field may less useful simulations operate abstract level We illustrate point abstract simulation explains challenging phenomenon area category learning conclude general observations abstract models
The function unknown biological sequence often accurately inferred identifying sequences homologous original sequence Given query set known homologs exist least three general classes techniques finding additional homologs pairwise sequence comparisons motif analysis hidden Markov modeling Pairwise sequence comparisons typically employed single query sequence known Hidden Markov models HMMs hand usually trained sets sequences Motif based methods fall two extremes
Future research directions Knowledge Discovery Databases KDD include ability extract overlying concept relating useful data Current limitations involve search complexity find concept means useful The Pattern Theory research crosses natural way aforementioned domain The goal paper threefold First present new approach problem learning Discovery robust pattern finding Second explore current limitations Pattern Theoretic approach applied general KDD problem Third exhibit performance experimental results binary functions compare results C This new approach learning demonstrates powerful method finding patterns robust manner
Prerequisites An understanding dynamic programming edit distance approach pairwise sequence alignment useful parts Also familiarity use Internet resources would helpful part For former see Chapters latter see Chapter Hypertext Book GNAVSNS Biocomputing Course httpwwwtechfakunibielefelddebcdCurricwelcomehtml General Rationale You understand Multiple Alignment considered challenging problem study approaches try reduce number steps needed calculate optimal solution study fast heuristics In case study involving immunoglobulin sequences study multiple alignments obtained WWW servers recapitulating results original paper Revision History Version Sep Expanded Ex Updated Ex Revised Solution Sheet Ex Marked Exercises A submitted Instructor Various minor clarifications content
This article describes new system induction oblique decision trees This system OC combines deterministic hillclimbing two forms randomization find good oblique split form hyperplane node decision tree Oblique decision tree methods tuned especially domains attributes numeric although adapted symbolic mixed symbolicnumeric attributes We present extensive empirical studies using real artificial data analyze OCs ability construct oblique trees smaller accurate axisparallel counterparts We also examine benefits randomization construction oblique decision trees
ExplanationBased Reinforcement Learning EBRL introduced Dietterich Flann way combining ability Reinforcement Learning RL learn optimal plans generalization ability ExplanationBased Learning EBL Dietterich Flann We extend work domains agent must order achieve sequence subgoals optimal fashion Hierarchical EBRL effectively learn optimal policies sequential task domains even subgoals weakly interact We also show planner achieve individual subgoals available method converges even faster
Discretization continuously valued data useful necessary tool many learning paradigms assume nominal data A list objectives efficient effective discretization presented A paradigm called BRACE Boundary Ranking And Classification Evaluation attempts meet objectives presented along algorithm follows paradigm The paradigm meets many objectives potential extension meet remainder Empirical results promising For reasons BRACE potential effective efficient method discretization continuously valued data A advantage BRACE general enough extended types clusteringunsupervised learning
Naive Bayesian classifiers make independence assumptions perform remarkably well data sets poorly others We explore ways improve Bayesian classifier searching dependencies among attributes We propose evaluate two algorithms detecting dependencies among attributes show backward sequential elimination joining algorithm provides improvement naive Bayesian classifier The domains improvement occurs domains naive Bayesian classifier significantly less accurate decision tree learner This suggests attributes used common databases independent conditioned class violations independence assumption affect accuracy classifier The Bayesian classifier Duda Hart probabilistic method classification It used determine probability example j belongs class $C_i$ given values attributes example represented set n nominallyvalued attributevalue pairs form $A_1 = V_{1j}$ $P(A_k = V_{kj} \mid C_i)$ may estimated training data To determine likely class test example probability class computed Equation A classifier created manner sometimes called simple Langley naive Kononenko Bayesian classifier One important evaluation metric machine learning methods predictive accuracy unseen examples This measured randomly selecting subset examples database use training examples reserving remainder used test examples In case simple Bayesian classifier training examples used estimate probabilities Equation used detected training data
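The simple (naive) Bayesian classifier described here picks the class that maximizes P(C_i) ∏_k P(A_k = V_kj | C_i), with the probabilities estimated from training counts. The sketch below does this in log space for nominal attributes. The add-one (Laplace) smoothing and the toy weather data are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count class frequencies and per-class attribute-value frequencies.

    examples: list of (attribute_tuple, class_label) pairs with nominal values.
    """
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    values_seen = defaultdict(set)        # attribute index -> distinct values observed
    for attrs, label in examples:
        for k, v in enumerate(attrs):
            value_counts[(k, label)][v] += 1
            values_seen[k].add(v)
    return class_counts, value_counts, values_seen

def predict(model, attrs):
    """Return the class maximizing log P(C) + sum_k log P(A_k = v_k | C)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for k, v in enumerate(attrs):
            # add-one smoothing so an unseen value does not zero out a class
            count = value_counts[(k, label)][v]
            score += math.log((count + 1) / (n_c + len(values_seen[k])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

examples = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
]
model = train(examples)
print(predict(model, ("rainy", "hot")), predict(model, ("sunny", "mild")))
```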
Multiple sequence alignment distantly related viral proteins remains challenge currently available alignment methods The hidden Markov model approach offers new flexible method generation multiple sequence alignments The results studies attempting infer appropriate parameter constraints generation de novo HMMs globin kinase aspartic acid protease ribonuclease H sequences SAM HMMER methods described
Several recurrent networks proposed representations task formal language learning After training recurrent network next step understand information processing carried network Some researchers Giles et al Watrous Kuhn Cleeremans et al resorted extracting finite state machines internal state trajectories recurrent networks This paper describes two conditions sensitivity initial conditions frivolous computational explanations due discrete measurements Kolen Pollack allow extraction methods return illusionary finite state descriptions
In order useful learning algorithm must able generalize well faced inputs previously presented system A bias necessary generalization shown several researchers recent years bias lead strictly better generalization summed possible functions applications This paper provides examples illustrate fact also explains bias learning algorithm better another practice probability occurrence functions taken account It shows domain knowledge understanding conditions learning algorithm performs well used increase probability accurate generalization identifies several conditions considered attempting select appropriate bias particular problem
In paper introduce modelbased reinforcement learning method called Hlearning optimizes undiscounted average reward We compare three reinforcement learning methods domain scheduling Automatic Guided Vehicles transportation robots used modern manufacturing plants facilities The four methods differ along two dimensions They either modelbased modelfree optimize discounted total reward undiscounted average reward Our experimental results indicate Hlearning robust respect changes domain parameters many cases converges fewer steps better average reward per time step methods An added advantage unlike methods parameters tune
This paper presents Converse Lyapunov Function Theorem motivated robust control analysis design Our result based upon generalizes various aspects wellknown classical theorems In unified natural manner allows arbitrary bounded timevarying parameters system description deals global asymptotic stability results smooth infinitely differentiable Lyapunov functions applies stability respect necessarily compact invariant sets Introduction This work motivated problems robust nonlinear stabilization One main
Technical Report AI May Abstract An important often neglected problem field Artificial Intelligence grounding systems environment representations manipulate inherent meaning system Since humans rely heavily semantics seems likely grounding crucial development truly intelligent behavior This study investigates use simulated robotic agents neural network processors part method ensure grounding Both topology weights neural networks optimized genetic algorithms Although comprehensive optimization difficult empirical evidence gathered shows method tractable quite fruitful In experiments agents evolved wallfollowing control strategy able transfer novel environments Their behavior suggests also learning build cognitive maps
ExplanationBased Learning Mitchell et al DeJong Mooney shown promise powerful analytical learning technique However EBL severely hampered requirement complete correct domain theory successful learning occur Clearly nontrivial domains developing domain theory nearly impossible task Therefore much research devoted understanding imperfect domain theory corrected extended system performance In paper present characterization problem use analyze past research area Past characterizations problem eg Mitchell et al Rajamoney DeJong viewed types performance errors caused faulty domain theory primary In contrast focus primarily types knowledge deficiencies present theory derive types performance errors result Correcting theory viewed search space possible domain theories variety knowledge sources used guide search We examine types knowledge used variety past systems purpose The hope analysis indicate need universal weak method domain theory correction different sources knowledge theory correction freely flexibly combined
We study task finding maximal posteriori MAP instantiation Bayesian network variables given partial value assignment initial constraint This problem known NPhard concentrate stochastic approximation algorithm simulated annealing This stochastic algorithm realized sequential process set Bayesian network variables one variable allowed change time Consequently method become impractically slow number variables increases We present method mapping given Bayesian network massively parallel Boltzmann machine neural network architecture sense instead using normal sequential simulated annealing algorithm use massively parallel stochastic process Boltzmann machine architecture The neural network updating process provably converges state solves given MAP task
Parameterized heuristics offers elegant powerful theoretical framework design analysis autonomous adaptive communication networks Routing messages networks presents realtime instance multicriterion optimization problem dynamic uncertain environment This paper describes framework heuristic routing large networks The effectiveness heuristic routing mechanism upon Quo Vadis based described part simulation study within network grid topology A formal analysis underlying principles presented incremental design set heuristic decision functions used guide messages along nearoptimal eg minimum delay path large network This paper carefully derives properties heuristics set simplifying assumptions network topology load dynamics identify conditions guaranteed route messages along optimal path The paper concludes discussion relevance theoretical results presented paper design intelligent autonomous adaptive communication networks outline directions future research
Technical Report Department Statistics University Washington Derek Stanford Graduate Research Assistant Adrian E Raftery Professor Statistics Sociology Department Statistics University Washington Box Seattle WA USA Email stanfordstatwashingtonedu rafterystatwashingtonedu Web httpwwwstatwashingtoneduraftery This research supported ONR grants N N The authors grateful Simon Byers Gilles Celeux Christian Posse helpful discussions
We analyze algorithms predict binary value combining predictions several prediction strategies called experts Our analysis worstcase situations ie make assumptions way sequence bits predicted generated We measure performance algorithm difference expected number mistakes makes bit sequence expected number mistakes made best expert sequence expectation taken respect randomization predictions We show minimum achievable difference order square root number mistakes best expert give efficient algorithms achieve Our upper lower bounds matching leading constants cases We show leads certain kinds pattern recognitionlearning algorithms performance bounds improve best results currently known context We also compare analysis case log loss used instead expected number mistakes
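This example makes the expert-combination setting concrete, but it swaps in a standard randomized weighted-majority (Hedge-style) forecaster. It is not the algorithm analyzed in the paper. The forecaster predicts 1 with probability equal to the weight fraction of experts that say 1, and multiplies the weight of each wrong expert by a fixed β. Its expected mistakes are bounded by (m* ln(1/β) + ln N)/(1 − β), where m* is the best expert's mistake count and N is the number of experts. The paper's sharper bounds, of order the square root of m*, come from a more careful analysis and tuning that this sketch does not attempt. The three experts and the bit sequence are hypothetical.

```python
import math

def randomized_weighted_majority(expert_preds, outcomes, beta=0.5):
    """Expected number of mistakes of a randomized weighted-majority forecaster.

    expert_preds[t][i] is expert i's bit prediction at round t; outcomes[t] is the true bit.
    """
    weights = [1.0] * len(expert_preds[0])
    expected_mistakes = 0.0
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        # predict 1 with probability equal to the weight fraction of experts saying 1
        p_one = sum(w for w, p in zip(weights, preds) if p == 1) / total
        expected_mistakes += (1.0 - p_one) if y == 1 else p_one
        # shrink the weight of every expert that was wrong this round
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return expected_mistakes

outcomes = [1 if t % 3 == 0 else 0 for t in range(300)]
# hypothetical experts: always right, always predicts 0, always predicts 1
expert_preds = [[y, 0, 1] for y in outcomes]
best = min(sum(p[i] != y for p, y in zip(expert_preds, outcomes)) for i in range(3))
beta = 0.5
loss = randomized_weighted_majority(expert_preds, outcomes, beta)
bound = (best * math.log(1 / beta) + math.log(3)) / (1 - beta)
print(round(loss, 3), round(bound, 3))
```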
This paper presents formalization novel approach structural similarity assessment adaptation casebased reasoning Cbr synthesis The approach informally presented exemplified implemented domain industrial building design Borner By relating approach existing theories provide foundation systematic evaluation appropriate usage Cases primary repository knowledge represented structurally using algebraic approach Similarity relations provide structure preserving case modifications modulo underlying algebra equational theory algebra available This representation modeled universe discourse enables theorybased inference adapted solutions The approach enables us incorporate formally generalization abstraction geometrical transformation combinations Cbr
A learning agent employing reinforcement learning hindered receives critics sparse weakly informative training information We present approach automated training agent may also provide occasional instruction learner form actions learner perform The learner access critics feedback trainers instruction In experiments vary level trainers interaction learner allowing trainer instruct learner almost every time step allowing trainer respond We also vary parameter controls learner incorporates trainers actions The results show significant reductions average number training trials necessary learn perform task
We present algorithm improving accuracy algorithms learning binary concepts The improvement achieved combining large number hypotheses generated training given learning algorithm different set examples Our algorithm based ideas presented Schapire paper The strength weak learnability represents improvement results The analysis algorithm provides general upper bounds resources required learning Valiants polynomial PAC learning framework best general upper bounds known today We show number hypotheses combined algorithm smallest number possible Other outcomes analysis results regarding representational power threshold circuits relation learnability compression method parallelizing PAC learning algorithms We provide extensions algorithms cases concepts binary case accuracy learning algorithm depends distribution instances
This paper proposes model ratio decidendi justification structure consisting series reasoning steps relate abstract predicates abstract predicates relate abstract predicates specific facts This model satisfies important set characteristics ratio decidendi identified jurisprudential literature In particular model shows theory case decided controls precedential effect By contrast purely exemplarbased model ratio decidendi fails account dependency precedential effect theory decision
In paper abstract computational principles underlying topographic maps discussed We give definition perfectly neighbourhood preserving map call topographic homeomorphism prove certain desirable properties It argued topographic homeomorphism exist usual case many equally valid choices available quantifying quality map We introduce particular measure encompasses several previous proposals discuss relation work This formulation problem sets within wellknown class quadratic assignment problems
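The quadratic assignment formulation can be made concrete with a small sketch, assuming a cost of the form C = Σ_{i≠j} F(i, j) G(M(i), M(j)) that pairs an input-space similarity F with an output-space similarity G under a map M. The F and G used in the example (negative squared distance) are illustrative choices, not necessarily the measure the paper introduces.

```python
def topographic_cost(mapping, F, G):
    """Quadratic-assignment style quality score of a map.

    mapping[i] is the output position assigned to input point i;
    F(i, j) is similarity in the input space and G(a, b) is similarity
    in the output space. With both as similarity measures, larger values
    mean neighbourhood relations are better preserved.
    """
    n = len(mapping)
    return sum(F(i, j) * G(mapping[i], mapping[j])
               for i in range(n) for j in range(n) if i != j)

def neg_sq(a, b):
    """Illustrative similarity: negative squared distance on a line."""
    return -(a - b) ** 2
```

For four points on a line, the identity map and its reversal score the same and both beat a map that swaps the two middle points, which matches the intuition that several maps can be equally good.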
This paper studies robustness pac learning algorithms instance space {0,1}^n examples corrupted purely random noise affecting instances labels In past conflicting results subject obtained best agreement rule tolerate small amounts noise yet cases large amounts noise tolerated We show truth lies somewhere two alternatives For uniform attribute noise attribute flipped independently random probability present algorithm pac learns monomials unknown noise rate less Contrasting positive result show product random attribute noise attribute flipped randomly independently probability p nearly harmful malicious noise no algorithm tolerate small amount noise Supported part GE Foundation Junior Faculty Grant NSF grant CCR Part research conducted author MIT Laboratory Computer Science supported NSF grant DCR grant Siemens Corporation Net address sgcswustledu
This paper describes research investigating behavioral specialization learning robot teams Each agent provided common set skills motor schemabased behavioral assemblages builds taskachieving strategy using reinforcement learning The agents learn individually activate particular behavioral assemblages given current situation reward signal The experiments conducted robot soccer simulations evaluate agents terms performance policy convergence behavioral diversity The results show many cases robots automatically diversify choosing heterogeneous behaviors The degree diversification performance team depend reward structure When entire team jointly rewarded penalized global reinforcement teams tend towards heterogeneous behavior When agents provided feedback individually local reinforcement converge identical policies
Product units provide method automatically learning higherorder input combinations required efficient synthesis Boolean logic functions neural networks Product units also higher information capacity sigmoidal networks However activation function received much attention literature A possible reason one encounters problems using standard backpropagation train networks containing units This report examines problems evaluates performance three training algorithms networks type Empirical results indicate error surface networks containing product units local minima corresponding networks summation units For reason combination local global training algorithms found provide reliable convergence We investigate hints added training algorithm By extracting common frequency input weights training frequency separately show convergence accelerated In order compare performance transfer functions product units implemented candidate units Cascade Correlation CC system Using candidate units resulted smaller networks trained faster standard three sigmoidal types one Gaussian transfer functions used This superiority confirmed pool candidate units four different nonlinear activation functions used compete addition network Extensive simulations showed problem implementing random Boolean logic functions product units always chosen transfer functions
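A product unit computes a weighted product of its inputs rather than a weighted sum. The sketch below assumes the commonly used formulation that handles negative inputs through the real part of the complex logarithm, and assumes all inputs are nonzero. With unit weights and ±1 inputs it returns x1·x2, whose sign encodes the parity (XOR) of two bipolar inputs, a higher-order combination that a single summation unit cannot represent.

```python
import math

def product_unit(x, w):
    """Product-unit activation prod_i x_i ** w_i, extended to negative inputs.

    Real part of the complex-valued form:
    exp(sum_i w_i * ln|x_i|) * cos(pi * sum_{i: x_i < 0} w_i).
    Assumes every input x_i is nonzero.
    """
    log_mag = sum(wi * math.log(abs(xi)) for xi, wi in zip(x, w))
    phase = math.pi * sum(wi for xi, wi in zip(x, w) if xi < 0)
    return math.exp(log_mag) * math.cos(phase)
```

Because the weights act as exponents, learning them amounts to learning which higher-order input combination to form; non-integer weights give fractional powers, e.g. `product_unit([4.0], [0.5])` is 2.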
Our goal develop cognitive model humans acquire skills complex cognitive tasks We pursuing goal designing computational architectures NRL Navigation task requires competent sensorimotor coordination In paper analyze NRL Navigation task depth We use data experiments human subjects learning task guide us constructing cognitive model skill acquisition task Verbal protocol data augments black box view provided execution traces inputs outputs Computational experiments allow us explore space alternative architectures task guided quality fit human performance data
We show paper AGM postulates weak ensure rational preservation conditional beliefs belief revision thus permitting improper responses sequences observations We remedy weakness proposing four additional postulates sound relative qualitative version probabilistic conditioning Contrary AGM framework proposed postulates characterize belief revision process may depend elements epistemic state necessarily captured belief set We also show simple modification AGM framework allow belief revision function epistemic states We establish modelbased representation theorem characterizes proposed postulates constrains turn way entrenchment orderings may transformed iterated belief revision
Results presented demonstrate learning finetuning search strategies using connectionist mechanisms Previous studies strategy learning within symbolic productionrule formalism addressed finetuning behavior Here twolayer connectionist system presented develops search weak taskspecific strategy finetunes performance The system applied simulated realtime balancecontrol task We compare performance onelayer twolayer networks showing ability twolayer network discover new features thus enhance original representation critical solving balancing task
Following terminology used adaptive control distinguish indirect learning methods learn explicit models dynamic structure system controlled direct learning methods We compare existing indirect method uses conventional dynamic programming algorithm closely related direct reinforcement learning method applying methods infinite horizon Markov decision problem unknown statetransition probabilities The simulations show although direct method requires much less space dramatically less computation per control action learning ability task superior compares favorably complex indirect method Although results address methods performances compare problems become difficult suggest given fixed amount computational power available per control action may better use direct reinforcement learning method augmented indirect techniques devote available resources computationally costly indirect method Comprehensive answers questions raised study depend many factors making economic context computation
We propose general framework study belief change We begin defining belief terms knowledge plausibility agent believes knows true worlds considers plausible We consider properties defining interaction knowledge plausibility show properties affect properties belief In particular show assuming two natural properties belief becomes KD operator Finally add time picture This gives us framework talk knowledge plausibility hence belief time extends framework Halpern Fagin HF modeling knowledge multiagent systems We show framework quite expressive lets us model natural way number different scenarios belief change For example show capture analogue prior probabilities updated conditioning In related paper show two best studied scenarios belief revision belief update fit framework
Markov chain Monte Carlo MCMC used evaluating expectations functions interest target distribution This done calculating averages sample path Markov chain stationary distribution For computational efficiency Markov chain rapidly mixing This sometimes achieved careful design transition kernel chain basis detailed preliminary exploratory analysis An alternative approach might allow transition kernel adapt whenever new features encountered MCMC run However adaptation occurs infinitely often stationary distribution chain may disturbed We describe framework based concept Markov chain regeneration allows adaptation occur infinitely often disturb stationary distribution chain consistency samplepath averages Key Words Adaptive method Bayesian inference Gibbs sampling Markov chain Monte Carlo
A traditional interpolation model characterized choice regularizer applied interpolant choice noise model Typically regularizer single regularization constant α noise model single parameter β The ratio α/β alone responsible determining globally attributes interpolant complexity flexibility smoothness characteristic scale length characteristic amplitude We suggest interpolation models able capture one flavour simplicity complexity We describe Bayesian models interpolant smoothness varies spatially We emphasize importance practical implementation concept conditional convexity designing models many hyperparameters We apply new models interpolation neuronal spike data demonstrate substantial improvement generalization error
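To see why only the ratio matters in the traditional model, consider a minimal sketch assuming a quadratic noise model with parameter β and a second-difference (curvature) regularizer with constant α; both choices are illustrative assumptions. Minimizing β/2 Σ (y_i − f_i)² + α/2 Σ (f_{i−1} − 2f_i + f_{i+1})² gives (I + (α/β) DᵀD) f = y, so the fitted interpolant depends on α and β only through α/β. The spatially varying models the paper proposes go beyond this single-ratio case and are not implemented here.

```python
def second_difference_matrix(n):
    """Rows compute f[i-1] - 2 f[i] + f[i+1] for each interior point."""
    D = []
    for i in range(1, n - 1):
        row = [0.0] * n
        row[i - 1], row[i], row[i + 1] = 1.0, -2.0, 1.0
        D.append(row)
    return D

def solve(A, b):
    """Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def interpolate(y, alpha, beta):
    """Minimize beta/2 * sum (y - f)^2 + alpha/2 * sum (D f)^2.

    The zero-gradient condition is (I + (alpha/beta) D^T D) f = y,
    so only the ratio alpha/beta affects the result.
    """
    n = len(y)
    D = second_difference_matrix(n)
    ratio = alpha / beta
    A = [[(1.0 if i == j else 0.0)
          + ratio * sum(D[k][i] * D[k][j] for k in range(len(D)))
          for j in range(n)] for i in range(n)]
    return solve(A, list(y))
```

Doubling both α and β leaves the interpolant unchanged, and straight-line data pass through untouched because their second differences are zero; a small ratio returns nearly the raw data.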
Author paper coordinator Machine Learning project StatLog This project supported financially European Community The main aim StatLog evaluate different learning algorithms using real industrial commercial applications As industrial partner contributor DaimlerBenz introduced different applications StatLog among fault diagnosis letter digit recognition creditscoring prediction number registered trucks We learned lot lessons project affected application oriented research field Machine Learning ML DaimlerBenz We distinguished especially research necessary prepare MLalgorithms handle real industrial commercial applications In paper briefly describe DaimlerBenz applications StatLog discuss shortcomings applied MLalgorithms finally outline fields think research necessary
This paper describes application reinforcement learning RL difficult real world problem elevator dispatching The elevator domain poses combination challenges seen RL research date Elevator systems operate continuous state spaces continuous time discrete event dynamic systems Their states fully observable nonstationary due changing passenger arrival rates In addition use team RL agents responsible controlling one elevator car The team receives global reinforcement signal appears noisy agent due effects actions agents random nature arrivals incomplete observation state In spite complications show results simulation surpass best heuristic elevator control algorithms aware These results demonstrate power RL large scale stochastic dynamic optimization problem practical utility
This paper presents Fringe Exploration technique efficient exploration partially observable domains The key idea applicable many exploration techniques keep statistics space possible shortterm memories instead agents current state space Experimental results partially observable maze difficult driving task visual routines show dramatic performance improvements
Performing policy iteration dynamic programming require knowledge relative rather absolute measures utility actions Baird calls advantages actions states Nevertheless existing methods dynamic programming including Bairds compute form absolute utility function For smooth problems advantages satisfy two differential consistency conditions including requirement free curl show enforcing lead appropriate policy improvement solely terms advantages
We introduce parallel approach DTSelect selecting features used inductive learning algorithms predict protein secondary structure DTSelect able rapidly choose small nonredundant feature sets pools containing hundreds thousands potentially useful features It building decision tree using features pool classifies set training examples The features included tree provide compact description training data thus suitable use inputs inductive learning algorithms Empirical experiments protein secondarystructure task sets complex features chosen DTSelect used augment standard artificial neural network representation yield surprisingly little performance gain even though features selected large feature pools We discuss possible reasons result
This paper presents extension package developed author Faculty Sciences Technology New University Lisbon designed experimentation CoarseGrained Distributed Genetic Algorithms DGA The package implemented extension Basic Sugal system developed Andrew Hunter University Sunderland UK primarily intended used research Sequential Serial Genetic Algorithms SGA
We explore representation 3D objects several distinct 2D views stored object We demonstrate ability twolayer network thresholded summation units support representations Using unsupervised Hebbian relaxation network learned recognize ten objects different viewpoints The training process led emergence compact representations specific input views When tested novel views objects network exhibited substantial generalization capability In simulated psychophysical experiments networks behavior qualitatively similar human subjects
Internal models environment important role play adaptive systems general particular importance supervised learning paradigm In paper demonstrate certain classical problems associated notion teacher supervised learning solved judicious use learned internal models components adaptive system In particular show supervised learning algorithms utilized cases unknown dynamical system intervenes actions desired outcomes Our approach applies supervised learning algorithm capable learning multilayer networks This paper revised version MIT Center Cognitive Science Occasional Paper We wish thank Michael Mozer Andrew Barto Robert Jacobs Eric Loeb James McClelland helpful comments manuscript This project supported part BRSG S RR awarded Biomedical Research Support Grant Program Division Research Resources National Institutes Health grant ATR Auditory Visual Perception Research Laboratories grant Siemens Corporation grant Human Frontier Science Program grant NJ awarded Office Naval Research
Technical Report February updated April This paper appear Proceedings Eleventh International Conference Machine Learning Abstract This paper presents algorithm incremental induction decision trees able handle numeric symbolic variables In order handle numeric variables new tree revision operator called slewing introduced Finally nonincremental method given finding decision tree based direct metric candidate tree
This research primarily conducted author University Calif Santa Cruz support ONR grant NK Harvard University supported ONR grant NK DARPA grant AFOSR Current address NEC Research Institute Independence Way Princeton NJ Email address nicklresearchnjneccom Supported ONR grants NK NJ Part research done author sabbatical Aiken Computation Laboratory Harvard partial support ONR grants NK NK Address Department Computer Science University California Santa Cruz Email address manfredcsucscedu
Many recent approaches avoiding utility problem speedup learning rely sophisticated utility measures significant numbers training data accurately estimate utility control knowledge Empirical results presented elsewhere indicate simple selection strategy retaining control rules derived training problem explanation quickly defines efficient set control knowledge training problems This simple selection strategy provides lowcost alternative exampleintensive approaches improving speed problem solver
Partigame new algorithm learning feasible trajectories goal regions high dimensional continuous statespaces In high dimensions essential learning plan uniformly statespace Partigame maintains decisiontree partitioning statespace applies techniques gametheory computational geometry efficiently adaptively concentrate high resolution critical areas The current version algorithm designed find feasible paths trajectories goal regions high dimensional spaces Future versions designed find solution optimizes realvalued criterion Many simulated problems tested ranging twodimensional ninedimensional statespaces including mazes path planning nonlinear dynamics planar snake robots restricted spaces In cases good solution found less ten trials minutes
Predictive inference seen process determining predictive distribution discrete variable given data set training examples values problem domain variables We consider three approaches computing predictive distribution assume joint probability distribution variables belongs set distributions determined set parametric models In simplest case predictive distribution computed using model maximum posteriori MAP posterior probability In evidence approach predictive distribution obtained averaging individual models model family In third case define predictive distribution using Rissanens new definition stochastic complexity Our experiments performed family Naive Bayes models suggest using data available stochastic complexity approach produces accurate predictions logscore sense However amount available training data decreased evidence approach clearly outperforms two approaches The MAP predictive distribution clearly inferior logscore sense two sophisticated approaches score MAP approach may still cases produce best results
Given problem casebased reasoning CBR system search case memory use stored cases find solution possibly modifying retrieved cases adapt required input specifications In paper introduce neural network architecture efficient casebased reasoning We show rigorous Bayesian probability propagation algorithm implemented feedforward neural network adapted CBR In approach efficient indexing problem CBR naturally implemented parallel architecture heuristic matching replaced probability metric This allows CBR perform theoretically sound Bayesian reasoning We also show probability propagation actually offers solution adaptation problem natural way
We present new generalpurpose algorithm learning classes valued functions generalization prediction model prove general upper bound expected absolute error algorithm terms scalesensitive generalization Vapnik dimension proposed Alon BenDavid CesaBianchi Haussler We give lower bounds implying upper bounds improved constant factor general We apply result together techniques due Haussler Benedek Itai obtain new upper bounds packing numbers terms scalesensitive notion dimension Using different technique obtain new bounds packing numbers terms Kearns Schapires fatshattering function We show apply packing bounds obtain improved general bounds sample complexity agnostic learning For gt establish weaker sufficient stronger necessary conditions class valued functions agnostically learnable within uniform GlivenkoCantelli class
It widely considered ultimate connectionist objective incorporate neural networks intelligent systems These systems intended possess varied repertoire functions enabling adaptable interaction nonstatic environment The first step direction develop various neural network algorithms models second step combine networks modular structure might incorporated workable system In paper consider one aspect second point namely processing reliability hiding wetware details Presented architecture type neural expert module named Authority An Authority consists number Minos modules Each Minos modules Authority processing capabilities varies respect particular specialization aspects problem domain The Authority employs collection Minoses like panel experts The expert highest confidence believed answer confidence quotient transmitted levels system hierarchy
Partially observable Markov decision processes pomdps model decision problems agent tries maximize reward face limited andor noisy sensor feedback While study pomdps motivated need address realistic problems existing techniques finding optimal behavior appear scale well unable find satisfactory policies problems dozen states After brief review pomdps paper discusses several simple solution methods shows capable finding nearoptimal policies selection extremely small pomdps taken learning literature In contrast show none able solve slightly larger noisier problem based robot navigation We find combination two novel approaches performs well problems suggest methods scaling even larger complicated domains
We propose new method construction Markov chains given stationary distribution This method based construction auxiliary chain stationary distribution picking elements auxiliary chain suitable number times The proposed method many advantages rivals It easy implement provides simple analysis faster efficient currently available techniques also adapted course simulation We make theoretical numerical comparisons characteristics proposed algorithm MCMC techniques
The problem making optimal decisions uncertain conditions central Artificial Intelligence If state world known times world modeled Markov Decision Process MDP MDPs studied extensively many methods known determining optimal courses action policies The realistic case state information partially observable Partially Observable Markov Decision Processes POMDPs received much less attention The best exact algorithms problems inefficient space time We introduce Smooth Partially Observable Value Approximation SPOVA new approximation method quickly yield good approximations improve time This method combined reinforcement learning methods combination effective test cases
The Katsuno Mendelzon KM theory belief update proposed reasonable model revising beliefs changing world However semantics update relies information readily available We describe alternative semantical view update observations incorporated belief set explaining observation terms set plausible events might caused observation b predicting consequences explanations We also allow possibility conditional explanations We show picture naturally induces update operator conforming KM postulates certain assumptions However argue assumptions always reasonable restrict ability integrate update forms revision reasoning action Some parts report appeared preliminary form An EventBased Abductive Model Update Proc Tenth Canadian Conf AI Banff Alta
This paper specifies main features Brainlike Neuronal Connectionist models argues need usefulness appropriate successively larger brainlike structures examines parallelhierarchical Recognition Cone models perception perspective examples structures The anatomy physiology behavior development visual system briefly summarized motivate architecture brainstructured networks perceptual recognition Results presented simulations carefully predesigned Recognition Cone structures perceive objects eg houses digitized photographs A framework perceptual learning introduced including mechanisms generationdiscovery feedbackguided growth new links nodes subject brainlike constraints eg local receptive fields global convergencedivergence The information processing transforms discovered generation finetuned feedbackguided reweighting links Some preliminary results presented brainstructured networks learn recognize simple objects eg letters alphabet cups apples bananas feedbackguided generation reweighting These show large improvements networks either lack brainlike structure orand learn reweighting links alone
The ability restructure decision tree efficiently enables variety approaches decision tree induction would otherwise prohibitively expensive Two approaches described one incremental tree induction ITI nonincremental tree induction using measure tree quality instead test quality DMTI These approaches several variants offer new computational classifier characteristics lend particular applications
We consider logistic regression model Gaussian prior distribution parameters We show accurate variational techniques used obtain closed form posterior distribution parameters given data thereby yielding posterior predictive model The results readily extended binary belief networks For belief networks also derive closed form posteriors presence missing values Finally show dual regression problem gives latent variable density model variational formulation leads exactly solvable EM updates
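Closed-form results of this kind rest on a lower bound on the logistic likelihood that is quadratic in its argument: since the argument is linear in the parameters, the bound times a Gaussian prior stays Gaussian. A minimal sketch of the widely used bound of this type (due to Jaakkola and Jordan), log σ(x) ≥ log σ(ξ) + (x − ξ)/2 − λ(ξ)(x² − ξ²) with λ(ξ) = tanh(ξ/2)/(4ξ). It only evaluates and checks the bound; it does not perform the posterior or variational-parameter updates.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def jj_lambda(xi):
    """lambda(xi) = tanh(xi / 2) / (4 xi), with limit 1/8 as xi -> 0."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)

def jj_lower_bound(x, xi):
    """Quadratic-in-x lower bound on log sigmoid(x), tight at x = +/- xi."""
    return log_sigmoid(xi) + (x - xi) / 2.0 - jj_lambda(xi) * (x * x - xi * xi)
```

The variational parameter ξ controls where the bound touches the true log-sigmoid; optimizing it per data point is what tightens the approximate posterior.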
Mean field methods provide computationally efficient approximations posterior probability distributions graphical models Simple mean field methods make completely factorized approximation posterior unlikely accurate posterior multimodal Indeed posterior multimodal one modes captured To improve mean field approximation cases employ mixture models posterior approximations mixture component factorized distribution We describe efficient methods optimizing parameters models
The success evolutionary methods standard control learning tasks created need new benchmarks The classic pole balancing problem longer difficult enough serve viable yardstick measuring learning efficiency systems In paper present difficult version classic problem cart pole move plane We demonstrate neuroevolution system Enforced SubPopulations ESP solve difficult problem without velocity information
This paper introduces explores representational biases efficient learning spatial temporal spatiotemporal patterns connectionist networks CN massively parallel networks simple computing elements It examines learning mechanisms constructively build network structures encode information environmental stimuli successively higher resolutions needed tasks eg perceptual recognition network perform Some simple examples presented illustrate basic structures processes used networks ensure parsimony learned representations guiding system focus efforts minimal adequate resolution Several extensions basic algorithm efficient learning using multiresolution representations spatial temporal spatiotemporal patterns discussed
Q(λ)learning uses TD(λ)methods accelerate Qlearning The update complexity previous online Q(λ) implementations based lookuptables bounded size stateaction space Our faster algorithms update complexity bounded number actions The method based observation Qvalue updates may postponed needed
Massively parallel networks relatively simple computing elements offer attractive versatile framework exploring variety learning structures processes intelligent systems This paper briefly summarizes popular learning structures processes used networks It outlines range potentially powerful alternatives patterndirected inductive learning systems It motivates develops class new learning algorithms massively parallel networks simple computing elements We call class learning processes generative offer set mechanisms constructive adaptive determination network architecture number processing elements connectivity among function experience Generative learning algorithms attempt overcome limitations approaches learning networks rely modification weights links within otherwise fixed network topology eg rather slow learning need apriori choice network architecture Several alternative designs well range control structures processes used regulate form content internal representations learned networks examined Empirical results study generative learning algorithms briefly summarized several extensions refinements algorithms directions future research outlined
The use artificial neural networks domain autonomous vehicle navigation produced promising results ALVINN Pomerleau shown neural system drive vehicle reliably safely many different types roads ranging paved paths interstate highways Even impressive results several areas within neural paradigm autonomous road following still need addressed These include transparent navigation roads different type simultaneous use different sensors generalization road types neural system never seen The system presented addresses issue modular neural architecture uses pretrained ALVINN networks connectionist superstructure robustly drive many different types roads
We introduce constructive incremental learning system regression problems models data means locally linear experts In contrast approaches experts trained independently compete data learning Only prediction query required experts cooperate blending individual predictions Each expert trained minimizing penalized local cross validation error using second order methods In way expert able find local distance metric adjusting size shape receptive field predictions valid also detect relevant input features adjusting bias importance individual input dimensions We derive asymptotic results method In variety simulations properties algorithm demonstrated respect interference learning speed prediction accuracy feature detection task oriented incremental learning
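The cooperative prediction step can be sketched as a normalized weighted average of the experts' local linear predictions. This assumes Gaussian receptive fields with a diagonal distance metric (the paper's metric need not be diagonal) and hypothetical `(center, metric_diag, bias, slope)` tuples for each expert; how each expert is trained by penalized local cross validation is not shown.

```python
import math

def blend_predictions(x, experts):
    """Weighted average of local linear predictions at query point x.

    Each expert is a hypothetical tuple (center, metric_diag, bias, slope):
    receptive-field weight exp(-0.5 * sum_d m_d * (x_d - c_d)^2) and
    local linear model bias + slope . (x - center).
    """
    num = den = 0.0
    for center, metric, bias, slope in experts:
        diff = [xi - ci for xi, ci in zip(x, center)]
        w = math.exp(-0.5 * sum(m * d * d for m, d in zip(metric, diff)))
        y_local = bias + sum(s * d for s, d in zip(slope, diff))
        num += w * y_local
        den += w
    return num / den
```

Experts only interact at query time: two experts that each fit y = 2x locally agree everywhere, while with narrow receptive fields a query near one center is answered almost entirely by that expert.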
The model nonBayesian agent faces repeated game incomplete information Nature appropriate tool modeling general agentenvironment interactions In model environment state controlled Nature may change arbitrarily feedbackreward function initially unknown The agent Bayesian form prior probability neither state selection strategy Nature reward function A policy agent function assigns action every history observations actions Two basic feedback structures considered In one perfect monitoring case agent able observe previous environment state part feedback imperfect monitoring case available agent reward obtained Both settings refer partially observable processes current environment state unknown Our main result refers competitive ratio criterion perfect monitoring case We prove existence efficient stochastic policy ensures competitive ratio obtained almost stages arbitrarily high probability efficiency measured terms rate convergence It shown optimal policy exist imperfect monitoring case Moreover proved perfect monitoring case exist deterministic policy satisfies long run optimality criterion In addition discuss maxmin criterion prove deterministic efficient optimal strategy exist imperfect monitoring case criterion Finally show approach longrun optimality viewed qualitative distinguishes previous work area
We describe polynomialtime algorithm learning axisaligned rectangles Q respect product distributions multipleinstance examples PAC model Here example consists n elements Q together label indicating whether n points rectangle learned We assume unknown product distribution D Q instances independently drawn according D The accuracy hypothesis measured probability would incorrectly predict whether one n points drawn D rectangle learned Our algorithm achieves accuracy ε probability 1 − δ
We present new machine learning method given set training examples induces definition target concept terms hierarchy intermediate concepts definitions This effectively decomposes problem smaller less complex problems The method inspired Boolean function decomposition approach design digital circuits To cope high time complexity finding optimal decomposition propose suboptimal heuristic algorithm The method implemented program HINT HIerarchy Induction Tool experimentally evaluated using set artificial realworld learning problems It shown method performs well terms classification accuracy discovery meaningful concept hierarchies
In context inductive learning Bayesian approach turned successful estimating probabilities events learning examples The mprobability estimate developed handle situations In paper present mdistribution estimate extension mprobability estimate besides estimation probabilities covers also estimation probability distributions We focus application construction regression trees The theoretical results incorporated system automatic induction regression trees The results applying upgraded system several domains presented compared previous results
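The m-probability estimate has a simple closed form, (n_c + m·p_a) / (n + m). Here n_c is the count of the event, n the number of examples, p_a the prior probability and m the weight given to the prior. The sketch below shows that formula. It also defines a hypothetical `m_distribution` helper that applies the same estimate to every value of a discrete distribution. That value-by-value extension is an assumption for illustration; the paper's m-distribution estimate for regression trees may be defined differently.

```python
def m_estimate(n_c, n, prior, m):
    """m-probability estimate: (n_c + m * prior) / (n + m).

    m = 0 gives the relative frequency n_c / n; a large m pulls the
    estimate toward the prior probability.
    """
    return (n_c + m * prior) / (n + m)


def m_distribution(counts, priors, m):
    """Hypothetical helper: the m-estimate applied to each value of a distribution."""
    n = sum(counts)
    return [m_estimate(c, n, p, m) for c, p in zip(counts, priors)]


# Example: 3 of 4 examples in a leaf take value "a", prior P(a) = 0.5, m = 2.
print(m_estimate(3, 4, 0.5, 2))                       # (3 + 1) / (4 + 2)
print(m_distribution([3, 1, 0], [0.5, 0.3, 0.2], 2))  # components sum to 1
```

When the priors sum to 1, the estimated components also sum to 1. The value with zero observed count still receives probability mass from its prior.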
Realworld learning tasks often involve highdimensional data sets complex patterns missing features In paper review problem learning incomplete data two statistical perspectives likelihoodbased Bayesian The goal twofold place current neural network approaches missing data within statistical framework describe set algorithms derived likelihoodbased framework handle clustering classification function approximation incomplete data principled efficient manner These algorithms based mixture modeling make two distinct appeals ExpectationMaximization EM principle Dempster et al estimation mixture components coping missing data This report describes research done Center Biological Computational Learning Artificial Intelligence Laboratory Massachusetts Institute Technology Support Center provided part grant National Science Foundation contract ASC Support laboratorys artificial intelligence research provided part Advanced Research Projects Agency Department Defense The authors supported part grant ATR Auditory Visual Perception Research Laboratories grant Siemens Corporation grant IRI National Science Foundation grant NJ Office Naval Research Zoubin Ghahramani supported grant McDonnellPew Foundation Michael I Jordan NSF Presidential Young Investigator
Recently proven dynamics deterministic finitestate automata DFA n states input symbols implemented sparse secondorder recurrent neural network SORNN n state neurons Omn secondorder weights sigmoidal discriminant functions We investigate constructive algorithm extended faulttolerant neural DFA implementations faults analog implementation neurons weights affect desired network performance We show tolerance weight perturbation achieved easily tolerance weight andor neuron stuckatzero faults however requires duplication network resources This result impact construction neural DFAs dense internal representation DFA states
Technical Report No Department Statistics University Washington October Abhijit Dasgupta graduate student Department Biostatistics University Washington Box Seattle WA email address dasguptabiostatwashingtonedu Adrian E Raftery Professor Statistics Sociology Department Statistics University Washington Box Seattle WA email address rafterystatwashingtonedu This research supported Office Naval Research Grant NJ The authors grateful Peter Guttorp Girardeau Henderson Robert Muise helpful discussions
In multiarmed bandit problem gambler must decide arm K nonidentical slot machines play sequence trials maximize reward This classical problem received much attention simple model provides tradeoff exploration trying arm find best one exploitation playing arm believed give best payoff Past solutions bandit problem almost always relied assumptions statistics slot machines In work make statistical assumptions whatsoever nature process generating payoffs slot machines We give solution bandit problem adversary rather wellbehaved stochastic process complete control payoffs In sequence T plays prove expected perround payoff algorithm approaches best arm rate OT give improved rate convergence best arm fairly low payoff We also prove general matching lower bound best possible performance algorithm setting In addition consider setting player team experts advising arm play give strategy guarantee expected payoff close best expert Finally apply result problem learning play unknown repeated matrix game allpowerful adversary
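The following is a minimal sketch of an exponential-weights bandit strategy in the spirit of the one described, often called Exp3. It assumes payoffs in [0, 1], a fixed exploration rate `gamma`, and the update w_i ← w_i · exp(gamma · x̂_i / K). The function name, parameters and payoff function are illustrative, and the paper's exact algorithm and tuning of gamma may differ. Only the payoff of the pulled arm is observed, so that payoff is divided by the arm's probability to give an unbiased estimate. No statistical assumption is made about how payoffs are generated.

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Sketch of an Exp3-style adversarial bandit learner.

    reward_fn(t, arm) returns a payoff in [0, 1]; only the pulled arm's
    payoff is revealed. Returns the arms pulled and the last probabilities.
    """
    rng = random.Random(seed)
    log_w = [0.0] * n_arms  # log-weights avoid overflow on long runs
    probs = [1.0 / n_arms] * n_arms
    pulls = []
    for t in range(n_rounds):
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        # Mix exponential weights with uniform exploration (exploitation vs exploration).
        probs = [(1 - gamma) * wi / total + gamma / n_arms for wi in w]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        x = reward_fn(t, arm)
        x_hat = x / probs[arm]  # importance-weighted payoff estimate
        log_w[arm] += gamma * x_hat / n_arms
        pulls.append(arm)
    return pulls, probs


# Hypothetical payoffs: arm 2 always pays 1, the other arms pay 0.
pulls, probs = exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, n_arms=3, n_rounds=5000)
```

Because every arm keeps probability at least gamma / K, the learner never stops exploring. This is what protects it against an adversary that changes which arm is best.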
We show alternative way representing Bayesian belief network sensitivities probability distributions This representation equivalent traditional representation conditional probabilities makes dependencies nodes apparent intuitively easy understand We also propose QR matrix representation sensitivities andor conditional probabilities efficient memory requirements computational speed traditional representation computerbased implementations probabilistic inference We use sensitivities show certain class binary networks computation time approximate probabilistic inference positive upper bound error result independent size network Finally alternative traditional algorithms use conditional probabilities describe exact algorithm probabilistic inference uses QRrepresentation sensitivities updates probability distributions nodes network according messages neighbors
The requirement train large neural networks quickly prompted design new massively parallel supercomputer using custom VLSI This design features processing nodes communicating mesh network connected directly processor chip Studies show peak performance range billion arithmetic operations per second This paper presents case custom hardware combines neural networkspecific features general programmable machine architecture briefly describes design progress
In many realworld domains like text categorization supervised learning requires large number training examples In paper describe active learning method uses committee learners reduce number training examples required learning Our approach similar Query Committee framework disagreement among committee members predicted label input part example used signal need knowing actual value label Our experiments text categorization using committee Winnowbased learners demonstrate approach reduce number labeled training examples required used single Winnow learner orders magnitude Acknowledgements The availability Reuters corpus Reuters STAT Data Manipulation Analysis Programs Perlman greatly assisted research date
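The abstract does not say how committee members are generated, so the sketch below makes a hypothetical choice. Each member is a Winnow2 learner trained on a different random ordering of the labeled data. An example is flagged for labeling whenever the members' predictions differ. `Winnow`, `train_until_consistent` and `committee_disagrees` are illustrative names, and the target concept is a made-up disjunction.

```python
import random


class Winnow:
    """Winnow2 for boolean features: predict 1 iff sum of active weights >= theta."""

    def __init__(self, n, alpha=2.0):
        self.w = [1.0] * n
        self.theta = float(n)
        self.alpha = alpha

    def predict(self, x):
        return int(sum(wi for wi, xi in zip(self.w, x) if xi) >= self.theta)

    def update(self, x, y):
        """Mistake-driven multiplicative update; returns True on a mistake."""
        if self.predict(x) == y:
            return False
        factor = self.alpha if y == 1 else 1.0 / self.alpha  # promote or demote
        self.w = [wi * factor if xi else wi for wi, xi in zip(self.w, x)]
        return True


def train_until_consistent(learner, labeled, rng, max_passes=100):
    for _ in range(max_passes):
        order = labeled[:]
        rng.shuffle(order)
        if sum(learner.update(x, y) for x, y in order) == 0:
            break  # a full pass with no mistakes: consistent with all examples
    return learner


def committee_disagrees(committee, x):
    """Query signal: ask for the true label only when members disagree."""
    return len({member.predict(x) for member in committee}) > 1


# Hypothetical target concept: x0 OR x1 over 10 boolean features.
n = 10
examples = [tuple((i >> j) & 1 for j in range(n)) for i in range(2 ** n)]
labeled = [(x, int(x[0] or x[1])) for x in examples]
rng = random.Random(0)
committee = [train_until_consistent(Winnow(n), labeled, rng) for _ in range(3)]
```

In an active-learning loop, members would instead be trained on the few labels gathered so far. Each unlabeled example would then be passed to `committee_disagrees` to decide whether to request its label.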
covering formalized used extensively In work divideandconquer technique formalized well compared covering technique logic programming framework Covering works repeatedly specializing overly general hypothesis iteration focusing finding clause high coverage positive examples Divideandconquer works specializing overly general hypothesis focusing discriminating positive negative examples Experimental results presented demonstrating cases accurate hypotheses found divideandconquer covering Moreover since covering considers alternatives repeatedly tends less efficient divideandconquer never considers alternative twice On hand covering searches larger hypothesis space may result compact hypotheses found technique divideandconquer Furthermore divideandconquer contrast covering applicable learning recursive definitions
This paper introduces Recurrence Surface Approximation inductive learning method based linear programming predicts recurrence times using censored training examples examples available training output may lower bound right answer This approach augmented feature selection method chooses appropriate feature set within context linear programming generalizer Computational results field breast cancer prognosis shown A straightforward translation prediction method artificial neural network model also proposed
Minimum Message Length MML invariant Bayesian point estimation technique also consistent efficient We provide brief overview MML inductive inference Wallace Boulton Wallace Freeman informationtheoretic Bayesian interpretation We outline MML used statistical parameter estimation MML mixture modelling program Snob Wallace Boulton Wallace Wallace Dowe uses message lengths various parameter estimates enable combine parameter estimation selection number components The message length within constant logarithm posterior probability theory So MML theory also regarded theory highest posterior probability Snob currently assumes variables uncorrelated permits multivariate data Gaussian discrete multistate Poisson von Mises circular distributions
One challenges models cognitive phenomena development efficient flexible interfaces low level sensory information high level processes For visual processing researchers long argued attentional mechanism required perform many tasks required high level vision This thesis presents VISIT connectionist model covert visual attention used vehicle studying interface The model efficient flexible biologically plausible The complexity network linear number pixels Effective parallel strategies used minimize number iterations required The resulting system able efficiently solve two tasks particularly difficult standard bottomup models vision computing spatial relations visual search Simulations show networks behavior matches much known psychophysical data human visual attention The general architecture model also closely matches known physiological data human attention system Various extensions VISIT discussed including methods learning component modules
A Lyapunov function excitatoryinhibitory networks constructed The construction assumes symmetric interactions within excitatory inhibitory populations neurons antisymmetric interactions populations The Lyapunov function yields sufficient conditions global asymptotic stability fixed points If conditions violated limit cycles may stable The relations Lyapunov function optimization theory classical mechanics revealed The dynamics neural network symmetric interactions provably converges fixed points general assumptions This mathematical result helped establish paradigm neural computation fixed point attractors But reality interactions neurons brain asymmetric Furthermore dynamical behaviors seen brain confined fixed point attractors also include oscillations complex nonperiodic behavior These types dynamics realized asymmetric networks may useful neural computation For reasons important understand global behavior asymmetric neural networks The interaction excitatory neuron inhibitory neuron clearly asymmetric Here consider class networks incorporates fundamental asymmetry brains microcircuitry Networks class distinct populations excitatory inhibitory neurons antisymmetric interactions minimax dissipative Hamiltonian forms network dynamics
In Bayesian inference Bayes factor defined ratio posterior odds versus prior odds posterior odds simply ratio normalizing constants two posterior densities In many practical problems two posteriors different dimensions For cases current Monte Carlo methods bridge sampling method Meng Wong path sampling method Gelman Meng ratio importance sampling method Chen Shao directly applied In article extend importance sampling bridge sampling ratio importance sampling problems different dimensions Then find global optimal importance sampling bridge sampling ratio importance sampling sense minimizing asymptotic relative meansquare errors estimators Implementation algorithms asymptotically achieve optimal simulation errors developed two illustrative examples also provided
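The Bayes factor reduces to a ratio of normalizing constants c1/c2. For two unnormalized densities q1 and q2 on the same space, the simplest Monte Carlo estimator is importance sampling with the second posterior as the proposal: c1/c2 = E_p2[q1(X)/q2(X)]. The sketch below shows only that same-dimension baseline. The abstract's contribution, extending such estimators to posteriors of different dimensions and choosing them optimally, is not shown. The Gaussian test densities are hypothetical and were picked because the answer is known.

```python
import math
import random


def ratio_of_normalizers_is(q1, q2, sample_p2, n=100_000, seed=0):
    """Importance-sampling estimate of c1 / c2 = E_{p2}[q1(X) / q2(X)].

    q1, q2: unnormalized densities on the same space; sample_p2(rng) draws
    from p2 = q2 / c2. Requires q2 > 0 wherever q1 > 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_p2(rng)
        total += q1(x) / q2(x)
    return total / n


# Hypothetical check: c1 = sqrt(2*pi), c2 = sqrt(8*pi), so c1 / c2 = 0.5.
est = ratio_of_normalizers_is(
    q1=lambda x: math.exp(-x * x / 2),
    q2=lambda x: math.exp(-x * x / 8),
    sample_p2=lambda rng: rng.gauss(0.0, 2.0),  # p2 is N(0, 4)
)
```

The proposal here has heavier tails than q1, which keeps the weights bounded and the variance finite. Reversing the roles of the two densities would make the estimator's variance infinite.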
As knowledge bases used AI systems increase size access relevant information dominant factor cost inference This especially true analogical casebased reasoning ability system perform inference dependent efficient flexible access large base exemplars cases judged likely relevant solving problem hand In chapter discuss novel algorithm efficient associative matching relational structures large semantic networks The structure matching algorithm uses massively parallel hardware search memory knowledge structures matching given probe structure The algorithm built top PARKA massively parallel knowledge representation system runs Connection Machine We currently exploring utility algorithm CaPER casebased planning system
We consider use online stopping rules reduce number training examples needed paclearn Rather collect large training sample proved sufficient eliminate bad hypotheses priori idea instead observe training examples oneatatime decide online whether stop return hypothesis continue training The primary benefit approach detect hypothesizer actually converged halt training standard fixedsamplesize bounds This paper presents series sequential learning procedures distributionfree paclearning mistakebounded pac conversion distributionspecific paclearning respectively We analyze worst case expected training sample size procedures show often smaller existing fixed sample size bounds still providing exact worst case pacguarantees We also provide lower bounds show reductions best involve constant possibly log factors However empirical studies show sequential learning procedures actually use many times fewer training examples practice
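The sketch below contrasts the two regimes for a finite hypothesis class H. `fixed_sample_size` is the standard bound for a learner that outputs any hypothesis consistent with all examples. `survival_length` is a hypothetical online stopping rule: accept the i-th hypothesis as soon as it classifies enough consecutive fresh examples correctly. Both the finite-class setting and this particular schedule are illustrative choices, not the paper's procedures.

```python
import math


def fixed_sample_size(h_size, epsilon, delta):
    """Fixed sample size m >= (ln|H| + ln(1/delta)) / epsilon: any consistent
    hypothesis then has error <= epsilon with probability >= 1 - delta."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)


def survival_length(i, epsilon, delta):
    """Hypothetical stopping rule for the i-th hypothesis (i = 1, 2, ...).

    A hypothesis with error > epsilon survives k fresh examples with
    probability at most (1 - epsilon)**k <= exp(-epsilon * k), which this k
    makes <= delta / (i * (i + 1)); summing over i bounds the total by delta.
    """
    return math.ceil(math.log(i * (i + 1) / delta) / epsilon)


print(fixed_sample_size(1000, 0.1, 0.05))             # examples collected up front
print([survival_length(i, 0.1, 0.05) for i in (1, 2, 3)])
```

If the learner's first or second hypothesis is already good, the sequential rule halts after a few dozen examples. The fixed bound must always be paid in full, which is the kind of saving the abstract reports.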
This paper presents novel approach determine structural similarity guidance adaptation casebased reasoning Cbr We advance structural similarity assessment provides single numeric value specific structure two cases common inclusive modification rules needed obtain structure two cases Our approach treats retrieval matching adaptation group dependent processes This guarantees retrieval matching similar adaptable cases Both together enlarge overall problem solving performance Cbr explainability case selection adaptation considerably Although approach theoretical nature restricted specific domain give example taken domain industrial building design Additionally sketch two prototypical implementations approach
We analyze blameassignment task context experiencebased design redesign physical devices We identify three types blameassignment tasks differ types information take input design achieve desired behavior device design results undesirable behavior specific structural element design misbehaves We describe modelbased approach solving blameassignment task This approach uses structurebehaviorfunction models capture designers comprehension way device works terms causal explanations structure results behaviors We also address issue indexing models memory We discuss three types blameassignment tasks require different types indices accessing models Finally describe KRITIK system implements evaluates modelbased approach blame assignment
We present framework taskdriven knowledge acquisition development design support systems Different types knowledge enter knowledge base design support system defined illustrated formal knowledge acquisition vantage point Special emphasis placed taskstructure used guide acquisition application knowledge Starting knowledge planning steps design augmenting problemsolving knowledge supports design formal integrated model knowledge design constructed Based notion knowledge acquisition incremental process give account possibilities problem solving depending knowledge disposal system Finally depict different kinds knowledge interact design support system This research supported German Ministry Research Technology BMFT within joint project FABEL contract IW Project partners FABEL German National Research Center Computer Science GMD Sankt Augustin BSR Consulting GmbH Munchen Technical University Dresden HTWK Leipzig University Freiburg University Karlsruhe
The activity sorting like objects classes without help omniscient supervisor known unsupervised classification In AI symbolic connectionist camps study classification The statistical classifiers Autoclass Snob search theory best explain distribution given data whereas neural network classifiers Kohonens networks ART use vector quantization principle classifying data Previously many studies compared supervised classification algorithms challenging problem comparing unsupervised classifiers largely ignored We performed empirical comparison ART Autoclass Snob We highlight strengths weaknesses various classifiers Overall statistical classifiers especially Snob perform better neural network counterpart ART
AI research casebased reasoning led development many laboratory casebased systems As move towards introducing systems work environments explaining processes casebased reasoning becoming increasingly important issue In paper describe notion metacase illustrating explaining justifying casebased reasoning A metacase contains trace processing problemsolving episode provides explanation problemsolving decisions partial justification solution The language representing problemsolving trace depends model problem solving We describe taskmethod knowledge TMK model problemsolving describe representation metacases TMK language We illustrate explanatory scheme examples Interactive Kritik computerbased de
Statistical decision theory provides principled way estimate amino acid frequencies conserved positions protein family The goal minimize risk function expected squarederror distance estimates true population frequencies The minimumrisk estimates obtained adding optimal number pseudocounts observed data Two formulas presented one pseudocounts based marginal amino acid frequencies one pseudocounts based observed data Experimental results show profiles constructed using minimalrisk estimates discriminating constructed using existing methods
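The trade-off behind minimum-risk pseudocounts can be made concrete under a simplifying assumption. A single pseudocount total a is spread according to a background distribution q, giving p̂_i = (n_i + a·q_i)/(N + a). For multinomial counts from a known p, the expected squared error is R(a) = [N·S + a²·D]/(N + a)², with S = 1 − Σp_i² and D = Σ(q_i − p_i)². Its derivative is proportional to N(aD − S), so a* = S/D, independent of N. This oracle uses the true p, which is unknown in practice. It only illustrates the variance/bias balance and is neither of the paper's two formulas. The 4-letter alphabet is hypothetical.

```python
def squared_error_risk(p, q, n, a):
    """Expected sum_i (p_hat_i - p_i)^2 for p_hat_i = (n_i + a q_i) / (n + a)
    when counts ~ Multinomial(n, p). Variance uses Var(n_i) = n p_i (1 - p_i);
    each component's bias is a (q_i - p_i) / (n + a)."""
    variance = n * sum(pi * (1 - pi) for pi in p)
    bias_sq = a * a * sum((qi - pi) ** 2 for pi, qi in zip(p, q))
    return (variance + bias_sq) / (n + a) ** 2


def oracle_pseudocount_total(p, q):
    """Risk-minimizing a when p is known: (1 - sum p_i^2) / sum (q_i - p_i)^2."""
    return (1 - sum(pi * pi for pi in p)) / sum((qi - pi) ** 2 for pi, qi in zip(p, q))


# Hypothetical 4-letter alphabet (real protein profiles use 20 amino acids).
p_true = [0.7, 0.1, 0.1, 0.1]
background = [0.25] * 4
a_star = oracle_pseudocount_total(p_true, background)
```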
The purpose paper propose refinement notion innateness If merely identify innateness bias obtain poor characterisation notion since learning device relies bias makes choose given hypothesis instead another We show intuition innateness better captured characteristic bias related isotropy Generalist models learning shown rely isotropic bias whereas bias specialised models include specific priori knowledge learned necessarily anisotropic The socalled generalist models however turn specialised way learn symmetrical forms preferentially strictly deficiencies learning ability Because learning beings always show two properties generalist models may sometimes ruled bad candidates cognitive modelling
Reliable visionbased control autonomous vehicle requires ability focus attention important features input scene Previous work autonomous lane following system ALVINN Pomerleau yielded good results uncluttered conditions This paper presents artificial neural network based learning approach handling difficult scenes confuse ALVINN system This work presents mechanism achieving taskspecific focus attention exploiting temporal coherence A saliency map based upon computed expectation contents inputs next time step indicates regions input retina important performing task The saliency map used accentuate features important task deemphasize
Production scheduling problem sequentially configuring factory meet forecasted demands critical problem throughout manufacturing industry The requirement maintaining product inventories face unpredictable demand stochastic factory output makes standard scheduling models jobshop inadequate Currently applied algorithms simulated annealing constraint propagation must employ adhoc methods frequent replanning cope uncertainty In paper describe Markov Decision Process MDP formulation production scheduling captures stochasticity production demands The solution MDP value function used generate optimal scheduling decisions online A simple example illustrates theoretical superiority approach replanningbased methods We describe industrial application two reinforcement learning methods generating approximate value function domain Our results demonstrate deterministic noisy scenarios value function approximation effective technique
In paper investigate new formal model machine learning concept boolean function learned may exhibit uncertain probabilistic behavior thus input may sometimes classified positive example sometimes negative example Such probabilistic concepts pconcepts may arise situations weather prediction measured variables accuracy insufficient determine outcome certainty We adopt Valiant model learning demands learning algorithms efficient general sense perform well wide class pconcepts distribution domain In addition giving many efficient algorithms learning natural classes pconcepts study develop detail underlying theory learning pconcepts
This paper highlights phenomenon causes deductively learned knowledge harmful used problem solving The problem occurs deductive problem solvers encounter failure branch search tree The backtracking mechanism problem solvers force program traverse whole subtree thus visiting many nodes twice using deductively learned rule using rules generated learned rule first place We suggest approach called utilization filtering solve problem Learners use approach submit problem solver filter function together knowledge acquired The function decides problem whether use learned knowledge part use We tested idea context lemma learning system filter uses probability subgoal failing decide whether turn lemma usage Experiments show improvement performance factor This paper concerned particular type harmful redundancy occurs deductive problem solvers employ backtracking search procedure use deductively learned knowledge accelerate search The problem failure branches search tree backtracking mechanism problem solver forces exploration whole subtree Thus search procedure visit many states twice using deductively learned rule using search path produced rule first place
The authors thank Rich Yee Vijay Gullapalli Brian Pinette Jonathan Bachrach helping clarify relationships heuristic search control We thank Rich Sutton Chris Watkins Paul Werbos Ron Williams sharing fundamental insights subject numerous discussions thank Rich Sutton first making us aware Korfs research thoughtful comments manuscript We grateful Dimitri Bertsekas Steven Sullivan independently pointing error earlier version article Finally thank Harry Klopf whose insight persistence encouraged interest class learning problems This research supported grants AG Barto National Science Foundation ECS ECS Air Force Office Scientific Research Bolling AFB AFOSR
Technical Report OSUCISRC TR Abstract One classical topics neural networks winnertakeall WTA widely used unsupervised competitive learning cortical processing attentional control Because global connectivity WTA networks however encode spatial relations input thus support sensory perceptual processing spatial relations important We propose new architecture maintains spatial relations input features This selection network builds LEGION Locally Excitatory Globally Inhibitory Oscillator Networks dynamics slow inhibition In input scene many objects patterns network selects largest object This system easily adjusted select several largest objects alternate time We show twostage selection network gains efficiency combining selection parallel removal noisy regions The network applied select salient object real images As special case selection network without local excitation gives rise new form oscillatory WTA
Reinforcement learning RL become central paradigm solving learningcontrol problems robotics artificial intelligence RL researchers focussed almost exclusively problems controller maximize discounted sum payoffs However emphasized Schwartz many problems eg optimal behavior limit cycle natural computationally advantageous formulate tasks controllers objective maximize average payoff received per time step In paper I derive new averagepayoff RL algorithms stochastic approximation methods solving system equations associated policy evaluation optimal control questions averagepayoff RL tasks These algorithms analogous popular TD Qlearning algorithms already developed discountedpayoff case One algorithms derived significant variation Schwartzs Rlearning algorithm Preliminary empirical results presented validate new algorithms
We present algorithms exactly learning unknown environments described deterministic finite automata The learner performs walk target automaton step observes output state chooses labeled edge traverse next state The learner means reset access teacher answers equivalence queries gives learner counterexamples hypotheses We present two algorithms The first case outputs observed learner always correct second case outputs might corrupted random noise The running times algorithms polynomial cover time underlying graph target automaton
Exploring mapping unknown environment fundamental problem studied variety contexts Many works focused finding efficient solutions restricted versions problem In paper consider model makes limited assumptions environment solve mapping problem general setting We model environment unknown directed graph G consider problem robot exploring mapping G We assume vertices G labeled thus robot hope succeeding unless given means distinguishing vertices For reason provide robot pebble device place vertex use identify vertex later In paper show If robot knows upper bound number vertices learn graph efficiently one pebble If robot know upper bound number vertices n Θ(log log n) pebbles necessary sufficient In cases algorithms deterministic
In recent years increasing interest learning Bayesian networks data One effective methods learning networks based minimum description length MDL principle Previous work shown learning procedure asymptotically successful probability one converge target distribution given sufficient number samples However rate convergence hitherto unknown In work examine sample complexity MDL based learning procedures Bayesian networks We show number samples needed learn close approximation terms entropy distance confidence δ O log δ log log This means sample complexity loworder polynomial error threshold sublinear confidence bound We also discuss constants term depend complexity target distribution Finally address questions asymptotic minimality propose method using sample complexity results speed learning process
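To make the MDL learning procedure concrete, the sketch scores two candidate structures for a pair of binary variables. It uses one common form of the MDL score: negative log-likelihood at the maximum-likelihood parameters plus (log₂ N)/2 bits per free parameter. The two-node setting, the function name and this exact penalty are assumptions for illustration. The abstract's results concern how many samples such a procedure needs, not this code.

```python
import math
from collections import Counter


def mdl_score(data, with_edge):
    """Description length in bits of binary pairs (x, y) under a two-node network:
    -log2 likelihood at the ML parameters + (log2 N / 2) per free parameter."""
    n = len(data)
    cx = Counter(x for x, _ in data)
    nll = -sum(c * math.log2(c / n) for c in cx.values())
    if with_edge:  # X -> Y: one parameter for P(X), two for P(Y | X)
        cxy = Counter(data)
        nll -= sum(c * math.log2(c / cx[x]) for (x, _), c in cxy.items())
        k = 3
    else:  # X and Y independent: one parameter each
        cy = Counter(y for _, y in data)
        nll -= sum(c * math.log2(c / n) for c in cy.values())
        k = 2
    return nll + 0.5 * math.log2(n) * k


# Hypothetical data sets of 200 samples each.
dependent = [(0, 0)] * 90 + [(0, 1)] * 10 + [(1, 1)] * 90 + [(1, 0)] * 10
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 50
```

The edge is kept only when the bits it saves in encoding the data exceed the bits spent on its extra parameter. With too few samples, the penalty term dominates and the simpler structure wins.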
Almost work Averagereward Reinforcement Learning ARL far focused tablebased methods scale domains large state spaces In paper propose two extensions modelbased ARL method called Hlearning address scaleup problem We extend Hlearning learn action models reward functions form Bayesian networks approximate value function using local linear regression We test algorithms several scheduling tasks simulated Automatic Guided Vehicle AGV show effective significantly reducing space requirement Hlearning making converge faster To best knowledge results first applying function approximation ARL
Understanding highdimensional real world data usually requires learning structure data space The structure may contain highdimensional clusters related complex ways Methods merge clustering selforganizing maps designed aid visualization interpretation data However methods often fail capture critical structural properties input Although selforganizing maps capture highdimensional topology represent cluster boundaries discontinuities Merge clustering extracts clusters capture local global topology This paper proposes algorithm combines topologypreserving characteristics selforganizing maps flexible adaptive structure learns cluster boundaries data
Although building sophisticated learning agents operate complex environments require learning perform multiple tasks applications reinforcement learning focussed single tasks In paper I consider class sequential decision tasks SDTs called composite sequential decision tasks formed temporally concatenating number elemental sequential decision tasks Elemental SDTs decomposed simpler SDTs I consider learning agent learn solve set elemental composite SDTs I assume structure composite tasks unknown learning agent The straightforward application reinforcement learning multiple tasks requires learning tasks separately waste computational resources memory time I present new learning algorithm modular architecture learns decomposition composite SDTs achieves transfer learning sharing solutions elemental SDTs across multiple composite SDTs The solution composite SDT constructed computationally inexpensive modifications solutions constituent elemental SDTs I provide proof one aspect learning algorithm
Existing approaches learning control robot arm rely supervised methods correct behavior explicitly given It difficult learn avoid obstacles using methods however examples obstacle avoidance behavior hard generate This paper presents alternative approach evolves neural network controllers genetic algorithms No inputoutput examples necessary since neuroevolution learns single performance measurement entire task grasping object The approach tested simulation OSCAR robot arm receives visual sensory input Neural networks evolved effectively avoid obstacles various locations reach random target locations
It widely accepted use compact representations lookup tables crucial scaling reinforcement learning RL algorithms realworld problems Unfortunately almost theory reinforcement learning assumes lookup table representations In paper address pressing issue combining function approximation RL present function approximator based simple extension state aggregation commonly used form compact representation namely soft state aggregation theory convergence RL arbitrary fixed soft state aggregation novel intuitive understanding effect state aggregation online RL new heuristic adaptive state aggregation algorithm finds improved compact representations exploiting nondiscrete nature soft state aggregation Preliminary empirical results also presented
This article introduces class incremental learning procedures specialized prediction using past experience incompletely known system predict future behavior Whereas conventional predictionlearning methods assign credit means difference predicted actual outcomes new methods assign credit means difference temporally successive predictions Although temporaldifference methods used Samuels checker player Hollands bucket brigade authors Adaptive Heuristic Critic remained poorly understood Here prove convergence optimality special cases relate supervisedlearning methods For realworld prediction problems temporaldifference methods require less memory less peak computation conventional methods produce accurate predictions We argue problems supervised learning currently applied really prediction problems sort temporaldifference methods applied advantage
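The core idea, assigning credit by the difference between temporally successive predictions, is shown below in its simplest form, tabular TD(0). Each update moves V(s) toward r + V(s'). The five-state bounded random walk is an assumed test problem with known true values i/6, and the step size and episode count are illustrative. The general TD(λ) family is not shown.

```python
import random


def td0_random_walk(n_states=5, episodes=10000, alpha=0.005, seed=0):
    """Tabular TD(0) prediction on a bounded random walk (no discounting).

    Non-terminal states are 1..n_states; stepping off the right end gives
    reward 1, off the left end reward 0; episodes start in the middle.
    """
    rng = random.Random(seed)
    v = [0.0] + [0.5] * n_states + [0.0]  # terminal values stay 0
    for _ in range(episodes):
        s = (n_states + 1) // 2
        while 0 < s <= n_states:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n_states + 1 else 0.0
            # Credit from the difference between successive predictions.
            v[s] += alpha * (r + v[s_next] - v[s])
            s = s_next
    return v[1:-1]


values = td0_random_walk()
true_values = [i / 6 for i in range(1, 6)]
```

Each update uses only the current transition, so memory and per-step computation stay constant. A Monte Carlo method would instead wait for the final outcome of every episode.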
This paper extends previous work Dyna class architectures intelligent systems based approximating dynamic programming methods Dyna architectures integrate trialanderror reinforcement learning executiontime planning single process operating alternately world learned model world In paper I present show results two Dyna architectures The DynaPI architecture based dynamic programmings policy iteration method related existing AI ideas evaluation functions universal plans reactive systems Using navigation task results shown simple DynaPI system simultaneously learns trial error learns world model plans optimal routes using evolving world model The DynaQ architecture based Watkinss Qlearning new kind reinforcement learning DynaQ uses less familiar set data structures DynaPI arguably simpler implement use We show DynaQ architectures easy adapt use changing environments
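A minimal tabular sketch of the Dyna-Q idea follows. Each real step feeds a Q-learning update and records the transition in a learned model. Several extra Q-learning updates are then replayed from transitions sampled out of that model. The deterministic chain environment, its reward, and all parameter values are hypothetical choices for illustration.

```python
import random


def dyna_q(n_states=6, episodes=50, planning_steps=10, alpha=0.5,
           gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical deterministic chain.

    Action 0 moves left (clamped at 0), action 1 moves right; reaching the
    last state ends the episode with reward 1.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}  # (s, a) -> (r, s_next, done), learned from real experience

    def step(s, a):
        s_next = max(0, s - 1) if a == 0 else s + 1
        return (1.0, s_next, True) if s_next == goal else (0.0, s_next, False)

    def greedy(s):
        best = max(q[s])
        return rng.choice([a for a in (0, 1) if q[s][a] == best])

    def update(s, a, r, s_next, done):
        target = r if done else r + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = 0
        for _ in range(1000):  # cap on episode length
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            r, s_next, done = step(s, a)
            update(s, a, r, s_next, done)      # learning from the real world
            model[(s, a)] = (r, s_next, done)  # world-model learning
            keys = list(model)
            for _ in range(planning_steps):    # planning with the learned model
                ps, pa = rng.choice(keys)
                update(ps, pa, *model[(ps, pa)])
            if done:
                break
            s = s_next
    return q


q = dyna_q()
```

The planning loop propagates the goal reward backward through the model without extra real steps. For a changing environment, the abstract's DynaQ variants add more than this plain version does.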
On large problems reinforcement learning systems must use parameterized function approximators neural networks order generalize similar situations actions In cases strong theoretical results accuracy convergence computational results mixed In particular Boyan Moore reported last years meeting series negative results attempting apply dynamic programming together function approximation simple control problems continuous state spaces In paper present positive results control tasks attempted one significantly larger The important differences used sparsecoarsecoded function approximators CMACs whereas used mostly global function approximators learned online whereas learned offline Boyan Moore others suggested problems encountered could solved using actual outcomes rollouts classical Monte Carlo methods TD algorithm However experiments always resulted substantially poorer performance We conclude reinforcement learning work robustly conjunction function approximators little justification present avoiding case general
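Sparse coarse-coded approximators such as CMACs are commonly implemented as tile coding. The sketch below is a one-dimensional tile coder trained with an LMS rule on a hypothetical target, sin x. The class name, tiling sizes and step size are assumptions. The reinforcement-learning control loop the abstract evaluates is not shown.

```python
import math
import random


class TileCoder1D:
    """CMAC-style tile coding for a scalar input on [lo, hi].

    Each of n_tilings tilings splits the range into equal tiles and is shifted
    by a fraction of a tile width, so an input activates exactly one tile per
    tiling: a sparse binary feature vector with n_tilings ones.
    """

    def __init__(self, lo, hi, n_tiles=10, n_tilings=8):
        self.lo, self.n_tiles, self.n_tilings = lo, n_tiles, n_tilings
        self.width = (hi - lo) / n_tiles
        # One extra tile per tiling covers the part pushed past hi by the offset.
        self.weights = [0.0] * (n_tilings * (n_tiles + 1))

    def active(self, x):
        idx = []
        for k in range(self.n_tilings):
            offset = k * self.width / self.n_tilings
            tile = int((x - self.lo + offset) / self.width)
            idx.append(k * (self.n_tiles + 1) + tile)
        return idx

    def predict(self, x):
        return sum(self.weights[i] for i in self.active(x))

    def update(self, x, target, alpha=0.1):
        # LMS step shared across the active tiles (alpha / n_tilings each).
        err = target - self.predict(x)
        for i in self.active(x):
            self.weights[i] += alpha / self.n_tilings * err


rng = random.Random(0)
coder = TileCoder1D(0.0, 2 * math.pi)
for _ in range(20000):
    x = rng.uniform(0.0, 2 * math.pi)
    coder.update(x, math.sin(x))
```

Each update touches only n_tilings weights, which keeps online learning cheap and keeps generalization local.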
We consider the requirements of online learning: learning that must be done incrementally and in real time, with the results of learning available soon after each new example is acquired. Despite the abundance of methods for learning from examples, few can be used effectively for online learning, e.g., as components of reinforcement learning systems. Most of these few, including radial basis functions, CMACs, Kohonen's self-organizing maps, and those developed in this paper, share the same structure. All expand the original input representation into a higher-dimensional representation in an unsupervised way, and then map that representation to the final answer using a relatively simple supervised learner, such as a perceptron or LMS rule. Such structures learn very rapidly and reliably, but have been thought either to scale poorly or to require extensive domain knowledge. To the contrary, some researchers (Rosenblatt; Gallant and Smith; Kanerva; Prager and Fallside) have argued that the expanded representation can be chosen largely at random with good results. The main contribution of this paper is to develop and test this hypothesis. We show that simple random-representation methods can perform as well as nearest-neighbor methods (while being more suited to online learning), and significantly better than backpropagation. We find that the size of the random representation does increase with the dimensionality of the problem, but not unreasonably so, and that the required size can be reduced substantially using unsupervised-learning techniques. Our results suggest that randomness has a useful role to play in online supervised learning and constructive induction.
We consider the problem of dynamically apportioning resources among a set of options in a worst-case online framework. The model we study can be interpreted as a broad, abstract extension of the well-studied online prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update rule of Littlestone and Warmuth can be adapted to this model, yielding bounds that are slightly weaker in some cases but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in R^n.
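The multiplicative rule can be sketched as Hedge(β). The algorithm keeps one weight per option, allocates in proportion to the weights, and multiplies each weight by β raised to that option's loss. The loss data, `beta`, and the function name are hypothetical. With uniform initial weights, the algorithm's cumulative expected loss obeys the worst-case bound L_Hedge <= (ln N + L_best * ln(1/β)) / (1 - β). The example checks this bound numerically.

```python
import math
import random

def hedge(loss_rounds, beta=0.9):
    """Hedge(beta): multiplicative weight update over N options.

    loss_rounds: list of per-round loss vectors with entries in [0, 1].
    Returns (cumulative expected loss of the allocation, final normalized weights).
    """
    n = len(loss_rounds[0])
    w = [1.0 / n] * n                      # uniform initial weights
    total = 0.0
    for losses in loss_rounds:
        z = sum(w)
        p = [wi / z for wi in w]           # allocation across options this round
        total += sum(pi * li for pi, li in zip(p, losses))
        w = [wi * beta ** li for wi, li in zip(w, losses)]   # shrink by beta**loss
    z = sum(w)
    return total, [wi / z for wi in w]

rng = random.Random(1)
N, T, beta = 5, 500, 0.9
# Hypothetical losses: option 2 is better on average than the others.
rounds = [[rng.random() * (0.6 if i == 2 else 1.0) for i in range(N)] for _ in range(T)]
alg_loss, weights = hedge(rounds, beta)
best_loss = min(sum(r[i] for r in rounds) for i in range(N))
bound = (math.log(N) + best_loss * math.log(1 / beta)) / (1 - beta)
print(alg_loss <= bound, max(range(N), key=lambda i: weights[i]))
```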
A new online learning algorithm that minimizes the statistical dependency among outputs is derived for the blind separation of mixed signals. The dependency is measured by the average mutual information (MI) of the outputs. The source signals and the mixing matrix are unknown except for the number of sources. The Gram-Charlier expansion, instead of the Edgeworth expansion, is used in evaluating the MI. The natural gradient approach is used to minimize the MI. A novel activation function is proposed for the online learning algorithm, which has an equivariant property and is easily implemented on a neural-network-like model. The validity of the new learning algorithm is verified by computer simulations.
Overfitting is a well-known problem in the fields of symbolic and connectionist machine learning. It describes the deterioration of the generalisation performance of a trained model. In this paper, we investigate the ability of a novel artificial neural network, bp-som, to avoid overfitting. bp-som is a hybrid neural network that combines a multi-layered feed-forward network (mfn) with Kohonen's self-organising maps (soms). During training, supervised back-propagation learning and unsupervised som learning cooperate in finding adequate hidden-layer representations. We show that bp-som outperforms standard back-propagation, and also back-propagation with weight decay, when dealing with the problem of overfitting. In addition, we show that bp-som succeeds in preserving generalisation performance under hidden-unit pruning, where the other methods fail.
We describe a model of iterated belief revision that extends the AGM theory of revision to account for the effect of a revision on the conditional beliefs of an agent. In particular, this model ensures that an agent makes as few changes as possible to the conditional component of its belief set. Adopting the Ramsey test, minimal conditional revision provides acceptance conditions for arbitrary right-nested conditionals. We show that the problem of determining the acceptance of any such nested conditional can be reduced to acceptance tests for unnested conditionals. Thus, iterated revision can be accomplished in a virtual manner, using uniterated revision.
Reinforcement learning techniques address the problem of learning to select actions in unknown, dynamic environments. It is widely acknowledged that to be of use in complex domains, reinforcement learning techniques must be combined with generalizing function approximation methods such as artificial neural networks. Little, however, is understood about the theoretical properties of such combinations, and many researchers have encountered failures in practice. In this paper we identify a prime source of such failures, namely a systematic overestimation of utility values. Using Watkins' Q-Learning as an example, we give a theoretical account of the phenomenon, deriving conditions under which one may expect it to cause learning to fail. Employing some of the most popular function approximators, we present experimental results that support the theoretical findings.
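Consider the special case where all true action values are equal and each approximation error is independent and uniform on [-ε, ε]. The expected max over n noisy estimates then exceeds the true max by ε(n - 1)/(n + 1). A Q-learning target that bootstraps from that max inherits the bias. The Monte Carlo sketch below compares a sampled estimate with this closed form. The parameter values and the function name are assumptions for illustration.

```python
import random

def expected_max_overestimate(n_actions=10, eps=1.0, samples=50000, seed=0):
    """Monte Carlo estimate of E[max_a (Q(a) + noise_a)] - max_a Q(a).

    All true action values are taken to be equal (here 0), and the
    approximation noise is i.i.d. uniform on [-eps, eps].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += max(rng.uniform(-eps, eps) for _ in range(n_actions))
    return total / samples

n, eps = 10, 1.0
mc = expected_max_overestimate(n, eps)
closed_form = eps * (n - 1) / (n + 1)   # expected max of n uniform(-eps, eps) draws
print(round(mc, 3), round(closed_form, 3))
```

Zero-mean errors in the individual estimates thus turn into a strictly positive bias once the max is taken.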
We derive a new self-organising learning algorithm that maximises the information transferred in a network of nonlinear units. The algorithm does not assume any knowledge of the input distributions, and is defined here for the zero-noise limit. Under these conditions, information maximisation has extra properties not found in the linear case (Linsker). The nonlinearities in the transfer function are able to pick up higher-order moments of the input distributions and perform something akin to true redundancy reduction between units in the output representation. This enables the network to separate statistically independent components in the inputs: a higher-order generalisation of Principal Components Analysis. We apply the network to the source separation (or cocktail party) problem, successfully separating unknown mixtures of up to ten speakers. We also show that a variant on the network architecture is able to perform blind deconvolution (cancellation of unknown echoes and reverberation in a speech signal). Finally, we derive dependencies of information transfer on time delays. We suggest that information maximisation provides a unifying framework for problems in "blind" signal processing. Please send comments to tony@salk.edu. This paper will appear in Neural Computation. The reference version is Technical Report INC (February), Institute for Neural Computation, UCSD, San Diego, CA.
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided, including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented.
The utility problem in speedup learning describes a common behavior of machine learning methods: the eventual degradation of performance due to increasing amounts of learned knowledge. The shape of the learning curve (cost of using a learning method vs. number of training examples) over several domains suggests a parameterized model relating performance to the amount of learned knowledge, and a mechanism to limit the amount of learned knowledge for optimal performance. Many recent approaches to avoiding the utility problem in speedup learning rely on sophisticated utility measures and significant numbers of training data to accurately estimate the utility of control knowledge. Empirical results presented elsewhere indicate that a simple selection strategy of retaining all control rules derived from a training problem explanation quickly defines an efficient set of control knowledge from few training problems. This simple selection strategy provides a low-cost alternative to example-intensive approaches for improving the speed of a problem solver. Experimentation illustrates the existence of a minimum in the learning curve, representing the least cost, which is reached after few training examples. Stress is placed on controlling the amount of learned knowledge as opposed to which knowledge is learned. An attempt is also made to relate domain characteristics to the shape of the learning curve.
We compare kernel estimators, single- and multi-layered perceptrons, and radial-basis functions on the problems of classifying handwritten digits and speech phonemes. By taking two different applications and employing many techniques, we report a two-dimensional study whereby a domain-independent assessment of these learning methods is possible. We consider feed-forward networks with one hidden layer. As examples of local methods, we use kernel estimators such as k-nearest neighbor (k-nn), Parzen windows, generalized k-nn, and Grow and Learn (Condensed Nearest Neighbor). We also considered fuzzy k-nn due to its similarity. As distributed networks, we use the linear perceptron, the pairwise separating linear perceptron, and multilayer perceptrons with sigmoidal hidden units. We also tested the radial-basis function network, which is a combination of local and distributed networks. Four criteria are taken for comparison: correct classification of the test set, network size, learning time, and operational complexity. We found that perceptrons, when the architecture is suitable, generalize better than local, memory-based kernel estimators, but require longer training and more precise computation. Local networks are simple and learn very quickly and acceptably, but use more memory.
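The abstract names Condensed Nearest Neighbor among the local, memory-based methods. Below is a minimal sketch of Hart-style condensing, which keeps only the examples that the current store misclassifies. It is a generic illustration, not the paper's code. The data and function names are hypothetical, and the sketch assumes that no two training points share coordinates while carrying different labels.

```python
def nearest_label(store, x):
    """Label of the stored (point, label) pair closest to x (squared Euclidean distance)."""
    return min(store, key=lambda pl: sum((a - b) ** 2 for a, b in zip(pl[0], x)))[1]

def condensed_nn(data):
    """Hart-style condensing: start from one example and add every example the
    current store misclassifies, sweeping until a full pass adds nothing."""
    store = [data[0]]
    changed = True
    while changed:
        changed = False
        for x, y in data:
            if nearest_label(store, x) != y:
                store.append((x, y))
                changed = True
    return store

# Hypothetical 2-D data on a 10x10 grid, labelled by which side of a diagonal it lies on.
data = [((i / 9, j / 9), int(i + j > 9)) for i in range(10) for j in range(10)]
store = condensed_nn(data)
print(len(store), "of", len(data), "examples kept")
```

By construction, the final pass changes nothing, so the condensed store classifies every training example correctly with 1-NN while using less memory than the full set.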
In current CBR systems, case adaptation is usually performed by rule-based methods that use task-specific rules hand-coded by the system developer. The ability to define those rules depends on knowledge of the task and domain that may not be available a priori, presenting a serious impediment to endowing CBR systems with the needed adaptation knowledge. This paper describes ongoing research on a method to address this problem by acquiring adaptation knowledge from experience. The method uses reasoning from scratch, based on introspective reasoning about the requirements for successful adaptation, to build up a library of adaptation cases that are stored for future reuse. We describe the tenets of the approach and the types of knowledge it requires. We sketch an initial computer implementation, lessons learned, and open questions for further study.
This position paper sketches a framework for modeling introspective reasoning and discusses the relevance of that framework for modeling introspective reasoning about memory search. It argues that effective and flexible memory processing in rich memories should be built on five types of explicitly represented self-knowledge: knowledge about information needs, relationships between different types of information, expectations for the actual behavior of the information search process, desires for its ideal behavior, and representations of how those expectations and desires relate to its actual performance. This approach to modeling memory search is both an illustration of general principles for modeling introspective reasoning and a step towards addressing the problem of how a reasoner, human or machine, can acquire knowledge about the properties of its own knowledge base.
Machine learning techniques are perceived to have great potential as a means for the acquisition of knowledge; nevertheless, their use in complex engineering domains is still rare. Most machine learning techniques have been studied in the context of knowledge acquisition for well-defined tasks, such as classification. Learning for these tasks can be handled by relatively simple algorithms. Complex domains present difficulties that can be approached by combining the strengths of several complementing learning techniques, and by overcoming their weaknesses through alternative learning strategies. This study presents two perspectives, the macro and the micro, for viewing the issue of multistrategy learning. The macro perspective deals with the decomposition of an overall complex learning task into relatively well-defined learning tasks, and the micro perspective deals with designing multistrategy learning techniques for supporting the acquisition of knowledge for each task. The two perspectives are discussed in the context of
In order to learn effectively, a reasoner must not only possess knowledge about the world and be able to improve that knowledge, but it also must introspectively reason about how it performs a given task and what particular pieces of knowledge it needs to improve its performance at the current task. Introspection requires declarative representations of meta-knowledge of the reasoning performed by the system during the performance task, of the system's knowledge, and of the organization of this knowledge. This chapter presents a taxonomy of possible reasoning failures that can occur during a performance task, declarative representations of these failures, and associations between failures and particular learning strategies. The theory is based on Meta-XPs, which are explanation structures that help the system identify failure types, formulate learning goals, and choose appropriate learning strategies in order to avoid similar mistakes in the future. The theory is implemented in a computer model of an introspective reasoner that performs multistrategy learning during a story understanding task.
We introduce a learning algorithm for unsupervised neural networks based on ideas from statistical mechanics. The algorithm is derived from a mean field approximation for large, layered sigmoid belief networks. We show how to infer (approximately) the statistics of these networks without resort to sampling. This is done by solving the mean field equations, which relate the statistics of each unit to those of its Markov blanket. Using these statistics as target values, the weights in the network are adapted by a local delta rule. We evaluate the strengths and weaknesses of these networks on problems in statistical pattern recognition.
The paper describes a self-learning control system for a mobile robot. Based on sensor information, the control system has to provide a steering signal in such a way that collisions are avoided. Since in this case no examples are available, the system learns on the basis of an external reinforcement signal, which is negative in case of a collision and zero otherwise. We describe the adaptive algorithm used for a discrete coding of the state space, and the adaptive algorithm for learning the correct mapping from the input (state) vector to the output (steering) signal.
Partially supported by the Advanced Research Projects Agency under an AFOSR grant. Partially supported by the Air Force Office of Scientific Research (AFOSR FJ), the Advanced Research Projects Agency (ONR NJ), and the Office of Naval Research (ONR NJ). Partially funded by the Air Force Office of Scientific Research (AFOSR FJ) and the Office of Naval Research (ONR NJ, ONR N).
The problem of approximating smooth L_p functions from spaces spanned by the integer translates of a radially symmetric function is very well understood. In the case where the points of translation, Ξ, are scattered throughout R^d, the approximation problem is only well understood in the stationary setting. In this work, we treat the non-stationary setting under the assumption that Ξ is a small perturbation of Z^d. Our results, which are similar in many respects to the known results for the case Ξ = Z^d, apply specifically to the examples of the Gauss kernel and the Generalized Multiquadric.
In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that, as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
In this paper we describe the design and implementation of the derivation replay framework dersnlp+ebl (Derivational snlp+ebl), which is based within a partial order planner. dersnlp+ebl replays previous plan derivations by first repeating its earlier decisions in the context of the new problem situation, then extending the replayed path to obtain a complete solution for the new problem. When the replayed path cannot be extended into a new solution, explanation-based learning (ebl) techniques are employed to identify the features of the new problem that prevent this extension. These features are then added as censors on the retrieval of the stored case. To keep retrieval costs low, dersnlp+ebl normally stores plan derivations for individual goals, and replays one or more of these derivations in solving multi-goal problems. Cases covering multiple goals are stored only when subplans for individual goals cannot be successfully merged. The aim in constructing the case library is to predict these goal interactions and to store a multi-goal case for each set of negatively interacting goals. We provide empirical results demonstrating the effectiveness of dersnlp+ebl in improving planning performance on randomly generated problems drawn from a complex domain.
Previous algorithms for supervised sequence learning are based on dynamic recurrent networks. This paper describes an alternative class of gradient-based systems consisting of two feedforward nets that learn to deal with temporal sequences using fast weights: the first net learns to produce context-dependent weight changes for the second net, whose weights may vary very quickly. The method offers the potential for STM storage efficiency: a single weight, instead of a full-fledged unit, may be sufficient for storing temporal information. Various learning methods are derived. Two experiments with unknown time delays illustrate the approach. One experiment shows how the system can be used for adaptive temporary variable binding.
Evolutionary tree reconstruction is an important step in many biological research problems, yet it is extremely difficult for a variety of computational, statistical, and scientific reasons. In particular, the reconstruction of very large trees containing significant amounts of divergence is especially challenging. We present in this paper a new tree reconstruction method, which we call the Disk-Covering Method (DCM), that can be used to recover accurate estimations of the evolutionary tree for otherwise intractable datasets. DCM obtains a decomposition of the input dataset into small overlapping sets of closely related taxa, reconstructs trees on these subsets (using a base phylogenetic method of choice), and then combines the subtrees into one tree on the entire set of taxa. Because the subproblems analyzed by DCM are smaller, computationally expensive methods such as maximum likelihood estimation can be used without incurring too much cost. At the same time, because the taxa within each subset are closely related, even very simple methods (such as neighbor-joining) are much more likely to be highly accurate. As a result, DCM-boosted methods are typically faster and more accurate than naive use of the same method. In this paper we describe the basic ideas and techniques in DCM, and demonstrate the advantages of DCM experimentally by simulating sequence evolution on a variety of trees.
Automating the construction of semantic grammars is a difficult and interesting problem for machine learning. This paper shows how the semantic-grammar acquisition problem can be viewed as the learning of search-control heuristics in a logic program. Appropriate control rules are learned using a new first-order induction algorithm that automatically invents useful syntactic and semantic categories. Empirical results show that the learned parsers generalize well to novel sentences and outperform previous approaches based on connectionist techniques.
Conventional speculative architectures use branch prediction to evaluate the most likely execution path during program execution. However, certain branches are difficult to predict. One solution to this problem is to evaluate both paths following such a conditional branch. Predicated execution can be used to implement this form of multi-path execution. Predicated architectures fetch and issue instructions that have associated predicates; these predicates indicate whether the instruction should commit its result. Predicating a branch reduces the number of branches executed, eliminating the chance of branch misprediction at the cost of executing additional instructions. In this paper, we propose a restricted form of multi-path execution, called Dynamic Predication, for architectures with little or no support for predicated instructions in their instruction set. Dynamic predication dynamically predicates instruction sequences in the form of a branch hammock, concurrently executing both paths of the branch. A branch hammock is a short forward branch that spans a few instructions in the form of an if-then or if-then-else construct. We mark these constructs in the executable. When the decode stage detects such a sequence, it passes a predicated instruction sequence to a dynamically scheduled execution core. Our results show that dynamic predication can yield speedups.
In the barn owl, the self-organization of the auditory map of space in the external nucleus of the inferior colliculus (ICx) is strongly influenced by vision, but the nature of this interaction is unknown. In this paper, a biologically plausible and minimalistic model of ICx self-organization is proposed in which the ICx receives a learn signal based on the owl's visual attention. When visual attention is focused on the same spatial location as the auditory input, the learn signal is turned on and the map is allowed to adapt. A two-dimensional Kohonen map is used to model the ICx, and simulations were performed to evaluate how the learn signal would affect the auditory map. When the primary area of visual attention was shifted to different spatial locations, the auditory map shifted to the corresponding location. The shift was complete when done early in development and partial when done later. Similar results have been observed in the barn owl with its visual field modified by prisms. Therefore, the simulations suggest that a learn signal based on visual attention is a possible explanation for auditory plasticity.
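To make the gating idea concrete, the sketch below shows one Kohonen-map step on a 2-D grid of units. The map adapts only when a learn signal is on. The grid, the 2-D input, the Gaussian neighbourhood and the function name are hypothetical stand-ins for illustration, not the paper's model of ICx or of visual attention.

```python
import math

def som_step(weights, x, learn_signal, lr=0.1, radius=1.0):
    """One Kohonen-map step on a 2-D grid of units, gated by a learn signal.

    weights maps a grid position (i, j) to a weight vector. The best-matching
    unit is always returned; weights change only when learn_signal is True
    (a hypothetical stand-in for the attention-based learn signal).
    """
    winner = min(weights, key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))
    if learn_signal:
        for u, w in list(weights.items()):
            d2 = (u[0] - winner[0]) ** 2 + (u[1] - winner[1]) ** 2
            h = math.exp(-d2 / (2 * radius ** 2))   # Gaussian neighbourhood on the grid
            weights[u] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return winner

grid = {(i, j): [i / 4, j / 4] for i in range(5) for j in range(5)}   # hypothetical 5x5 map
snapshot = {u: w[:] for u, w in grid.items()}
winner = som_step(grid, [0.6, 0.6], learn_signal=False)   # attention elsewhere: map frozen
frozen = grid == snapshot
som_step(grid, [0.6, 0.6], learn_signal=True)              # attention on the source: map adapts
print(winner, frozen, grid[winner])
```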
The place fields of hippocampal cells in old animals sometimes change when an animal is removed from and then returned to an environment (Barnes et al.). The ensemble correlation between two sequential visits to the same environment shows a strong bimodality for old animals (one mode near zero, indicative of remapping, and one at high values, indicative of a similar representation across experiences), but a strong unimodality for young animals (high values only, indicative of a similar representation across experiences). One explanation is the multi-map hypothesis, in which multiple maps are encoded in the hippocampus and old animals may sometimes be returning to the wrong map. A theory proposed by Samsonovich and McNaughton suggests that the Barnes et al. experiment implies that the maps are prewired in the CA region of the hippocampus. Here we offer an alternative explanation, in which orthogonalization properties of the dentate gyrus (DG) region of the hippocampus interact with errors in self-localization (reset of the path integrator on re-entry into the environment) to produce the bimodality.
MIT Media Laboratory Perceptual Computing Section Technical Report. Appeared in the IEEE International Conference on Pattern Recognition (ICPR), Vienna, Austria. Abstract: We present a foveated gesture recognition system that guides an active camera to foveate salient features based on a reinforcement learning paradigm. Using vision routines previously implemented for an interactive environment, we determine the spatial location of salient body parts of a user and guide an active camera to obtain images of gestures or expressions. A hidden-state reinforcement learning paradigm based on the Partially Observable Markov Decision Process (POMDP) is used to implement this visual attention. The attention module selects targets to foveate based on the goal of successful recognition, and uses a new multiple-model Q-learning formulation. Given a set of target and distractor gestures, our system can learn where to foveate to maximally discriminate a particular gesture.
Various extensions to the Genetic Algorithm (GA) attempt to find all or most optima in a search space containing several optima. Many of these emulate natural speciation. For co-evolutionary learning to succeed in a range of management and control problems, such as learning game strategies, such methods must find all or most optima. However, suitable comparison studies are rare. We compare two similar GA speciation methods, fitness sharing and implicit sharing. Using a realistic letter classification problem, we find that they have advantages under different circumstances. Implicit sharing covers optima more comprehensively when the population is large enough for a species to form at each optimum. With a population not large enough to do this, fitness sharing can find the optima with larger basins of attraction and ignore peaks with narrow bases, while implicit sharing is more easily distracted. This indicates that for a speciated GA trying to find as many near-global optima as possible, implicit sharing works well only if the population is large enough. This requires prior knowledge of how many peaks exist.
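Fitness sharing, one of the two methods compared, divides each individual's raw fitness by a niche count. This devalues crowded optima relative to sparsely occupied ones. The sketch below uses the common triangular sharing function sh(d) = 1 - (d/σ)^α. The parameters `sigma_share` and `alpha`, the distance function and the toy population are assumptions for illustration. Implicit sharing is not shown.

```python
def shared_fitness(population, fitness, distance, sigma_share=1.0, alpha=1.0):
    """Fitness sharing: divide each raw fitness by its niche count.

    sh(d) = 1 - (d / sigma_share) ** alpha for d < sigma_share, else 0.
    The niche count includes the individual itself, since sh(0) = 1.
    """
    def sh(d):
        return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0

    shared = []
    for x in population:
        niche = sum(sh(distance(x, y)) for y in population)
        shared.append(fitness(x) / niche)
    return shared

def raw(x):
    return 1.0          # equal raw fitness, so only crowding matters

def dist(a, b):
    return abs(a - b)

# Hypothetical 1-D population: three individuals crowd one peak, one sits alone on another.
pop = [0.0, 0.1, 0.2, 5.0]
shared = shared_fitness(pop, raw, dist)
print([round(f, 3) for f in shared])
```

The isolated individual keeps its full fitness, while the three crowded ones share theirs. This selection pressure lets separate species persist on separate peaks.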
Modern knowledge systems for design typically employ multiple problem-solving methods, which in turn use different kinds of knowledge. The construction of a heterogeneous knowledge system that can support practical design thus raises two fundamental questions: how to accumulate huge volumes of design information, and how to support heterogeneous design processing. Fortunately, partial answers to both questions exist separately. Legacy databases already contain huge amounts of general-purpose design information. In addition, modern knowledge systems typically characterize the kinds of knowledge needed by specific problem-solving methods quite precisely. This leads us to hypothesize method-specific data-to-knowledge compilation as a potential mechanism for integrating heterogeneous knowledge systems and legacy databases for design. In this paper, we first outline a general computational architecture, called HIPED, for this integration. Then we focus on the specific issue of how to convert data accessed from a legacy database into a form appropriate to the problem-solving method used in a heterogeneous knowledge system. We describe an experiment in which a legacy knowledge system called Interactive Kritik is integrated with an ORACLE database using IDI as the communication tool. The limited experiment indicates the computational feasibility of method-specific data-to-knowledge compilation, but also raises additional research issues.
This paper investigates the advantages and disadvantages of the mixture of experts (ME) model, introduced to the connectionist community in [JJNH] and applied to time series analysis in [WM], on two time series whose dynamics are well understood. The first is a computer-generated series consisting of a mixture of a noise-free process (the quadratic map) and a noisy process (a composition of a noisy linear autoregressive process and a hyperbolic tangent). There are three main results: the ME model produces significantly better results than single networks; it discovers the regimes correctly and also allows us to characterize the sub-processes through their variances; and, due to the correct matching of the noise level of the model to that of the data, it avoids overfitting. The second series is the laser series used in the Santa Fe competition; here the ME model also obtains excellent out-of-sample predictions, allows for analysis, and shows no overfitting.
In natural visual experience, different views of an object tend to appear in close temporal proximity as an animal manipulates the object or navigates around it. We investigated the ability of an attractor network to acquire view-invariant visual representations by associating first neighbors in a pattern sequence. The pattern sequence contains successive views of the faces of ten individuals as they change pose. Under the network dynamics developed by Griniasty, Tsodyks and Amit, multiple views of a given subject fall into the same basin of attraction. We use an independent component (ICA) representation of the faces for the input patterns (Bell and Sejnowski). The ICA representation has advantages over the principal component (PCA) representation for viewpoint-invariant recognition both with and without the attractor network, suggesting that ICA is a better representation than PCA for object recognition.
This paper examines the effects of relaxed synchronization on both the numerical and the parallel efficiency of parallel genetic algorithms (GAs). We describe a coarse-grain, geographically structured parallel genetic algorithm. Our experiments provide preliminary evidence that asynchronous versions of these algorithms have a lower run time than synchronous GAs. Our analysis shows that this improvement is due to decreased synchronization costs and to high numerical efficiency (e.g., fewer function evaluations) of the asynchronous GAs. This analysis includes a critique of the utility of traditional parallel performance measures for parallel GAs.
Key ideas from statistical learning theory and support vector machines are generalized to decision trees. A support vector machine is used for each decision in the tree. The "optimal" decision tree is characterized, and both a primal and a dual space formulation for constructing the tree are proposed. The result is a method for generating logically simple decision trees with multivariate linear or nonlinear decisions. The preliminary results indicate that the method produces simple trees that generalize well with respect to other decision tree algorithms and single support vector machines.
We had previously shown that regularization principles lead to approximation schemes that are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well-known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines, as well as some tensor product splines, can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, some forms of Projection Pursuit Regression, and several types of neural networks. We propose to use the term Generalized Regularization Networks for this broad class of approximation schemes that follow from an extension of regularization. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to different types of smoothness assumptions. In summary, different multilayer networks with one hidden layer, which we collectively call Generalized Regularization Networks, correspond to different classes of priors and associated smoothness functionals in a classical regularization principle. Three broad classes are (a) Radial Basis Functions, which can be generalized to Hyper Basis Functions; (b) some tensor product splines; and (c) additive splines, which can be generalized to schemes of the type of ridge approximation, hinge functions, and several perceptron-like neural networks with one hidden layer. This paper will appear in Neural Computation. An earlier version
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, as solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions, and to several neural network algorithms, such as Kanerva's associative memory, backpropagation, and Kohonen's topology-preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data. © Massachusetts Institute of Technology. This paper describes research done within the Center for Biological Information Processing, in the Department of Brain and Cognitive Sciences, and at the Artificial Intelligence Laboratory. This research is sponsored by a grant from the Office of Naval Research (ONR), Cognitive and Neural Sciences Division; by the Artificial Intelligence Center of Hughes Aircraft Corporation; by the Alfred P. Sloan Foundation; and by the National Science Foundation. Support for the A.I. Laboratory's artificial intelligence research is provided by the Advanced Research Projects Agency of the Department of Defense under Army contract DACAC, and in part by ONR contract NK.
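In the regularization framework, a Gaussian basis gives a network with one unit centred on each example, f(x) = Σ_i c_i G(x - x_i), whose coefficients solve (G + λI)c = y. The pure-Python sketch below assumes a 1-D input, a fixed Gaussian width, a small hand-written linear solver and hypothetical sample data. With λ close to 0 it reduces to strict RBF interpolation, and a larger λ trades fit for smoothness.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_rbf(xs, ys, width=1.0, lam=1e-3):
    """Regularization network with a Gaussian unit centred on every example:
    f(x) = sum_i c_i G(x - x_i), where (G + lam * I) c = y."""
    def g(a, b):
        return math.exp(-((a - b) ** 2) / (2 * width ** 2))
    n = len(xs)
    G = [[g(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    c = solve(G, ys)
    return lambda x: sum(ci * g(x, xi) for ci, xi in zip(c, xs))

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [math.sin(x) for x in xs]
f_interp = fit_rbf(xs, ys, width=0.5, lam=1e-9)   # lam near 0: (near-)strict interpolation
f_smooth = fit_rbf(xs, ys, width=0.5, lam=1.0)    # larger lam: smoother, no longer interpolates
print(round(f_interp(1.0), 4), round(math.sin(1.0), 4))
```

At each training point the fitted value is y_i - λc_i, so the residual shrinks to zero as λ does.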
This article describes how a reasoner can improve its understanding of an incompletely understood domain through the application of what it already knows to novel problems in that domain. Case-based reasoning is the process of using past experiences stored in the reasoner's memory to understand novel situations or solve novel problems. However, this process assumes that past experiences are well understood and provide good lessons to be used for future situations. This assumption is usually false when one is learning about a novel domain, since situations encountered previously in this domain might not have been understood completely. Furthermore, the reasoner may not even have a case that adequately deals with the new situation, or may not be able to access the case using existing indices. We present a theory of incremental learning based on the revision of previously existing case knowledge in response to experiences in such situations. The theory has been implemented in a case-based story understanding program that can (a) learn a new case in situations where no case already exists, (b) learn how to index the case in memory, and (c) incrementally refine its understanding of the case by using it to reason about new situations, thus evolving a better understanding of its domain through experience. This research complements work in case-based reasoning by providing mechanisms by which a case library can be automatically built for use by a case-based reasoning program.
We present a statistical model of genes in DNA. A Generalized Hidden Markov Model (GHMM) provides the framework for describing the grammar of a legal parse of a DNA sequence (Stormo and Haussler). Probabilities are assigned to transitions between states in the GHMM and to the generation of each nucleotide base given a particular state. Machine learning techniques are applied to optimize these probabilities using a standardized training set. Given a new candidate sequence, the best parse is deduced from the model using a dynamic programming algorithm to identify the path through the model with maximum probability. The GHMM is flexible and modular, so new sensors and additional states can be inserted easily. In addition, it provides simple solutions for integrating cardinality constraints, reading frame constraints, indels, and homology searching. The description and results of an implementation of such a gene-finding model, called Genie, are presented. The exon sensor is a codon frequency model conditioned on windowed nucleotide frequency and the preceding codon. Two neural networks are used, as in Brunak, Engelbrecht and Knudsen, for splice site prediction. We show that this simple model performs quite well. On a cross-validated standard test set of human DNA genes (ftp://www-hgc.lbl.gov/pub/genesets), the gene-finding system is evaluated by the fraction of protein-coding bases identified correctly and the fraction of exons identified exactly, each with its specificity. Genie is shown to perform favorably compared with several other gene-finding systems.
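The best parse is found by dynamic programming over paths through the model. Genie's GHMM additionally models variable-length segments with specialized sensors. The sketch below simplifies this to the Viterbi algorithm on a plain two-state HMM with hypothetical probabilities: a noncoding state with uniform base composition and a GC-rich "coding" state. For a short sequence, the result can be checked against brute-force enumeration of every state path.

```python
import itertools
import math

def viterbi(obs, states, start, trans, emit):
    """Most probable state path through a plain first-order HMM, in log space."""
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(((p, V[t - 1][p] + math.log(trans[p][s])) for p in states),
                              key=lambda ps: ps[1])
            V[t][s] = score + math.log(emit[s][obs[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):   # follow back-pointers to recover the path
        path.append(back[t][path[-1]])
    path.reverse()
    return path, V[-1][last]

def path_logprob(path, obs, start, trans, emit):
    lp = math.log(start[path[0]]) + math.log(emit[path[0]][obs[0]])
    for t in range(1, len(obs)):
        lp += math.log(trans[path[t - 1]][path[t]]) + math.log(emit[path[t]][obs[t]])
    return lp

# Hypothetical two-state model: "N" (noncoding, uniform bases) vs "C" (GC-rich "coding").
states = ["N", "C"]
start = {"N": 0.7, "C": 0.3}
trans = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.2, "C": 0.8}}
emit = {"N": {b: 0.25 for b in "ACGT"},
        "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}}
seq = "ATGCGCGCATAT"
path, logp = viterbi(seq, states, start, trans, emit)
brute = max(path_logprob(p, seq, start, trans, emit)
            for p in itertools.product(states, repeat=len(seq)))
print("".join(path), round(logp, 4), round(brute, 4))
```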
We examine questions of optimality and domination in repeated stage games in which one or both players may draw their strategies only from (perhaps different) computationally bounded sets. We also consider optimality and domination under bounded convergence rates of the infinite payoff. We develop a notion of a "grace period" to handle the problem of vengeful strategies.
We study the problem of efficiently learning to play a game optimally against an unknown adversary chosen from a computationally bounded class. We both contribute to the line of research on playing games against finite automata and expand the scope of this research by considering new classes of adversaries. We introduce the natural notions of games against recent-history adversaries (whose current action is determined by some simple boolean formula on the recent history of play) and games against statistical adversaries (whose current action is determined by some simple function of the statistics of the entire history of play). In both cases, we give efficient algorithms for learning to play penny-matching and a more difficult game called contract. We also give the most powerful positive result to date for learning to play against finite automata: an efficient algorithm for learning to play any game against any finite automaton with probabilistic actions and low cover time.
MORGAN integrated system finding genes vertebrate DNA sequences MORGAN uses variety techniques accomplish task distinctive decision tree classifier The decision tree system combined new methods identifying start codons donor sites acceptor sites brought together framesensitive dynamic programming algorithm finds optimal segmentation DNA sequence coding noncoding regions exons introns The optimal segmentation dependent separate scoring function takes subsequence assigns score reflecting probability sequence exon The scoring functions MORGAN sets decision trees combined give probability estimate Experimental results database vertebrate DNA sequences show MORGAN excellent performance many different measures On separate test set achieves overall accuracy correlation coefficient sensitivity specificity coding bases In addition MORGAN identifies coding exons exactly ie beginning end coding regions predicted correctly This paper describes MORGAN system including decision tree routines algorithms site recognition performance benchmark database vertebrate DNA
In paper report use backpropagation based neural networks implement phase computational intelligence process PYTHIA expert system supporting numerical simulation applications modelled partial differential equations PDEs PYTHIA exemplar based reasoning system provides advice method parameters use simulation specified PDE based application When advice requested characteristics given model matched characteristics previously seen classes models The performance various solution methods previously seen similar classes models used basis predicting method use Thus major step reasoning process PYTHIA involves analysis categorization models classes models based characteristics In study demonstrate use neural networks identify class predefined models whose characteristics match ones specified PDE based application
A method described reduces hypotheses space efficient easily interpretable reduction criteria called reduction A learning algorithm described based reduction analyzed using probably approximately correct learning results The results obtained reducing rule set equivalent set kDNF formulas The goal learning algorithm induce compact rule set describing basic dependencies within set data The reduction based criterion flexible gives semantic interpretation rules fulfill criteria Comparison syntactical hypotheses reduction show reduction improves search smaller probability misclassification
We identify three principal factors affecting performance learning networks localized units unit noise sample density structure target function We analyze effect unit receptive field parameters factors use analysis propose new learning algorithm dynamically alters receptive field properties learning
SemiMarkov Decision Problems continuous time generalizations discrete time Markov Decision Problems A number reinforcement learning algorithms developed recently solution Markov Decision Problems based ideas asynchronous dynamic programming stochastic approximation Among TD Qlearning Realtime Dynamic Programming After reviewing semiMarkov Decision Problems Bellmans optimality equation context propose algorithms similar named adapted solution semiMarkov Decision Problems We demonstrate algorithms applying problem determining optimal control simple queueing system We conclude discussion circumstances algorithms may usefully applied
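In the semi-Markov setting, a Q-learning update has to discount by the random sojourn time between decisions rather than by a fixed per-step factor. The sketch below shows one such update with discount `exp(-beta * tau)`. It assumes the reward earned during the sojourn has already been accumulated (and discounted) into a single number `R`. The state and action names are hypothetical, and the paper's exact formulation may differ.

```python
import math
from collections import defaultdict

def smdp_q_update(Q, s, a, R, tau, s_next, actions, alpha=0.1, beta=0.1):
    """One semi-Markov Q-learning step.

    R is the reward collected while the system sojourned for tau time
    units; the value of the next state is discounted by exp(-beta * tau)
    instead of a fixed per-step gamma.
    """
    discount = math.exp(-beta * tau)
    best_next = max(Q[(s_next, b)] for b in actions)
    target = R + discount * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical queueing-style example: two actions, one known next-state value.
Q = defaultdict(float)
Q[("s1", "serve")] = 2.0
v = smdp_q_update(Q, "s0", "serve", R=1.0, tau=10.0, s_next="s1",
                  actions=("serve", "idle"), alpha=0.5, beta=0.1)
```

A long sojourn (`tau=10`, `beta=0.1`) shrinks the weight on the next state's value to `exp(-1)`. This is how continuous time enters the update.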
There recently widespread interest use multiple models classification regression statistics neural networks communities The Hierarchical Mixture Experts HME successful number regression problems yielding significantly faster training use Expectation Maximisation algorithm In paper extend HME classification results reported three common classification benchmark tests ExclusiveOr Ninput Parity Two Spirals
One important factor determining computational complexity evaluating probabilistic network cardinality state spaces nodes By varying granularity state spaces one trade accuracy result computational efficiency We present anytime procedure approximate evaluation probabilistic networks based idea On application simple networks procedure exhibits smooth improvement approximation quality computation time increases This suggests statespace abstraction one useful control parameter designing real time probabilistic reasoners
Existing complexity measures contemporary learning theory conveniently applied specific learning problems eg training sets Moreover typically nongeneric ie necessitate making assumptions way learner operate The lack satisfactory generic complexity measure learning problems poses difficulties researchers various areas present paper puts forward idea may help alleviate It shows supervised learning problems fall two generic complexity classes one associated computational tractability By determining class particular problem belongs thus effectively evaluate degree generic difficulty
The paper investigates statistical effects may need exploited supervised learning It notes effects classified according conditionality order proposes learning algorithms typically form bias towards particular classes effect It presents results empirical study statistical bias backpropagation The study involved applying algorithm wide range learning problems using variety different internal architectures The results study revealed backpropagation specific bias general direction statistical rather relational effects The paper shows existence bias effectively constitutes weakness algorithms ability discount noise
Scatterpartitioning Radial Basis Function RBF networks increase number degrees freedom complexity inputoutput mapping estimated basis supervised training data set Due superior expressive power scatterpartitioning Gaussian RBF GRBF model termed Supervised Growing Neural Gas SGNG selected literature SGNG employs onestage errordriven learning strategy capable generating removing hidden units synaptic connections A slightly modified SGNG version tested function estimator training surface fitted image ie D signal whose size finite The relationship generation learning system disjointed maps hidden units presence image pictorially homogeneous subsets segments investigated Unfortunately examined SGNG version performs poorly function estimator image segmenter This may due intrinsic inadequacy onestage errordriven learning strategy adjust structural parameters output weights simultaneously consistently In framework RBF networks studies investigate combination twostage errordriven learning strategies synapse generation removal criteria Internal report paper entitled Image segmentation scatterpartitioning RBF networks A feasibility study presented conference Applications Science Neural Networks Fuzzy Systems Evolutionary Computation part SPIEs International Symposium Optical Science Engineering Instrumentation July San Diego CA
A distinct advantage symbolic learning algorithms artificial neural networks typically concept representations form easily understood humans One approach understanding representations formed neural networks extract symbolic rules trained networks In paper describe investigate approach extracting rules networks uses NofM extraction algorithm network training method soft weightsharing Previously NofM algorithm successfully applied knowledgebased neural networks Our experiments demonstrate extracted rules generalize better rules learned using C system In addition accurate extracted rules also reasonably comprehensible
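An N-of-M rule fires when at least N of its M antecedents hold. This is why such rules stay compact while still mirroring a unit whose incoming weights have been clustered into a few shared values. The sketch below shows only how an extracted rule of that form is evaluated. The extraction step itself is not shown, and the rule and feature names are hypothetical.

```python
def n_of_m(n, antecedents, example):
    """True if at least n of the named boolean antecedents hold in example.

    Missing features are treated as false.
    """
    satisfied = sum(1 for name in antecedents if example.get(name, False))
    return satisfied >= n

# Hypothetical extracted rule: IF 2 of {a, b, c} THEN positive
rule = (2, ["a", "b", "c"])
fires = n_of_m(*rule, {"a": True, "c": True})
quiet = n_of_m(*rule, {"a": True})
```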
Our experience showed us flexibility expressing parallel algorithm simulating neural networks desirable even possible obtain efficient solution single training algorithm We believe advantages clear easy understand program predominates disadvantages approaches allowing specific machine neural network algorithm We currently investigate neural network models worth parallelized resulting parallel algorithms composed common basic building blocks logarithmic tree efficient communication structure connections connections
In order learn effectively system must possess knowledge world able improve knowledge also must introspectively reason performs given task particular pieces knowledge needs improve performance current task Introspection requires declarative representation reasoning performed system performance task This paper presents taxonomy possible reasoning failures occur task declarative representations associations particular learning strategies We propose theory MetaXPs explanation structures help system identify failure types choose appropriate learning strategies order avoid similar mistakes future A program called MetaAQUA embodies theory processes examples domain drug smuggling
Although artificial neural networks applied variety realworld scenarios remarkable success often criticized exhibiting low degree human comprehensibility Techniques compile compact sets symbolic rules artificial neural networks offer promising perspective overcome obvious deficiency neural network representations This paper presents approach extraction ifthen rules artificial neural networks Its key mechanism validity interval analysis generic tool extracting symbolic knowledge propagating rulelike knowledge Backpropagationstyle neural networks Empirical studies robot arm domain illustrate appropriateness proposed method extracting rules networks realvalued distributed representations
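The core of validity interval analysis is propagating interval constraints through the network. The sketch below shows only the forward half of that idea. It takes a box of input intervals, computes guaranteed bounds on each sigmoid unit's output, and checks whether a candidate rule ("inputs in this box imply output above 0.5") holds. The full method also refines intervals backwards and uses linear programming, which this sketch omits. The weights are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def propagate_intervals(intervals, weights, biases):
    """Forward interval propagation through one layer of sigmoid units.

    intervals: list of (low, high) bounds, one per input
    weights:   one weight row per unit
    Returns guaranteed (low, high) bounds on each unit's activation.
    """
    out = []
    for w_row, b in zip(weights, biases):
        lo = hi = b
        for w, (x_lo, x_hi) in zip(w_row, intervals):
            # a positive weight maps the lower input bound to the lower net bound,
            # a negative weight maps the upper input bound to it
            if w >= 0:
                lo += w * x_lo
                hi += w * x_hi
            else:
                lo += w * x_hi
                hi += w * x_lo
        out.append((sigmoid(lo), sigmoid(hi)))  # sigmoid is monotone increasing
    return out

# Hypothetical one-unit network with weights [2, -2] and zero bias.
full_box = propagate_intervals([(0.0, 1.0), (0.0, 1.0)], [[2.0, -2.0]], [0.0])
rule_box = propagate_intervals([(0.9, 1.0), (0.0, 0.1)], [[2.0, -2.0]], [0.0])
rule_holds = rule_box[0][0] > 0.5  # whole output interval above threshold
```

Over the full unit box the output can fall on either side of 0.5, so no rule is validated. On the narrower box the entire output interval lies above 0.5, which confirms the rule for every input in that box.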
In paper examine method feature subset selection based Information Theory Initially framework defining theoretically optimal computationally intractable method feature subset selection presented We show goal eliminate feature gives us little additional information beyond subsumed remaining features In particular case irrelevant redundant features We give efficient algorithm feature selection computes approximation optimal feature selection criterion The conditions approximate algorithm successful examined Empirical results given number data sets showing algorithm effectively handles datasets large numbers features
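The criterion in the abstract is to eliminate a feature that adds little information beyond what the remaining features already provide. Conditional mutual information I(F; C | G) is a simple way to illustrate it. The sketch below estimates that quantity from discrete columns. It is an illustrative stand-in, not the paper's approximation algorithm, and the toy columns are hypothetical.

```python
import math
from collections import Counter

def cond_mutual_info(f, c, g):
    """I(F; C | G) in bits, estimated from three aligned discrete columns."""
    n = len(c)
    fcg = Counter(zip(f, c, g))
    fg = Counter(zip(f, g))
    cg = Counter(zip(c, g))
    gg = Counter(g)
    total = 0.0
    for (fv, cv, gv), k in fcg.items():
        # empirical probabilities are count / n; the n factors cancel inside the log
        total += (k / n) * math.log2(k * gg[gv] / (fg[(fv, gv)] * cg[(cv, gv)]))
    return total

g = [0, 0, 1, 1]
# A feature that duplicates g adds no information about the class once g is known.
redundant = cond_mutual_info(g, [0, 0, 1, 1], g)
# Class = f XOR g: f is useless alone but fully informative given g.
f = [0, 1, 0, 1]
xor_class = [a ^ b for a, b in zip(f, g)]
complementary = cond_mutual_info(f, xor_class, g)
```

The XOR case shows why the criterion conditions on the remaining features. A feature can look irrelevant in isolation yet carry a full bit once the others are known.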
In paper address problem casebased learning presence irrelevant features We review previous work attribute selection present new algorithm Oblivion carries greedy pruning oblivious decision trees effectively store set abstract cases memory We hypothesize approach efficiently identify relevant features even interact parity concepts We report experimental results artificial domains support hypothesis experiments natural domains show improvement cases others In closing discuss implications experiments consider additional work irrelevant features outline directions future research
Learning plays vital role development situated agents In paper explore use reinforcement learning shape robot perform predefined target behavior We connect simulated real robots A LECSYS parallel implementation learning classifier system extended genetic algorithm After classifying different kinds Animatlike behaviors explore effects learning different types agents architecture monolithic flat hierarchical training strategies In particular hierarchical architecture requires agent learn coordinate basic learned responses We show best results achieved agents architecture training strategy match structure behavior pattern learned We report results number experiments carried simulated real environments show results simulations carry smoothly real robots While experiments deal simple reactive behavior one demonstrate use simple general memory mechanism As whole experimental activity demonstrates classifier systems genetic algorithms practically employed develop autonomous agents
Although probabilistic inference general Bayesian belief network NPhard problem inference computation time reduced practical cases exploiting domain knowledge making appropriate approximations knowledge representation In paper introduce property similarity states new method approximate knowledge representation based property We define two states node similar likelihood ratio probabilities depend instantiations nodes network We show similarity states exposes redundancies joint probability distribution exploited reduce computational complexity probabilistic inference networks multiple similar states For example show BNO networks two layer networks often used diagnostic problems reduced close network multiple similar states Probabilistic inference new network done polynomial time respect size network results queries practical importance close results obtained exponential time original network The error introduced reduction converges zero faster exponentially respect degree polynomial describing resulting computational complexity
Multilayer architectures used Bayesian belief networks Helmholtz machines provide powerful framework representing learning higher order statistical relations among inputs Because exact probability calculations models often intractable much interest finding approximate algorithms We present algorithm efficiently discovers higher order structure using EM Gibbs sampling The model interpreted stochastic recurrent network ambiguity lowerlevel states resolved feedback higher levels We demonstrate performance algorithm benchmark problems
In paper study extension distributionfree model learning introduced Valiant also known probably approximately correct PAC model allows presence malicious errors examples given learning algorithm Such errors generated adversary unbounded computational power access entire history learning algorithms computation Thus study worstcase model errors Our results include general methods bounding rate error tolerable learning algorithm efficient algorithms tolerating nontrivial rates malicious errors equivalences problems learning errors standard combinatorial optimization problems
We investigate problem computing posterior probability model class given data sample prior distribution possible parameter settings By model class mean group models share parametric form In general posterior may hard compute highdimensional parameter spaces usually case realworld applications In literature several methods computing posterior approximately proposed quality approximations may depend heavily size available data sample In work interested testing well approximative methods perform realworld problem domains In order conduct study chosen model family finite mixture distributions With certain assumptions able derive model class posterior analytically model family We report series model class selection experiments realworld data sets true posterior approximations compared The empirical results support hypothesis approximative techniques provide good estimates true posterior especially sample size grows large
Report C University Helsinki Department Computer Science In paper explore use finite mixture models building decision support systems capable sound probabilistic inference Finite mixture models many appealing properties computationally efficient prediction reasoning phase universal sense approximate problem domain distribution handle multimodality well We present formulation model construction problem Bayesian framework finite mixture models describe Bayesian inference performed given model The model construction problem seen missing data estimation describe realization ExpectationMaximization EM algorithm finding good models To prove feasibility approach report crossvalidated empirical results several publicly available classification problem datasets compare results corresponding results obtained alternative techniques neural networks decision trees The comparison based best results reported literature datasets question It appears using theoretically sound Bayesian framework suggested reported results outperformed relatively small effort
One application models reasoning behavior allow reasoner introspectively detect repair failures reasoning process We address issues transferability models versus specificity knowledge kinds knowledge needed selfmodeling knowledge structured evaluation introspective reasoning systems We present ROBBIE system implements model planning processes improve planner response reasoning failures We show ROBBIEs hierarchical model balances model generality access implementationspecific details discuss qualitative quantitative measures used evaluating introspective component
We consider multicriteria sequential decision making problems criteria ordered according importance Structural properties problems touched reinforcement learning algorithms learn asymptotically optimal decisions derived Computer experiments confirm theoretical results provide insight learning processes
Acyclic digraphs ADGs widely used describe dependences among variables multivariate distributions In particular likelihood functions ADG models admit convenient recursive factorizations often allow explicit maximum likelihood estimates well suited building Bayesian networks expert systems There may however many ADGs determine dependence Markov model Thus family ADGs given set vertices naturally partitioned Markovequivalence classes class associated unique statistical model Statistical procedures model selection model averaging fail take account equivalence classes may incur substantial computational inefficiencies Recent results shown Markovequivalence class uniquely determined single chain graph essential graph Markovequivalent simultaneously ADGs equivalence class Here propose two stochastic Bayesian model averaging selection algorithms essential graphs apply analysis three discretevariable data sets
Given set samples unknown probability distribution study problem constructing good approximative Bayesian network model probability distribution question This task viewed search problem goal find maximal probability network model given data In work make attempt learn arbitrarily complex multiconnected Bayesian network structures since resulting models unsuitable practical purposes due exponential amount time required reasoning task Instead restrict special class simple treestructured Bayesian networks called Bayesian prototype trees polynomial time algorithm Bayesian reasoning exists We show probability given Bayesian prototype tree model evaluated given data evaluation criterion used stochastic simulated annealing algorithm searching model space The simulated annealing algorithm provably finds maximal probability model provided sufficient amount time used
We use simple illustrative example expose main ideas Evidential Probability Specifically show use acceptance rule naturally leads use intervals represent probabilities change opinion due experience facilitated probabilities concerning compound experiments events computed given proper knowledge underlying distributions
This paper presents UTree reinforcement learning algorithm uses selective attention shortterm memory simultaneously address intertwined problems large perceptual state spaces hidden state By combining advantages work instancebased memorybased learning work robust statistical tests separating noise task structure method learns quickly creates taskrelevant state distinctions handles noise well UTree uses treestructured representation related work Prediction Suffix Trees Ron et al Partigame Moore Galgorithm Chapman Kaelbling Variable Resolution Dynamic Programming Moore It builds Utile Suffix Memory McCallum c used shortterm memory selective perception The algorithm demonstrated solving highway driving task agent weaves around slower faster traffic The agent uses active perception simulated eye movements The environment hidden state time pressure stochasticity world states percepts From environment sensory system agent uses utile distinction test build tree represents depththree memory necessary internal states far fewer states would resulted fixedsized historywindow approach
Feature selection problem choosing subset relevant features In general exhaustive search bring optimal subset With monotonic measure exhaustive search avoided without sacrificing optimality Unfortunately error distancebased measures monotonic A new measure employed work monotonic fast compute The search relevant features according measure guaranteed complete exhaustive Experiments conducted verification
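The abstract does not name its monotonic measure. The sketch below assumes an inconsistency-rate measure of the kind used in this line of work. Examples that agree on the selected features but disagree on the class count as inconsistent, except for the majority class in each group. Adding a feature can only split groups, so the rate never increases. That monotonicity is what lets a search prune without losing optimality. The toy data are hypothetical.

```python
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels, subset):
    """Fraction of examples that cannot be labeled consistently when only
    the feature indices in `subset` are visible."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][y] += 1
    # within each group, examples outside the majority class are inconsistent
    bad = sum(sum(cnt.values()) - max(cnt.values()) for cnt in groups.values())
    return bad / len(labels)

# Hypothetical data: the label equals feature 0; feature 1 is irrelevant.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
rates = {s: inconsistency_rate(rows, labels, s) for s in [(), (0,), (1,), (0, 1)]}
```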
This paper describes novel method dialogue agent learn choose optimal dialogue strategy While widely agreed dialogue strategies formulated terms communicative intentions little work automatically optimizing agents choices multiple ways realize communicative intention Our method based combination learning algorithms empirical evaluation techniques The learning component method based algorithms reinforcement learning dynamic programming Qlearning The empirical component uses PARADISE evaluation framework Walker et al identify important performance factors provide performance function needed learning algorithm We illustrate method dialogue agent named ELVIS EmaiL Voice Interactive System supports access email phone We show ELVIS learn choose among alternate strategies agent initiative reading messages summarizing email folders
This paper outlines problems may occur Reduced Error Pruning Inductive Logic Programming notably efficiency Thereafter new method Incremental Reduced Error Pruning proposed attempts address problems Experiments show many noisy domains method much efficient alternative algorithms along slight gain accuracy However experiments show well use algorithm recommended domains specific concept description
EEG analysis played key role modeling brains cortical dynamics relatively little effort devoted developing EEG limited means communication If several mental states reliably distinguished recognizing patterns EEG paralyzed person could communicate device like wheelchair composing sequences mental states EEG pattern recognition difficult problem hinges success finding representations EEG signals patterns distinguished In article report study comparing three EEG representations unprocessed signals reduceddimensional representation using KarhunenLoeve transform frequencybased representation Classification performed twolayer neural network implemented CNAPS server processor SIMD architecture Adaptive Solutions Inc Execution time comparisons show hundredfold speed Sun Sparc The best classification accuracy untrained samples using frequencybased representation
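The abstract reports the best accuracy with a frequency-based representation but does not say which frequencies were used. As a hedged illustration, the sketch below computes a naive DFT power spectrum and sums it into frequency bands. The band names and edges (delta, theta, alpha, beta) are conventional EEG bands chosen as an assumption, and the signal is synthetic.

```python
import cmath
import math

def power_spectrum(signal):
    """Naive DFT power for bins 0..n//2 (O(n^2); fine for a short sketch)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))) ** 2 / n
            for k in range(n // 2 + 1)]

def band_powers(signal, fs, bands):
    """Sum spectral power inside each [low, high) band, frequencies in Hz."""
    spec = power_spectrum(signal)
    n = len(signal)
    freqs = [k * fs / n for k in range(len(spec))]
    return {name: sum(p for f, p in zip(freqs, spec) if lo <= f < hi)
            for name, (lo, hi) in bands.items()}

# Assumed bands and a synthetic 1-second, 10 Hz signal sampled at 100 Hz.
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
fs = 100
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
features = band_powers(signal, fs, bands)
```

The resulting four-number feature vector per channel is the kind of compact input a small classifier, such as the two-layer network in the abstract, could consume.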
This paper surveys field reinforcement learning computerscience perspective It written accessible researchers familiar machine learning Both historical basis field broad selection current work summarized Reinforcement learning problem faced agent learns behavior trialanderror interactions dynamic environment The work described resemblance work psychology differs considerably details use word reinforcement The paper discusses central issues reinforcement learning including trading exploration exploitation establishing foundations field via Markov decision theory learning delayed reinforcement constructing empirical models accelerate learning making use generalization hierarchy coping hidden state It concludes survey implemented systems assessment practical utility current methods reinforcement learning
We add internal memory XCS classifier system We test XCS internal memory named XCSM nonMarkovian environments two four aliasing states Experimental results show XCSM easily converge optimal solutions simple environments moreover XCSMs performance stable respect size internal memory involved learning However results present evidence complex nonMarkovian environments XCSM may fail evolve optimal solution Our results suggest happens exploration strategies currently employed XCS adequate guarantee convergence optimal policy XCSM complex nonMarkovian environments
Simple modification standard hill climbing optimization algorithm taking account learning features discussed Basic concept approach socalled probability vector single entries determine probabilities appearance entries nbit vectors This vector used random generation nbit vectors form neighborhood specified given probability vector Within neighborhood best solutions smallest functional values minimized function recorded The feature learning introduced probability vector updated formal analogue Hebbian learning rule wellknown theory artificial neural networks The process repeated probability vector entries close either zero one The resulting probability vector unambiguously determines nbit vector may interpreted optimal solution given optimization task Resemblance genetic algorithms discussed Effectiveness proposed method illustrated example looking global minima highly multimodal function
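The steps described above are to sample n-bit vectors from a probability vector, keep the best one, and nudge the probabilities toward it until they saturate near 0 or 1. The sketch below follows those steps. The learning rate, neighbourhood size, saturation threshold, and the simple bit-mismatch objective are all assumptions for illustration. The paper's exact Hebbian-style update may differ.

```python
import random

def prob_vector_search(f, n, samples=20, rate=0.1, iters=200, seed=0):
    """Minimize f over n-bit vectors by learning a probability vector."""
    rng = random.Random(seed)
    p = [0.5] * n  # probability that each bit is 1
    best, best_val = None, float("inf")
    for _ in range(iters):
        # generate a neighbourhood of n-bit vectors from the current p
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(samples)]
        winner = min(pop, key=f)
        if f(winner) < best_val:
            best, best_val = winner, f(winner)
        # Hebbian-like step: move each entry of p toward the winning bit
        p = [(1 - rate) * pi + rate * bi for pi, bi in zip(p, winner)]
        # stop once every entry is close to zero or one
        if all(pi < 0.05 or pi > 0.95 for pi in p):
            break
    return [1 if pi > 0.5 else 0 for pi in p], best, best_val

# Hypothetical objective: number of bits differing from a hidden target.
target = [1, 0, 1, 1, 0, 0, 1, 0]
f = lambda x: sum(a != b for a, b in zip(x, target))
final, best, best_val = prob_vector_search(f, len(target))
```

As in the abstract, the saturated probability vector itself is read off as the answer (`final`). `best` records the best sample seen along the way.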
This paper describes efficient methods exact approximate implementation MINFEATURES bias prefers consistent hypotheses definable features possible This bias useful learning domains many irrelevant features present training data We first introduce FOCUS new algorithm exactly implements MINFEATURES bias This algorithm empirically shown substantially faster FOCUS algorithm previously given Almuallim Dietterich We introduce MutualInformationGreedy SimpleGreedy WeightedGreedy algorithms apply efficient heuristics approximating MINFEATURES bias These algorithms employ greedy heuristics trade optimality computational efficiency Experimental studies show learning performance ID greatly improved algorithms used preprocess training data eliminating irrelevant features IDs consideration In particular WeightedGreedy algorithm provides excellent efficient approximation MIN
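The MIN-FEATURES bias prefers the consistent hypothesis that uses the fewest features. Its exact implementation can be sketched as the breadth-first search of the original FOCUS algorithm. Subsets are tried in order of increasing size, and the first one on which no two examples agree on all selected features but differ in class is returned. This is the simple baseline that the abstract's new algorithm speeds up, not that new algorithm. The XOR data are hypothetical.

```python
from itertools import combinations

def consistent(rows, labels, subset):
    """True if no two examples match on `subset` but have different labels."""
    seen = {}
    for row, y in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        if seen.setdefault(key, y) != y:
            return False
    return True

def focus(rows, labels):
    """Smallest feature subset consistent with the training data."""
    n = len(rows[0])
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if consistent(rows, labels, subset):
                return subset
    return None

# Hypothetical data: label = x0 XOR x2; feature 1 is irrelevant.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ c for a, b, c in rows]
chosen = focus(rows, labels)
```

The search is exponential in the size of the answer, which is why the greedy approximations mentioned in the abstract matter once many features are relevant.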
A statistical approach decision tree modeling described In approach decision tree modeled parametrically process output generated input sequence decisions The resulting model yields likelihood measure goodness fit allowing ML MAP estimation techniques utilized An efficient algorithm presented estimate parameters tree The model selection problem presented several alternative proposals considered A hidden Markov version tree described data sequences temporal dependencies
A binary matrix Our task infer given z A given assumptions statistical properties n This problem arises decoding noisy communication z transmitted using errorcorrecting code based parity checks original signal inference sequence linear feedback shift register LFSR noisy observation sequence P zjA I assume decoders aim find probable For large N exhaustive search N possible sequences feasible One way attack combinatorial problem create related continuous optimization problem discrete variables replaced real variables Here I derive continuous representation terms free energy approximation awkward posterior distribution
An intelligent system capable adapting constantly changing environment It therefore ought capable learning perceptual interactions surroundings This requires certain amount plasticity structure Any attempt model perceptual capabilities living system matter construct synthetic system comparable abilities must therefore account plasticity variety developmental learning mechanisms This paper examines results neuroanatomical morphological well behavioral studies development visual perception integrates computational framework suggests several interesting experiments computational models yield insights development visual perception In order understand development information processing structures brain one needs knowledge changes undergoes birth maturity context normal environment However knowledge development aberrant settings also extremely useful reveals extent development function environmental experience opposed genetically determined prewiring Accordingly consider development visual system normal restricted rearing conditions The role experience early development sensory systems general visual system particular widely studied variety experiments involving carefully controlled manipulation environment presented animal Extensive reviews results found Mitchell Movshon Hirsch Boothe Singer Some examples manipulation visual experience total pattern deprivation eg dark rearing selective deprivation certain class patterns eg vertical lines monocular deprivation animals binocular vision etc Extensive studies involving behavioral deficits resulting total visual pattern deprivation indicate deficits arise primarily result impairment visual information processing brain The results experiments suggest specific developmental learning mechanisms may operating various stages development different levels system We discuss I indebted Prof James Dannemiller introducing literature infant development Prof Leonard Uhr helpful comments initial draft paper numerous researchers whose experimental work provided basis model outlined paper This research partially supported grants National Science Foundation University Wisconsin Graduate School
This paper studies problem ergodicity transition probability matrices Markovian models hidden Markov models HMMs makes difficult task learning represent longterm context sequential data This phenomenon hurts forward propagation longterm context information well learning hidden state representation represent longterm context depends propagating credit information backwards time Using results Markov chain theory show problem diffusion context credit reduced transition probabilities approach ie transition probability matrices sparse model essentially deterministic The results found paper apply learning approaches based continuous optimization gradient descent BaumWelch algorithm
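The diffusion of context described above can be seen directly in powers of a transition matrix. With an ergodic, well-mixed matrix, the rows of A^t become identical, so the distribution after t steps no longer depends on the starting state. With a nearly deterministic matrix (entries close to 0 or 1), the starting state is remembered much longer. The sketch below measures this as the largest total-variation distance between rows of A^t. The two example matrices are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(A, t):
    result = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(t):
        result = matmul(result, A)
    return result

def context_spread(A, t):
    """Largest total-variation distance between rows of A**t.

    0 means the initial state has been completely forgotten after t steps.
    """
    P = matpow(A, t)
    n = len(P)
    return max(0.5 * sum(abs(P[i][k] - P[j][k]) for k in range(n))
               for i in range(n) for j in range(n))

mixing = [[0.6, 0.4], [0.4, 0.6]]             # ergodic, fast mixing
near_deterministic = [[0.99, 0.01], [0.01, 0.99]]
spread_mix = context_spread(mixing, 20)
spread_det = context_spread(near_deterministic, 20)
```

For these symmetric two-state chains the spread equals λ^t with λ = 1 − 2p, i.e. 0.2^20 versus 0.98^20. This is a concrete case of the claim that context survives only when transition probabilities approach 0 or 1.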
Modern industry today needs flexible adaptive faulttolerant methods information processing Several applications shown neural networks fulfill requirements In paper application areas neural networks successfully used presented Then kind check list described mentioned different steps applying neural networks The paper finished discussion neural networks projects done research group Interactive Planning Research Center Computer Science FZI
Gas oil pipelines need inspected corrosion defects regular intervals For application Pipetronix GmbH PTX Karlsruhe developed special ultrasonic based probe Based recorded wall thicknesses called pipe pig Research center computer science FZI developed cooperation PTX automatic inspection system called NeuroPipe NeuroPipe task detect defects like metal loss The kernel inspection tool neural classifier trained using manually collected defect examples The following paper focus aspects successful use learning methods industrial application
We construct mixture locally linear generative models collection pixelbased images digits use recognition Different models given digit used capture different styles writing new images classified evaluating loglikelihoods model We use EMbased algorithm Mstep computationally straightforward principal components analysis PCA Incorporating tangentplane information expected local deformations requires adding tangent vectors sample covariance matrices PCA demonstrably improves performance
In International Journal Neural Systems p URL paper ftpftpcscoloradoedupubTimeSeriesMyPapersexpertspsZ httpwwwcscoloradoeduandreasTimeSeriesMyPapersexpertspsZ University Colorado Computer Science Technical Report CUCS In analysis prediction realworld systems two key problems nonstationarity often form switching regimes overfitting particularly serious noisy processes This article addresses problems using gated experts consisting nonlinear gating network several also nonlinear competing experts Each expert learns predict conditional mean expert adapts width match noise level regime The gating network learns predict probability expert given input This article focuses case gating network bases decision information inputs This contrasted hidden Markov models decision based previous states ie output gating network previous time step well averaging several predictors In contrast gated experts softpartition input space This article discusses underlying statistical assumptions derives weight update rules compares performance gated experts standard methods three time series computergenerated series obtained randomly switching two nonlinear processes time series Santa Fe Time Series Competition light intensity laser chaotic state daily electricity demand France realworld multivariate problem structure several time scales The main results gating network correctly discovers different regimes process widths associated expert important segmentation task used characterize subprocesses less overfitting compared single networks homogeneous multilayer perceptrons since experts learn match variances local noise levels This viewed matching local complexity model local complexity data
This paper describes approach modelling drug activity using machine learning tools Some experiments modelling quantitative structureactivity relationship QSAR using standard Hansch method machine learning system Golem already reported literature The paper describes results applying two machine learning systems Magnus Assistant Retis data The results achieved machine learning systems better results Hansch method therefore machine learning tools considered promising solving kind problems The given results also illustrate variations performance different machine learning systems applied drug design problem
In paper describe one aspect research project called HIPED addressed problem performing design engineering devices accessing heterogeneous databases The front end HIPED system consisted interactive KRITIK multimodal reasoning system combined case based model based reasoning solve design problem This paper focuses backend processing five types queries received front end evaluated mapping appropriately using facts schemas underlying databases rules establish correspondence among data databases terms relationships equivalence overlap set containment The uniqueness approach stems fact mapping process forgiving query received front end evaluated respect large number possibilities These possibilities encoded form rules consider various ways tokens given query may match relation names attribute names values underlying tables The approach implemented using CORAL deductive database system rule processing engine
Temporal difference methods solve temporal credit assignment problem reinforcement learning An important subproblem general reinforcement learning learning achieve dynamic goals Although existing temporal difference methods Q learning applied problem take advantage special structure This paper presents DGlearning algorithm learns efficiently achieve dynamically changing goals exhibits good knowledge transfer goals In addition paper shows traditional relaxation techniques applied problem Finally experimental results given demonstrate superiority DG learning Q learning moderately large synthetic nondeterministic domain
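The core idea of learning to reach many goals at once can be shown with a simplified sketch. This is an illustration in the spirit of the abstract, not the paper's DG-learning algorithm. It assumes a hypothetical deterministic 4-state chain, unit step costs and random exploration. The paper addresses nondeterministic domains and relaxation techniques, which this sketch omits. Every observed transition updates the estimated distance to *every* goal, so experience gathered while pursuing one goal transfers to the others.

```python
import random

def step(state, action, n_states):
    """Hypothetical deterministic chain: move -1 or +1, clipped at the ends."""
    return min(n_states - 1, max(0, state + action))

def dg_learning(n_states=4, actions=(-1, 1), n_steps=2000, seed=0):
    """Learn dg[s][a][g]: steps needed to reach goal g after taking a in s.

    Each transition updates the table for all goals simultaneously; a plain
    Q-learner would need a separate value table per goal.
    """
    rng = random.Random(seed)
    inf = float("inf")
    dg = {s: {a: [inf] * n_states for a in actions} for s in range(n_states)}
    state = 0
    for _ in range(n_steps):
        action = rng.choice(actions)            # random exploration
        nxt = step(state, action, n_states)
        for goal in range(n_states):
            if nxt == goal:
                dg[state][action][goal] = 1     # goal reached in one step
            else:
                best_next = min(dg[nxt][a][goal] for a in actions)
                dg[state][action][goal] = 1 + best_next
        state = nxt
    return dg

dg = dg_learning()
# Distance from state 0 to goal 3 when the first move is to the right;
# after enough exploration this equals the shortest-path length.
print(dg[0][1][3])
```

The values start at infinity and can only decrease toward the true shortest-path distances. Each value always equals the length of some path that was actually observed.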
In paper prove intractability learning several classes Boolean functions distributionfree model also called Probably Approximately Correct PAC model learning examples These results representation independent hold regardless syntactic form learner chooses represent hypotheses Our methods reduce problems cracking number wellknown publickey cryptosystems learning problems We prove polynomialtime learning algorithm Boolean formulae deterministic finite automata constantdepth threshold circuits would dramatic consequences cryptography number theory particular algorithm could used break RSA cryptosystem factor Blum integers composite numbers equivalent modulo detect quadratic residues The results hold even learning algorithm required obtain slight advantage prediction random guessing The techniques used demonstrate interesting duality learning cryptography We also apply results obtain strong intractability results approximating generalization graph coloring This research conducted author Harvard University supported AT T Bell Laboratories scholarship Supported grants ONRNK NSFDCR NSFCCR DAALK DARPA AFOSR SERC
We present new method determining consensus sequence DNA fragment assemblies The new method TraceEvidence directly incorporates aligned ABI trace information consensus calculations via previously described representation TraceData Classifications The new method extracts sums evidence indicated representation determine consensus calls Using TraceEvidence method results automatically produced consensus sequences accurate less ambiguous produced standard majority voting methods Additionally improvements achieved less coverage required standard methods using TraceEvidence coverage three error rates low coverage ten sequences
To learned morphology natural language capacity recognize produce words consisting novel combinations familiar morphemes Most recent work acquisition morphology takes perspective production receptive morphology comes first child This paper presents connectionist model acquisition capacity recognize morphologically complex words The model takes sequences phonetic segments inputs maps onto output units representing meanings lexical grammatical morphemes It consists simple recurrent network separate hiddenlayer modules tasks recognizing root grammatical morphemes input word Experiments artificial language stimuli demonstrate model generalizes novel words morphological rules one major types found natural languages version network unassigned hiddenlayer modules learn assign output recognition tasks efficient manner I also argue rules involving reduplication copying portions root network requires separate recurrent subnetworks sequences larger units syllables The network learn develop syllable representations support recognition reduplication also provide basis learning produce well recognize morphologically complex words The model makes many detailed predictions learning difficulty particular morphological rules
This paper presents algorithm combines traditional EBL techniques recent developments inductive logic programming learn effective clause selection rules Prolog programs When control rules incorporated original program significant speedup may achieved The algorithm shown improvement competing EBL approaches several domains Additionally algorithm capable automatically transforming intractable algorithms ones run polynomial time
Neurons ventral stream primate visual system exhibit responses images objects invariant respect natural transformations translation size view Anatomical neurophysiological evidence suggests achieved series hierarchical processing areas In attempt elucidate manner representations established constructed model cortical visual processing seeks parallel many features system specifically multistage hierarchy topologically constrained convergent connectivity Each stage constructed competitive network utilising modified Hebblike learning rule called trace rule incorporates previous well current neuronal activity The trace rule enables neurons learn whatever invariant short time periods eg representation objects objects transform real world The trace rule enables neurons learn statistical invariances objects transformations associating together representations occur close together time We show using trace rule training algorithm model indeed learn produce transformation invariant responses natural stimuli faces
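A trace rule can be sketched for a single neuron. This minimal example assumes the common Földiák-style form: a running trace of the neuron's output is kept, and the weight change uses that trace instead of the instantaneous output. The paper's modified rule, its competitive network and its parameters may differ. The values `alpha` and `eta` and the two-view toy input are hypothetical.

```python
import math

def trace_rule(weights, inputs, outputs, alpha=0.1, eta=0.8):
    """Train one neuron's weights with a Hebb-like trace rule.

    trace(t) = (1 - eta) * y(t) + eta * trace(t - 1)
    dw_j     = alpha * trace(t) * x_j(t)

    With eta = 0 this reduces to plain Hebbian learning. Weights are
    renormalised to unit length after each step to keep them bounded.
    """
    w = list(weights)
    trace = 0.0
    for x, y in zip(inputs, outputs):
        trace = (1 - eta) * y + eta * trace
        w = [wj + alpha * trace * xj for wj, xj in zip(w, x)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm > 0:
            w = [wj / norm for wj in w]
    return w

# Two successive "views" of one object; the neuron fires only for the first.
views = [[1.0, 0.0], [0.0, 1.0]]
firing = [1.0, 0.0]
with_trace = trace_rule([0.5, 0.5], views, firing, eta=0.8)
pure_hebb = trace_rule([0.5, 0.5], views, firing, eta=0.0)
print(with_trace)  # the second view still receives a weight increment
print(pure_hebb)   # no increment for the second view
```

Because the trace persists for a short time, the neuron binds together views that occur close in time. This is the mechanism the abstract proposes for learning transformation invariance.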
In paper propose recurrent neural networks feedback input units handling two types data analysis problems On one hand scheme used static data input variables missing On hand also used sequential data input variables missing available different frequencies Unlike case probabilistic models eg Gaussian missing variables network attempt model distribution missing variables given observed variables Instead discriminant approach fills missing variables sole purpose minimizing learning criterion eg minimize output error
Many neural networks derived optimization dynamics suitable objective functions We show networks designed repeated transformations one objective another fixpoints We exhibit collection algebraic transformations reduce network cost increase set objective functions neurally implementable The transformations include simplification products expressions functions one two expressions sparse matrix products may interpreted Legendre transformations also minimum maximum set expressions These transformations introduce new interneurons force network seek saddle point rather minimum Other transformations allow control network dynamics reconciling Lagrangian formalism need fixpoints We apply transformations simplify number structured neural networks beginning standard reduction winnertakeall network ON connections ON Also susceptible inexact graphmatching random dot matching convolutions coordinate transformations sorting Simulations show fixpointpreserving transformations may applied repeatedly elaborately example networks still robustly converge
The use casebased reasoning process model design involves subtasks recalling previously known designs memory adapting design cases subcases fit current design context The development process model particular design domain proceeds parallel development representation cases case memory organisation design knowledge needed addition specific designs The selection particular representational paradigm types information details use particular problemsolving domain depend intended use information represented project information available well nature domain In paper describe development implementation four casebased design systems CASECAD CADSYN WIN DEMEX Each system described terms content organisation source case memory implementation case recall case adaptation A comparison systems considers relative advantages disadvantages implementations
We describe biologically plausible model dynamic recognition learning visual cortex based statistical theory Kalman filtering optimal control theory The model utilizes hierarchical network whose successive levels implement Kalman filters operating successively larger spatial temporal scales Each hierarchical level network predicts current visual recognition state lower level adapts recognition state using residual error prediction actual lowerlevel state Simultaneously network also learns internal model spatiotemporal dynamics input stream adapting synaptic weights hierarchical level order minimize prediction errors The Kalman filter model respects key neuroanatomical data reciprocity connections visual cortical areas assigns specific computational roles interlaminar connections known exist neurons visual cortex Previous work elucidated usefulness model explaining neurophysiological phenomena endstopping related extraclassical receptive field effects In paper addition providing detailed exposition model present variety experimental results demonstrating ability model perform robust spatiotemporal segmentation recognition objects image sequences presence varying amounts occlusion background clutter noise
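The prediction-plus-residual loop described above is the basic Kalman filter step. The following scalar sketch shows one level only, assuming identity dynamics and hypothetical noise variances. The paper's model is hierarchical and multidimensional, and it also learns its internal model, none of which is reproduced here. The correction term `z - x` plays the role of the residual between the prediction and the actual lower-level state.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a state assumed constant up to process noise."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: identity dynamics, uncertainty grows by the process noise.
        p = p + process_var
        # Update: correct the prediction with the residual (z - x),
        # weighted by the Kalman gain k.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

readings = [5.0] * 50   # hypothetical constant input
est = kalman_1d(readings, process_var=0.01, meas_var=1.0)
print(round(est[0], 3), round(est[-1], 3))
```

The gain `k` starts large and settles to a small steady-state value. Early inputs move the estimate strongly, and later inputs act as fine corrections.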
Individual lifetime learning guide evolving population areas high fitness genotype space evolutionary phenomenon known Baldwin effect Baldwin Hinton Nowlan It accepted wisdom guiding speeds rate evolution By highlighting another interaction learning evolution termed Hiding effect argued depends measure evolutionary speed one adopts The Hiding effect shows learning reduce selection pressure individuals hiding genetic differences There thus tradeoff Baldwin effect Hiding effect determine learnings influence evolution two factors contribute tradeoff cost learning landscape epistasis investigated experimentally
This paper focuses optimization hyperparameters function approximators We describe kind racing algorithm continuous optimization problems spends less time evaluating poor parameter settings time honing estimates promising regions parameter space The algorithm able automatically optimize parameters function approximator less computation time We demonstrate algorithm problem finding good parameters memory based learner show tradeoffs involved choosing right amount computation spend evaluation
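The racing idea can be illustrated with a simplified sketch. It uses a *discrete* set of candidate settings and Hoeffding confidence bounds, whereas the paper's algorithm targets continuous parameter spaces, so treat it as an analogy only. Candidates are evaluated one noisy sample at a time. A candidate is dropped as soon as its lower confidence bound exceeds the best candidate's upper bound, so little computation is spent on poor settings. The error values and noise model are hypothetical.

```python
import math
import random

def race(evaluate, n_candidates, max_rounds=200, delta=0.05, value_range=1.0):
    """Hoeffding-style race: drop candidates that are confidently worse.

    evaluate(c) returns one noisy error sample for candidate c (lower is
    better), assumed to lie in an interval of width value_range.
    """
    alive = list(range(n_candidates))
    totals = [0.0] * n_candidates
    rounds = 0
    evaluations = 0
    for _ in range(max_rounds):
        for c in alive:
            totals[c] += evaluate(c)
            evaluations += 1
        rounds += 1
        # Every surviving candidate has been evaluated `rounds` times.
        eps = value_range * math.sqrt(math.log(2 / delta) / (2 * rounds))
        means = {c: totals[c] / rounds for c in alive}
        best_upper = min(means.values()) + eps
        # Keep c unless its lower bound lies above the best upper bound.
        alive = [c for c in alive if means[c] - eps <= best_upper]
        if len(alive) == 1:
            break
    return alive, evaluations

# Hypothetical setting: three parameter settings with true errors 0.1/0.5/0.9.
true_error = [0.1, 0.5, 0.9]
rng = random.Random(0)

def noisy_error(c):
    return true_error[c] + rng.uniform(-0.05, 0.05)

survivors, cost = race(noisy_error, 3)
print(survivors, cost)   # cost is well below 3 * 200 full evaluations
```

The choice of `delta` reflects the tradeoff mentioned in the abstract. Smaller values make eliminations safer but require more evaluations per candidate.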
Based analysis experiments using realworld datasets find greediness forward feature selection algorithms severely corrupt accuracy function approximation using selected input features improves efficiency significantly Hence propose three greedier algorithms order enhance efficiency feature selection processing We provide empirical results linear regression locally weighted regression knearestneighbor models We also propose use algorithms develop offline Chinese Japanese handwriting recognition system automatically configured local models
This paper considers aspect mixture modelling Significantly overlapping distributions require data parameters accurately estimated well separated distributions For example two Gaussian distributions considered significantly overlap means within three standard deviations If insufficient data available single component distribution estimated although data originates two component distributions We consider much data required distinguish two component distributions one distribution mixture modelling using minimum message length MML criterion First perform experiments show MML criterion performs well relative Bayesian criteria Second make two improvements existing MML estimates improve performance overlapping distributions
In paper present performance prediction model indicating performance range MIMD parallel processor systems neural network simulations The model expresses total execution time simulation function execution times small number kernel functions measured one processor one physical communication link The functions depend type neural network geometry decomposition connection structure MIMD machine Using model execution time speedup scalability efficiency large MIMD systems predicted The model validated quantitatively applying two popular neural networks backpropagation Kohonen selforganizing feature map decomposed GCel transputer system Measurements taken network simulations decomposed via dataset network decomposition techniques Agreement model measurements within Estimates given performances expected new T transputer systems The presented method also used application areas image processing
With goal reducing computational costs without sacrificing accuracy describe two algorithms find sets prototypes nearest neighbor classification Here term prototypes refers reference instances used nearest neighbor computation instances respect similarity assessed order assign class new data item Both algorithms rely stochastic techniques search space sets prototypes simple implement The first Monte Carlo sampling algorithm second applies random mutation hill climbing On four datasets show three four prototypes sufficed give predictive accuracy equal superior basic nearest neighbor algorithm whose runtime storage costs approximately times greater We briefly investigate random mutation hill climbing may applied select features prototypes simultaneously Finally explain performance sampling algorithm datasets terms statistical measure extent clustering displayed target classes
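Random mutation hill climbing over prototype sets can be sketched as follows. This is a minimal version assuming 1-D numeric features, a fixed number `m` of prototypes drawn from the training data, training-set accuracy of 1-nearest-neighbour as the objective, and acceptance of non-worsening moves. The paper's exact mutation and acceptance details, and its Monte Carlo variant, may differ. The toy data is hypothetical.

```python
import random

def nn_accuracy(prototypes, data):
    """Training accuracy of 1-NN using only the given prototypes."""
    correct = 0
    for x, label in data:
        nearest = min(prototypes, key=lambda p: abs(p[0] - x))
        correct += nearest[1] == label
    return correct / len(data)

def rmhc_prototypes(data, m, iterations=100, seed=0):
    """Random mutation hill climbing over sets of m prototypes."""
    rng = random.Random(seed)
    current = rng.sample(data, m)
    best = nn_accuracy(current, data)
    for _ in range(iterations):
        candidate = list(current)
        # Mutation: replace one randomly chosen prototype by a random instance.
        candidate[rng.randrange(m)] = rng.choice(data)
        acc = nn_accuracy(candidate, data)
        if acc >= best:                 # accept moves that do not hurt accuracy
            current, best = candidate, acc
    return current, best

# Hypothetical data: two well-separated classes on a line.
data = [(x, 0) for x in [0, 1, 2, 3]] + [(x, 1) for x in [10, 11, 12, 13]]
protos, acc = rmhc_prototypes(data, m=2)
print(protos, acc)
```

Classifying a new item then costs `m` distance computations instead of one per stored instance. This is the source of the runtime and storage savings the abstract reports.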
We present new selforganizing neural network model two variants The first variant performs unsupervised learning used data visualization clustering vector quantization The main advantage existing approaches eg Kohonen feature map ability model automatically find suitable network structure size This achieved controlled growth process also includes occasional removal units The second variant model supervised learning method results combination abovementioned selforganizing network radial basis function RBF approach In model possible contrast earlier approaches perform positioning RBF units supervised training weights parallel Therefore current classification error used determine insert new RBF units This leads small networks generalize well Results twospirals benchmark vowel classification problem presented better results previously published submitted publication
I present first results COLUMBUS autonomous mobile robot COLUMBUS operates initially unknown structured environments Its task explore model environment efficiently avoiding collisions obstacles COLUMBUS uses instancebased learning technique modeling environment Realworld experiences generalized via two artificial neural networks encode characteristics robots sensors well characteristics typical environments robot assumed face Once trained networks allow knowledge transfer across different environments robot face lifetime COLUMBUS models represent expected reward confidence expectations Exploration achieved navigating low confidence regions An efficient dynamic programming method employed background find minimalcost paths executed robot maximize exploration COLUMBUS operates realtime It operating successfully office building environment periods hours
We analyze performance Genetic Algorithm GA call Culling variety algorithms problem refer Additive Search Problem ASP ASP closely related several previously well studied problems game Mastermind additive fitness functions We show problem learning Ising perceptron reducible noisy version ASP Culling efficient ASP highly noise tolerant best known approach regimes Noisy ASP first problem aware Genetic Type Algorithm bests known competitors Standard GAs contrast perform much poorly ASP hillclimbing approaches even though Schema theorem holds ASP We generalize ASP kASP study whether GAs achieve implicit parallelism problem many schemata GAs fail achieve implicit parallelism describe algorithm call Explicitly Parallel Search succeeds We also compute optimal culling point selective breeding turns independent fitness function population distribution We also analyze Mean Field Theoretic algorithm performing similarly Culling many problems These results provide insight GAs beat competing methods
Many extensions proposed help instancebased learning algorithms perform better wide variety realworld applications However trivial decide parameters options use applying instancebased learning algorithm particular problem Traditionally crossvalidation used choose parameters k k nearest neighbor classifier This paper points cross validation often provide enough information allow finetuning classifier confidence levels used break ties common crossvalidation used It proposes Fuzzy Instance Based Learning FIBL algorithm uses distanceweighted voting parameters set via combination crossvalidation confidence levels In experiments datasets FIBL higher average generalization accuracy using majority voting using crossvalidation alone determine parameters
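Distance-weighted voting differs from plain majority voting in a way a small example makes concrete. This sketch assumes inverse-squared-distance weights (1/d^2) and 1-D features. FIBL's actual weighting function and its parameter-setting procedure (cross-validation combined with confidence levels) are not reproduced here. The training points are hypothetical and were chosen so that the two voting schemes disagree.

```python
from collections import defaultdict

def knn_predict(train, query, k, weighted=True, power=2):
    """k-NN on 1-D points, with majority or inverse-distance-weighted voting."""
    neighbors = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = defaultdict(float)
    for x, label in neighbors:
        d = abs(x - query)
        if weighted:
            if d == 0:
                return label            # exact match wins outright
            votes[label] += 1.0 / d ** power
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)

train = [(0.0, "B"), (3.0, "A"), (3.1, "A")]
print(knn_predict(train, 0.1, k=3, weighted=False))  # majority vote
print(knn_predict(train, 0.1, k=3))                  # distance-weighted vote
```

With majority voting the two distant "A" points outvote the very close "B" point. With distance weighting the close point dominates. Weighting also reduces the ties that plain voting produces for even `k`.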
This paper describes formulation reinforcement learning enables learning noisy dynamic environments complex concurrent multirobot learning domain The methodology involves minimizing learning space use behaviors conditions dealing credit assignment problem shaped reinforcement form heterogeneous reinforcement functions progress estimators We experimentally validate approach group four mobile robots learning foraging task
Most existing decision tree systems use greedy approach induce trees locally optimal splits induced every node tree Although greedy approach suboptimal believed produce reasonably good trees In current work attempt verify belief We quantify goodness greedy tree induction empirically using popular decision tree algorithms C CART We induce decision trees thousands synthetic data sets compare corresponding optimal trees turn found using novel map coloring idea We measure effect greedy induction variables underlying concept complexity training set size noise dimensionality Our experiments show among things expected classification cost greedily induced tree consistently close optimal tree
Report SYCON ABSTRACT Previous results input state stabilizability shown hold even systems linear controls provided general type feedback allowed Applications certain stabilization problems coprime factorizations well comparisons results input state stability also briefly discussed
Attractor networks map continuous input space discrete output space useful pattern completion cleaning noisy missing features input However designing net given set attractors notoriously tricky training procedures CPU intensive often produce spurious attractors illconditioned attractor basins These difficulties occur connection network participates encoding multiple attractors We describe alternative formulation attractor networks encoding knowledge local distributed Although localist attractor nets similar dynamics distributed counterparts much easier work interpret We propose statistical formulation localist attractor net dynamics yields convergence proof mathematical interpretation model parameters We present simulation experiments explore behavior localist attractor nets showing produce gang effect presence attractor enhances attractor basins neighboring attractors spurious attractors occur points symmetry state space
According Wolperts nofreelunch NFL theorems generalisation absence domain knowledge necessarily zerosum enterprise Good generalisation performance one situation always offset bad performance another Wolpert notes theorems demonstrate effective generalisation logical impossibility merely learners bias assumption set key importance
Learning limited modification parameters limited scope capability modify system structure also needed get wider range learnable In case artificial neural networks learning iterative adjustment synaptic weights succeed network designer predefines appropriate network structure ie number hidden layers units size shape receptive projective fields This paper advocates view network structure usually done determined trialanderror computed learning algorithm Incremental learning algorithms modify network structure addition andor removal units andor links A survey current connectionist literature given line thought Grow Learn GAL new algorithm learns association oneshot due incremental using local representation During socalled sleep phase units previously stored longer necessary due recent modifications removed minimize network complexity The incrementally constructed network later finetuned offline improve performance Another method proposed greatly increases recognition accuracy train number networks vote responses The algorithm variants tested recognition handwritten numerals seem promising especially terms learning speed This makes algorithm attractive online learning tasks eg robotics The biological plausibility incremental learning also discussed briefly Earlier part work realized Laboratoire de Microinformatique Ecole Polytechnique Federale de Lausanne supported Fonds National Suisse de la Recherche Scientifique Later part realized supported International Computer Science Institute A number people helped guiding stimulating discussions questions Subutai Ahmad Peter Clarke Jerry Feldman Christian Jutten Pierre Marchal Jean Daniel Nicoud Steve Omohondro Leon Personnaz
This paper introduces probability model mixture trees account sparse dynamically changing dependence relationships We present family efficient algorithms use EM Minimum Spanning Tree algorithm find ML MAP mixture trees variety priors including Dirichlet MDL priors
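Fitting each tree in a mixture of trees reduces to a maximum-weight spanning tree over pairwise mutual information (the Chow-Liu construction). The following sketch shows that single step for discrete data. Optional per-sample `weights` stand in for the EM responsibilities a mixture component would use. It is not the paper's full algorithm: there are no priors, no E-step and no mixture weights. The toy data is hypothetical.

```python
import math
from itertools import combinations

def mutual_information(data, weights, i, j):
    """Weighted empirical mutual information between columns i and j."""
    total = sum(weights)
    joint, p_i, p_j = {}, {}, {}
    for row, w in zip(data, weights):
        a, b = row[i], row[j]
        joint[(a, b)] = joint.get((a, b), 0.0) + w
        p_i[a] = p_i.get(a, 0.0) + w
        p_j[b] = p_j.get(b, 0.0) + w
    mi = 0.0
    for (a, b), c in joint.items():
        if c > 0:
            # p(a,b) * log(p(a,b) / (p(a) p(b))) written with raw counts.
            mi += (c / total) * math.log(c * total / (p_i[a] * p_j[b]))
    return mi

def chow_liu_tree(data, weights=None):
    """Maximum spanning tree on mutual information (Kruskal + union-find)."""
    n_vars = len(data[0])
    if weights is None:
        weights = [1.0] * len(data)
    edges = sorted(((mutual_information(data, weights, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                    # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Hypothetical binary data: X0 == X1 always, X2 independent of both.
data = [[0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
print(chow_liu_tree(data))
```

In an EM loop, the E-step would compute each sample's responsibility for each tree. The M-step would call `chow_liu_tree(data, responsibilities)` once per component.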
Two fundamental problems analyzing DNA sequences locating regions DNA sequence encode proteins determining reading frame region We investigate using artificial neural networks ANNs find coding regions determine reading frames detect frameshift errors E coli DNA sequences We describe adaptation approach used Uberbacher Mural identify coding regions human DNA compare performance ANNs several conventional methods predicting reading frames Our experiments demonstrate ANNs outperform conventional approaches
The paper describes selflearning control system mobile robot Based sensor information control system provide steering signal way collisions avoided Since case examples available system learns basis external reinforcement signal negative case collision zero otherwise Rules Temporal Difference learning used find correct mapping discrete sensor input space steering signal We describe algorithm learning correct mapping input state vector output steering signal algorithm used discrete coding input state space
Work currently underway devise learning methods better able transfer knowledge one task another The process knowledge transfer usually viewed logically separate inductive procedures ordinary learning However paper argues separatist view leads number conceptual difficulties It offers task analysis situates transfer process inside generalised inductive protocol It argues transfer viewed subprocess within induction independent procedure transporting knowledge learning trials
This chapter describes three studies address question neural network learning improved via incorporation information extracted networks This general problem call network transfer encompasses many types relationships source target networks Our focus utilization weights source networks solve subproblem target network task goal speeding learning target task We demonstrate approach described improve learning speed ten times learning starting random weights
ALVINN Autonomous Land Vehicle Neural Net Backpropagation trained neural network capable autonomously steering vehicle road highway environments Although ALVINN fairly robust one problems time takes train As vehicle capable online learning driver drive car minutes network capable autonomous operation One reason use Backprop In report describe original ALVINN system look three alternative training methods Quickprop Cascade Correlation Cascade We run series trials using Quickprop Cascade Correlation Cascade compare BackProp baseline Finally hidden unit analysis performed determine network learning Applying Advanced Learning Algorithms ALVINN
The work discussed paper motivated need building decision support systems realworld problem domains Our goal use systems tool supporting Bayes optimal decision making action maximizing expected utility respect predicted probabilities possible outcomes selected For reason models used need probabilistic nature output model probability distribution set numbers For model family chosen set simple discrete finite mixture models advantage computationally efficient In work describe Bayesian approach constructing finite mixture models sample data Our approach based twophase unsupervised learning process used exploratory analysis model construction In first phase selection model class ie number parameters performed calculating CheesemanStutz approximation model class evidence In second phase MAP parameters selected class estimated EM algorithm In framework overfitting problem common many traditional learning approaches avoided learning process automatically regulates complexity model This paper focuses model class selection phase approach validated presenting empirical results natural synthetic data
We introduce analyze new algorithm linear classification combines Rosenblatts perceptron algorithm Helmbold Warmuths leaveoneout method Like Vapniks maximalmargin classifier algorithm takes advantage data linearly separable large margins Compared Vapniks algorithm however much simpler implement much efficient terms computation time We also show algorithm efficiently used high dimensional spaces using kernel functions We performed experiments using algorithm variants classifying images handwritten digits The performance algorithm close good performance maximalmargin classifiers problem
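The combination of the perceptron with leave-one-out style voting can be sketched in its primal form. The sketch keeps every intermediate weight vector together with the number of examples it survived, then predicts by a survival-weighted vote of their signs. The kernel form mentioned in the abstract is omitted. The bias is handled as a constant first feature, and the training data is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sign(v):
    return 1 if v >= 0 else -1

def train_voted_perceptron(data, epochs=10):
    """Return a list of (weight_vector, survival_count) pairs."""
    w = [0.0] * len(data[0][0])
    c = 0
    perceptrons = []
    for _ in range(epochs):
        for x, y in data:
            if y * dot(w, x) <= 0:              # mistake: store w, then update
                perceptrons.append((w, c))
                w = [wi + y * xi for wi, xi in zip(w, x)]
                c = 1
            else:
                c += 1                          # w survived another example
    perceptrons.append((w, c))
    return perceptrons

def predict_voted(perceptrons, x):
    """Each stored perceptron votes with weight equal to its survival count."""
    return sign(sum(c * sign(dot(w, x)) for w, c in perceptrons))

# Hypothetical separable data; the leading 1 in each x acts as a bias term.
data = [([1, 2.0], 1), ([1, 1.5], 1), ([1, -1.0], -1), ([1, -2.0], -1)]
model = train_voted_perceptron(data)
print([predict_voted(model, x) for x, _ in data])
```

Training is a single pass of cheap updates per epoch. This is why such a method is much simpler and faster than solving the quadratic program behind a maximal-margin classifier.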
A version paper appear ACM Transactions Computer Systems August Abstract To achieve high performance contemporary computer systems rely two forms parallelism instructionlevel parallelism ILP threadlevel parallelism TLP Wideissue superscalar processors exploit ILP executing multiple instructions single program single cycle Multiprocessors MP exploit TLP executing different threads parallel different processors Unfortunately parallelprocessing styles statically partition processor resources thus preventing adapting dynamicallychanging levels ILP TLP program With insufficient TLP processors MP idle insufficient ILP multipleissue hardware superscalar wasted This paper explores parallel processing alternative architecture simultaneous multithreading SMT allows multiple threads compete share processors resources every cycle The compelling reason running parallel applications SMT processor ability use threadlevel parallelism instructionlevel parallelism interchangeably By permitting multiple threads share processors functional units simultaneously processor use ILP TLP accommodate variations parallelism When program single thread SMT processors resources dedicated thread TLP exists parallelism compensate lack
In paper present framework building probabilistic automata parameterized contextdependent probabilities Gibbs distributions used model state transitions output generation parameter estimation carried using EM algorithm Mstep uses generalized iterative scaling procedure We discuss relations certain classes stochastic feedforward neural networks geometric interpretation parameter estimation simple example statistical language model constructed using methodology
In regression context boosting bagging techniques build committee regressors may superior single regressor We use regression trees fundamental building blocks bagging committee machines boosting committee machines Performance analyzed three nonlinear functions Boston housing database In cases boosting least equivalent cases better bagging terms prediction error
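Bagging a committee of regressors can be sketched compactly. This minimal example substitutes depth-one regression trees (stumps) for the full regression trees used in the paper, to keep it short. It covers bagging only; boosting for regression is not shown. Each stump is fit to a bootstrap resample, and the committee prediction is the average of the members. The step-function data is hypothetical.

```python
import random

def fit_stump(points):
    """Fit a depth-1 regression tree minimising squared error."""
    xs = sorted(set(x for x, _ in points))
    mean_all = sum(y for _, y in points) / len(points)
    best = (float("inf"), None, mean_all, mean_all)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2                       # candidate split threshold
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    # With a single distinct x there is no split: predict the overall mean.
    return lambda x: mean_all if t is None else (ml if x <= t else mr)

def bagged_stumps(points, n_bags=25, seed=0):
    """Average the predictions of stumps fit to bootstrap resamples."""
    rng = random.Random(seed)
    models = [fit_stump([rng.choice(points) for _ in points])
              for _ in range(n_bags)]
    return lambda x: sum(m(x) for m in models) / len(models)

# Hypothetical data: a noiseless step function.
points = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
committee = bagged_stumps(points)
print(committee(0), committee(9))
```

Averaging over bootstrap resamples reduces the variance of the individual members. Boosting instead reweights the training examples that earlier members predicted poorly.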
Current expert systems properly handle imprecise incomplete information On hand neural networks perform pattern recognition operations even noisy environments Against background implemented neural expert system shell NEULA whose computational mechanism processes imprecisely incompletely given information means approximate probabilistic reasoning
Coevolution give rise Red Queen effect interacting populations alter others fitness landscapes The Red Queen effect significantly complicates measurement coevolutionary progress introducing fitness ambiguities improvements performance coevolved individuals appear decline stasis usual measures evolutionary progress Unfortunately appropriate measures fitness given Red Queen effect developed artificial life theoretical biology population dynamics evolutionary genetics We propose set appropriate performance measures based genetic behavioral data illustrate use simulation coevolution genetically specified continuoustime noisy recurrent neural networks generate pursuit evasion behaviors autonomous agents
Inferences measurement error models sensitive modeling assumptions Specifically model incorrect estimates inconsistent To reduce sensitivity modeling assumptions yet still retain efficiency parametric inference propose use flexible parametric models accommodate departures standard parametric models We use mixtures normals purpose We study two cases detail linear errorsinvariables model changepoint Berkson model Raymond J Carroll Professor Statistics Nutrition Toxicology Department Statistics Texas AM University College Station TX Kathryn Roeder Associate Professor Larry Wasserman Professor Department Statistics CarnegieMellon University Pittsburgh PA Carrolls research supported grant National Cancer Institute CA Roeders research supported NSF grant DMS Wassermans research supported NIH grant ROCA NSF grants DMS DMS
In paper investigate phenomenon multiparent reproduction ie study recombination mechanisms arbitrary number n parents participate creating children In particular discuss scanning crossover generalizes standard uniform crossover diagonal crossover generalizes point crossover study effects different number parents GA behavior We conduct experiments tough function optimization problems observe multiparent operators performance GAs enhanced significantly We also give theoretical foundation showing operators work distributions
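Here is a minimal sketch of the two multiparent operators named above. It assumes list-encoded chromosomes of equal length. Uniform scanning picks each gene from a randomly chosen parent. Diagonal crossover cuts all n parents at the same n-1 points and builds n children by taking the segments along the "diagonals". The function names and the use of `random.Random` are illustrative choices, and the paper may study further operator variants not shown here. With two parents, `diagonal_crossover` reduces to ordinary one-point crossover.

```python
import random

def uniform_scanning(parents, rng):
    """Uniform scanning: each gene of the single child is copied from a parent chosen at random."""
    length = len(parents[0])
    return [rng.choice(parents)[i] for i in range(length)]

def diagonal_crossover(parents, rng):
    """Diagonal crossover: n parents, n-1 shared cut points, n children.
    Child k takes segment s from parent (k + s) mod n."""
    n, length = len(parents), len(parents[0])
    cuts = sorted(rng.sample(range(1, length), n - 1))  # requires length > n - 1
    bounds = [0] + cuts + [length]
    children = []
    for k in range(n):
        child = []
        for seg in range(n):
            src = parents[(k + seg) % n]
            child.extend(src[bounds[seg]:bounds[seg + 1]])
        children.append(child)
    return children
```

At every locus the n children of `diagonal_crossover` carry the n parents' genes in some order, so no parental material is lost.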
TECHNICAL REPORT No Department Statistics GN University Washington Seattle Washington USA Susan L Rosenkranz Pew Health Policy Postdoctoral Fellow Institute Health Policy Studies Box University California San Francisco San Francisco CA Adrian E Raftery Professor Statistics Sociology Department Statistics GN University Washington Seattle WA Rosenkranzs research supported National Research Service Award TCA National Cancer Institute The authors grateful Paula Diehr Kevin Cain helpful discussions
Draft A Brief Introduction Neural Networks Richard D De Veaux Lyle H Ungar Williams College University Pennsylvania Abstract Artificial neural networks used increasing frequency high dimensional problems regression classification This article provides tutorial overview neural networks focusing back propagation networks method approximating nonlinear multivariable functions We explain statisticians vantage point neural networks might attractive compare modern regression techniques KEYWORDS nonparametric regression function approximation backpropagation Introduction Networks mimic way brain works computer programs actually LEARN patterns forecasting without know statistics These many claims attractions artificial neural networks Neural networks henceforth drop term artificial unless need distinguish biological neural networks seem everywhere days least advertising able statistics without fuss bother anything except buy piece software Neural networks successfully used many different applications including robotics chemical process control speech recognition optical character recognition credit card fraud detection interpretation chemical spectra vision autonomous navigation vehicles Pointers literature given end article In article attempt explain one particular type neural network feedforward networks sigmoidal activation functions backpropagation networks actually works trained compares well known statistical techniques As example someone would want use neural network consider problem recognizing hand written ZIP codes letters This classification problem
To apply algorithm classification assign class separate set codebook Gaussians Each set trained patterns single class After trained codebook Gaussians set provides estimate probability function one class Parzen window estimation take estimate pattern distribution average Gaussians set Classification pattern may done calculating probability class respective sample point assigning pattern class highest probability Hence whole codebook plays role classification patterns This case regular classification schemes using codebooks We tested classification scheme several classification tasks including two spiral problem We compared algorithm various classification algorithms came second best algorithm applications Parzen window estimation However computing time memory Parzen window estimation excessive compared algorithm hence practical situations algorithm preferred We developed fast algorithm combines attractive properties Parzen window estimation vector quantization The scale parameter tuned adaptively therefore set ad hoc manner It allows classification strategy codebook vectors taken account This yields better results standard vector quantization techniques An interesting topic research use radially nonsymmetric Gaussians
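The classification rule described above can be sketched as follows. The sketch assumes each class already has a trained set of Gaussian centres that share a fixed width `sigma`. The paper tunes the scale adaptively and trains the codebook, and neither step is shown. Each class density is the average of its Gaussians, and a pattern is assigned to the class with the highest density. All names are hypothetical.

```python
import math

def gaussian(x, center, sigma):
    """Isotropic Gaussian density in len(x) dimensions."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq / (2 * sigma ** 2)) / ((2 * math.pi * sigma ** 2) ** (d / 2))

def class_density(x, codebook, sigma):
    """Parzen-style estimate of p(x | class): the average of the class's Gaussians."""
    return sum(gaussian(x, c, sigma) for c in codebook) / len(codebook)

def classify(x, codebooks, sigma):
    """Assign x to the class whose codebook density is highest at x."""
    return max(codebooks, key=lambda label: class_density(x, codebooks[label], sigma))
```

Usage: `classify((0.2, 0.5), {"a": [(0, 0), (0, 1)], "b": [(5, 5), (6, 5)]}, 1.0)` evaluates every Gaussian in both sets, so the whole codebook takes part in each decision.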
Predictions lifetimes dynamically allocated objects used improve time space efficiency dynamic memory management computer programs Barrett Zorn used simple lifetime predictor demonstrated improvement variety computer programs In paper use decision trees lifetime prediction programs show significantly better prediction Our method also advantage training use large number features let decision tree automatically choose relevant subset
Evolutionary systems used variety applications turbine design scheduling problems The basic algorithms similar applications representation always problem specific Unfortunately search time evolutionary systems much depends efficient codings using problem specific domain knowledge reduce size search space This paper describes approach user specifies general basic coding used larger variety problems The system learns efficient problem specific coding To evolutionary system variable length coding used While system optimizes example problem meta process identifies successful combinations genes population combines higher level evolved genes The extraction repeated iteratively allowing genes evolve high level complexity encode high number original basic genes This results continuous restructuring search space allowing potentially successful solutions found much shorter search time The evolved coding used solve related problems While excluding potentially desirable solutions evolved coding makes knowledge example problem available new problem
The coverage learning algorithm number concepts learned algorithm samples given size This paper asks whether good learning algorithms designed maximizing coverage The paper extends previous upper bound coverage Boolean concept learning algorithm describes two algorithmsMultiBalls LargeBallwhose coverage approaches upper bound Experimental measurement coverage ID FRINGE algorithms shows coverage far bound Further analysis LargeBall shows although learns many concepts seem interesting concepts Hence coverage maximization alone appear yield practicallyuseful learning algorithms The paper concludes definition coverage within bias suggests way coverage maximization could applied strengthen weak preference biases
Markov decision processes MDPs recently applied problem modeling decisiontheoretic planning While traditional methods solving MDPs often practical small state spaces effectiveness large AI planning problems questionable We present algorithm called structured policy iteration SPI constructs optimal policies without explicit enumeration state space The algorithm retains fundamental computational steps commonly used modified policy iteration algorithm exploits variable propositional independencies reflected temporal Bayesian network representation MDPs The principles behind SPI applied structured representation stochastic actions policies value functions algorithm used conjunction recent approximation methods
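For reference, the flat form of modified policy iteration that SPI builds on alternates two steps: greedy policy improvement, and a fixed number `k` of partial evaluation sweeps. The sketch below assumes a small MDP given as dictionaries, with `P[s][a]` a list of `(next_state, probability)` pairs and `R[s][a]` an immediate reward. It enumerates every state explicitly, which is the cost SPI's Bayesian-network representation is designed to avoid. None of SPI's structured machinery is shown.

```python
def modified_policy_iteration(states, actions, P, R, gamma=0.9, k=5, tol=1e-8, max_iter=1000):
    """Flat modified policy iteration over an explicitly enumerated state space."""
    V = {s: 0.0 for s in states}

    def q(s, a, V):
        # One-step lookahead value of taking action a in state s.
        return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])

    policy = {s: actions[0] for s in states}
    for _ in range(max_iter):
        # Improvement: greedy policy with respect to the current value estimate.
        policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
        # Partial evaluation: k backups under the fixed policy instead of an exact solve.
        V_old = V
        for _ in range(k):
            V = {s: q(s, policy[s], V) for s in states}
        if max(abs(V[s] - V_old[s]) for s in states) < tol:
            break
    return policy, V
```

Setting `k` very large recovers ordinary policy iteration. Setting `k = 1` gives a form of value iteration.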
This paper reviews features new class multilayer connectionist architectures known ASOCS Adaptive SelfOrganizing Concurrent Systems ASOCS similar decisionmaking neural network models attempts learn adaptive set arbitrary vector mappings However differs dramatically mechanisms ASOCS based networks adaptive digital elements selfmodify using local information Function specification entered incrementally use rules rather complete inputoutput vectors processing network able extract critical features large environment give output parallel fashion Learning also uses parallelism selforganization new rule completely learned time linear depth network The model guarantees learning arbitrary mapping boolean inputoutput vectors The model also stable learning erase previously learned mappings except explicitly contradicted
Maximum working likelihood MWL inference presence missing data quite challenging intractability associated marginal likelihood This problem exacerbated number parameters involved large We propose using Markov chain Monte Carlo MCMC first obtain MWL estimator working Fisher information matrix second using Monte Carlo quadrature obtain remaining components correct asymptotic MWL variance Evaluation marginal likelihood needed We demonstrate consistency asymptotic normality number independent identically distributed data clusters large likelihood may incorrectly specified An analysis longitudinal ordinal data given example KEY WORDS Convergence posterior distributions Maximum likelihood Metropolis
Natural images contain characteristic statistical regularities set apart purely random images Understanding regularities enable natural images coded efficiently In paper describe forms structure contained natural images show related response properties neurons early stages visual system Many important forms structure require higherorder ie linear pairwise statistics characterize makes models based linear Hebbian learning principal components analysis inappropriate finding efficient codes natural images We suggest good objective efficient coding natural scenes maximize sparseness representation show network learns sparse codes natural scenes succeeds developing localized oriented bandpass receptive fields similar primate striate cortex
In paper develop empirical methodology studying behavior evolutionary algorithms based problem generators We describe three generators used study effects epistasis performance EAs Finally illustrate use ideas preliminary exploration effects epistasis simple GAs
Traditionally genetic algorithms relied upon point crossover operators Many recent empirical studies however shown benefits higher numbers crossover points Some intriguing recent work focused uniform crossover involves average L/2 crossover points strings length L Theoretical results suggest view hyperplane sampling disruption uniform crossover redeeming features However growing body experimental evidence suggests otherwise In paper attempt reconcile opposing views uniform crossover present framework understanding virtues
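The "about L/2 crossover points" figure can be checked with a short simulation. This assumes the usual uniform crossover, in which each locus is taken from either parent with probability 0.5. Counting the positions where the source parent switches gives (L-1)/2 switches on average, which is roughly L/2 for long strings. The names and parameters are illustrative.

```python
import random

def uniform_crossover(p1, p2, rng, p_swap=0.5):
    """Uniform crossover: each locus independently comes from p2 with probability p_swap, else from p1."""
    mask = [rng.random() < p_swap for _ in p1]
    child = [b if m else a for a, b, m in zip(p1, p2, mask)]
    return child, mask

def crossover_points(mask):
    """Count the loci whose source parent differs from that of the previous locus."""
    return sum(1 for i in range(1, len(mask)) if mask[i] != mask[i - 1])

rng = random.Random(42)
L = 100
counts = [crossover_points(uniform_crossover([0] * L, [1] * L, rng)[1]) for _ in range(2000)]
avg = sum(counts) / len(counts)  # expectation is (L - 1) / 2 = 49.5, i.e. roughly L/2
```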
Conditional logics introduced Lewis Stalnaker utilized artificial intelligence capture broad range phenomena In paper examine complexity several variants discussed literature We show general deciding satisfiability PSPACEcomplete formulas arbitrary conditional nesting NPcomplete formulas bounded nesting conditionals However provide several exceptions rule Of particular note results showing assuming uniformity ie worlds agree worlds possible decision problem becomes EXPTIMEcomplete even formulas bounded nesting b assuming absoluteness ie worlds agree conditional statements decision problem NPcomplete formulas arbitrary nesting
An incremental higherorder nonrecurrent network combines two properties found useful learning sequential tasks higherorder connections incremental introduction new units The network adds higher orders needed adding new units dynamically modify connection weights Since new units modify weights next timestep information previous step temporal tasks learned without use feedback thereby greatly simplifying training Furthermore theoretically unlimited number units added reach arbitrarily distant past Experiments Reber grammar demonstrated speedups two orders magnitude recurrent networks
I propose novel general principle unsupervised learning distributed nonredundant internal representations input patterns The principle based two opposing forces For representational unit adaptive predictor tries predict unit remaining units In turn unit tries react environment minimizes predictability This encourages unit filter abstract concepts environmental input concepts statistically independent upon units focus I discuss various simple yet potentially powerful implementations principle aim finding binary factorial codes Barlow et al ie codes probability occurrence particular input simply product probabilities corresponding code symbols Such codes potentially relevant segmentation tasks speeding supervised learning novelty detection Methods finding factorial codes automatically implement Occams razor finding codes using minimal number units Unlike previous methods novel principle potential removing linear also nonlinear output redundancy Illustrative experiments show algorithms based principle predictability minimization practically feasible The final part paper describes entirely local algorithm potential learning unique representations extended input sequences
In paper study learning PAC model Valiant example oracle used learning may faulty one two ways either misclassifying example distorting distribution examples We first consider models examples misclassified Kearns recently showed efficient learning new model using statistical queries sufficient condition PAC learning classification noise We show efficient learning statistical queries sufficient learning PAC model malicious error rate proportional required statistical query accuracy One application result new lower bound tolerable malicious error learning monomials k literals This first bound independent number irrelevant attributes n We also use statistical query model give sufficient conditions using distribution specific algorithms distributions outside prescribed domains A corollary result expands class distributions weakly learn monotone Boolean formulae We also consider new models learning examples chosen according distribution learner tested We examine three variations distribution noise give necessary sufficient conditions polynomial time learning noise We show containments separations various models faulty oracles Finally examine hypothesis boosting algorithms context learning distribution noise show Schapires result regarding strength weak learnability sense tight requiring weak learner nearly distribution free
This paper describes first stage study evolution learning abilities We use simple maze exploration problem designed R Sutton task individual encode inherent learning parameters genome The learning architecture use one step Qlearning using lookup table inherent parameters initial Qvalues learning rate discount rate rewards exploration rate Under fitness measure proportioning number times achieves goal later half life learners evolve genetic algorithm The results computer simulation indicated learning ability emerge environment changes every generation inherent map optimal path acquired environment doesnt change These results suggest emergence learning ability needs environmental change faster alternate generation
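The one-step, lookup-table Q-learner described above can be sketched on a toy one-dimensional corridor in place of Sutton's maze. This substitution is an assumption made to keep the example short. The four inherent parameters that the paper places on the genome appear here as plain arguments: initial Q-value `q_init`, learning rate `alpha`, discount rate `gamma`, and exploration rate `epsilon`. The genetic algorithm that would evolve them is not shown, and the default values are arbitrary.

```python
import random

def q_learning_episode(Q, start, goal, n_states, alpha, gamma, epsilon, rng, max_steps=100):
    """One episode of one-step tabular Q-learning on a corridor (actions: -1 left, +1 right).
    Reward is 1 on reaching the goal, 0 otherwise; the goal is terminal."""
    s = start
    for _ in range(max_steps):
        if rng.random() < epsilon:
            a = rng.choice((-1, 1))                         # explore
        else:
            a = max((-1, 1), key=lambda act: Q[(s, act)])   # exploit (ties go to -1)
        s2 = min(max(s + a, 0), n_states - 1)               # walls at both ends
        r = 1.0 if s2 == goal else 0.0
        target = r if s2 == goal else r + gamma * max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])           # one-step Q-learning update
        s = s2
        if s == goal:
            return True
    return False

def train(n_states=6, episodes=500, q_init=1.0, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run repeated episodes from the left end; q_init fills the lookup table."""
    rng = random.Random(seed)
    Q = {(s, a): q_init for s in range(n_states) for a in (-1, 1)}
    for _ in range(episodes):
        q_learning_episode(Q, 0, n_states - 1, n_states, alpha, gamma, epsilon, rng)
    return Q
```

The optimistic default `q_init=1.0` makes untried actions look attractive, which drives early exploration. This illustrates why the initial Q-values are worth placing on the genome.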
We examine problem performing exact dynamicprogramming updates partially observable Markov decision processes pomdps computational complexity viewpoint Dynamicprogramming updates crucial operation wide range pomdp solution methods find intractable perform updates piecewiselinear convex value functions general pomdps We offer new algorithm called witness algorithm compute updated value functions efficiently restricted class pomdps number linear facets great We compare witness algorithm existing algorithms analytically empirically find fastest algorithm wide range pomdp sizes
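Both the dynamic-programming update and the witness algorithm operate on value functions represented as a finite set of linear facets (alpha-vectors) over belief states. The sketch below only shows how such a piecewise-linear convex function is evaluated, using V(b) = max over alpha of alpha · b. The function names are hypothetical, and the sketch does not implement the witness algorithm itself.

```python
def pwlc_value(belief, alpha_vectors):
    """V(b) = max over alpha-vectors of the dot product alpha . b (piecewise-linear, convex)."""
    return max(sum(a * b for a, b in zip(alpha, belief)) for alpha in alpha_vectors)

def best_facet(belief, alpha_vectors):
    """Index of the linear facet (alpha-vector) that attains the maximum at this belief."""
    return max(range(len(alpha_vectors)),
               key=lambda i: sum(a * b for a, b in zip(alpha_vectors[i], belief)))
```

The number of alpha-vectors in the set is what the abstract refers to as the number of linear facets. The restricted class where the witness algorithm is efficient keeps this set small.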
This paper examines limits instruction level parallelism found programs particular SPEC benchmark suite Apart using recent version SPEC benchmark suite differs earlier studies removing nonessential true dependencies occur result compiler employing stack subroutine linkage This subtle limitation parallelism readily evident appears true dependency stack pointer Other methods used employ stack remove dependency In paper show removal exposes far parallelism seen previously We refer type parallelism parallelism distance requires impossibly large instruction windows detection We conclude two observations single instruction window characteristic superscalar machines inadequate detecting parallelism distance order take advantage parallelism compiler must involved separate threads must explicitly programmed
Models unsupervised correlationbased Hebbian synaptic plasticity typically unstable either synapses grow reaches maximum allowed strength synapses decay zero strength A common method avoiding outcomes use constraint conserves limits total synaptic strength cell We study dynamical effects constraints Two methods enforcing constraint distinguished multiplicative subtractive For otherwise linear learning rules multiplicative enforcement constraint results dynamics converge principal eigenvector operator determining unconstrained synaptic development Subtractive enforcement contrast typically leads final state almost synaptic strengths reach either maximum minimum allowed value This final state often dominated weight configurations principal eigenvector unconstrained operator Multiplicative enforcement yields graded receptive field mutually correlated inputs represented whereas subtractive enforcement yields receptive field sharpened subset maximallycorrelated inputs If two equivalent input populations eg two eyes innervate common target multiplicative enforcement prevents segregation ocular dominance segregation two populations weakly correlated whereas subtractive enforcement allows segregation circumstances These results may used understand constraints output cells input cells A variety rules implement constrained dynamics discussed
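The sketch below contrasts the two ways of enforcing a conserved total synaptic strength on a linear Hebbian rule dw ∝ Cw, where C is the input correlation matrix. Multiplicative enforcement rescales all weights after each step. Subtractive enforcement removes the same amount from every synapse and clips weights to [0, w_max]. The function names, the step size, and the 2×2 correlation matrix are assumptions. The clipping is a simplification, because saturated synapses still take part in the subtraction.

```python
def hebbian_step(w, C, eta, mode, w_max=1.0):
    """One step of the linear Hebbian rule dw = eta * C w under a conserved-sum constraint."""
    n = len(w)
    growth = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]  # unconstrained C w
    if mode == "multiplicative":
        new = [wi + eta * gi for wi, gi in zip(w, growth)]
        scale = sum(w) / sum(new)            # rescale every weight by the same factor
        return [wi * scale for wi in new]
    if mode == "subtractive":
        mean_growth = sum(growth) / n        # subtract the same amount from every synapse
        new = [wi + eta * (gi - mean_growth) for wi, gi in zip(w, growth)]
        return [min(max(wi, 0.0), w_max) for wi in new]  # hard bounds [0, w_max]
    raise ValueError(f"unknown mode: {mode}")

def run(mode, w0, C, steps, eta=0.1):
    """Apply hebbian_step repeatedly from initial weights w0."""
    w = list(w0)
    for _ in range(steps):
        w = hebbian_step(w, C, eta, mode)
    return w

C = [[1.0, 0.5], [0.5, 1.0]]  # two positively correlated inputs
w_mult = run("multiplicative", [0.6, 0.4], C, 500)
w_sub = run("subtractive", [0.6, 0.4], C, 200)
```

With these inputs the multiplicative run converges to the principal eigenvector of C (equal weights). The subtractive run amplifies the initial difference until one weight sits at its maximum and the other at zero, matching the two outcomes the abstract contrasts.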
This project supported part grant McDonnellPew Foundation grant ATR Human Information Processing Research Laboratories grant Siemens Corporation grant NJ Office Naval Research The project also supported NSF grant ASC support Center Biological Computational Learning MIT including funds provided DARPA HPCC program Michael I Jordan NSF Presidential Young Investigator
Learning made efficient actively select particularly salient data points Within Bayesian learning framework objective functions discussed measure expected informativeness candidate measurements Three alternative specifications want gain information lead three different criteria data selection All criteria depend assumption hypothesis space correct may prove main weakness
Knowledge clusters relations important understanding highdimensional input data unknown distribution Ordinary feature maps fully connected fixed grid topology properly reflect structure clusters input space there cluster boundaries map Incremental feature map algorithms nodes connections added deleted map according input distribution overcome problem However far algorithms limited maps drawn D case dimensional input space In approach proposed paper nodes added incrementally regular dimensional grid drawable times irrespective dimensionality input space The process results map explicitly represents cluster structure highdimensional input
Lattice conditional independence LCI models multivariate normal data recently introduced analysis nonmonotone missing data patterns nonnested dependent linear regression models seemingly unrelated regressions It shown class LCI models coincides subclass class graphical Markov models determined acyclic digraphs ADGs namely subclass transitive ADG models An explicit graph theoretic characterization ADGs Markov equivalent transitive ADG obtained This characterization allows one determine whether specific ADG D Markov equivalent transitive ADG hence LCI model polynomial time without exhaustive search exponentially large equivalence class D These results require existence positivity joint densities
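To make the definition concrete, the sketch below checks whether an acyclic digraph is transitive. An ADG is transitive when, whenever a → b and b → c are edges, a → c is also an edge. The sketch assumes the graph is given as a collection of (parent, child) pairs. It does not implement the paper's characterization of the ADGs that are Markov equivalent to a transitive one.

```python
def is_transitive(edges):
    """edges: iterable of (parent, child) pairs of an acyclic digraph.
    Returns True iff a->b and b->c always imply a->c."""
    edges = set(edges)
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    for a, b in edges:
        for c in children.get(b, ()):
            if (a, c) not in edges:
                return False  # found a path a->b->c without the shortcut a->c
    return True
```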
In paper describe method improving geneticalgorithmbased optimization using casebased learning The idea utilize sequence points explored search guide exploration The proposed method particularly suitable continuous spaces expensive evaluation functions arise engineering design Empirical results two engineering design domains across different representations demonstrate proposed method significantly improve efficiency reliability GA optimizer Moreover results suggest modification makes genetic algorithm less sensitive poor choices tuning parameters mutation rate
Genetic algorithms GAs extensively used means performing global optimization simple yet reliable manner However realistic engineering design optimization domains simple classical implementation GA based binary encoding bit mutation crossover often inefficient unable reach global optimum In paper describe GA continuous designspace optimization uses new GA operators strategies tailored structure properties engineering design domains Empirical results domains supersonic transport aircraft supersonic missile inlets demonstrate newly formulated GA significantly better classical GA efficiency reliability
D E Rumelhart G E Hinton R J Williams Learning Internal Representations Error Propagation D E Rumelhart J L McClelland eds Parallel Distributed Processing Explorations Microstructure Cognition Vol MIT Press
Evolutionary trees frequently used underlying model design algorithms optimization criteria software packages multiple sequence alignment MSA In paper reexamine suitability trees universal model MSA light broad range biological questions MSAs used address A tree model consists tree topology model accepted mutations along branches After surveying major applications MSA examples molecular biology literature used illustrate situations tree model fails This occurs relationship residues column described tree example structural functional applications MSA It also occurs situations lateral gene transfer entire gene modeled unique tree In cases nonparsimonious data convergent evolution may difficult find consistent mutational model We hope survey promote dialogue biologists computer scientists leading biologically realistic research MSA
Selective suppression transmission feedback synapses learning proposed mechanism combining associative feedback selforganization feedforward synapses Experimental data demonstrates cholinergic suppression synaptic transmission layer I feedback synapses lack suppression layer IV feedforward synapses A network feature uses local rules learn mappings linearly separable During learning sensory stimuli desired response simultaneously presented input Feedforward connections form selforganized representations input suppressed feedback connections learn transpose feedforward connectivity During recall suppression removed sensory input activates selforganized representation activity generates learned response
Technical Report No Department Statistics University Toronto Abstract One way sample distribution sample uniformly region plot density function A Markov chain converges uniform distribution constructed alternating uniform sampling vertical direction uniform sampling horizontal slice defined current vertical position Variations slice sampling methods easily implemented univariate distributions used sample multivariate distribution updating variable turn This approach often easier implement Gibbs sampling may efficient easilyconstructed versions Metropolis algorithm Slice sampling therefore attractive routine Markov chain Monte Carlo applications use software automatically generates Markov chain sampler model specification One also easily devise overrelaxed versions slice sampling sometimes greatly improve sampling efficiency suppressing random walk behaviour Random walks also avoided slice sampling schemes simultaneously update variables
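The univariate procedure described above can be sketched as follows, working with log densities for numerical stability. The vertical step draws a height under the density at the current point. The horizontal step finds an interval around the slice by "stepping out" in increments of `w`, then samples from that interval, shrinking it whenever a proposal falls outside the slice. The function name, the default width `w`, and the step cap are assumptions. Overrelaxed and multivariate variants are not shown.

```python
import math
import random

def slice_sample(log_f, x0, n_samples, w=1.0, max_steps=50, seed=0):
    """Univariate slice sampling with stepping out and shrinkage.
    log_f: log of an unnormalized target density; w: initial interval width;
    max_steps: cap on how many widths the interval may be extended."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Vertical step: y ~ Uniform(0, f(x)), stored as log y = log f(x) - Exp(1).
        log_y = log_f(x) - rng.expovariate(1.0)
        # Horizontal step, part 1: randomly place a width-w interval around x, then
        # step out until both ends leave the slice (or the step cap is used up).
        left = x - w * rng.random()
        right = left + w
        j = math.floor(max_steps * rng.random())
        k = max_steps - 1 - j
        while j > 0 and log_f(left) > log_y:
            left -= w
            j -= 1
        while k > 0 and log_f(right) > log_y:
            right += w
            k -= 1
        # Horizontal step, part 2: sample uniformly from the interval, shrinking it
        # toward x whenever the proposal falls outside the slice.
        while True:
            x1 = left + (right - left) * rng.random()
            if log_f(x1) >= log_y:
                x = x1
                break
            if x1 < x:
                left = x1
            else:
                right = x1
        samples.append(x)
    return samples
```

Usage: `slice_sample(lambda x: -0.5 * x * x, 0.0, 5000)` targets a standard normal. Only the unnormalized log density is needed, which is what makes the method easy to drive from an automatically generated model specification.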
Markov decision problems MDPs provide foundations number problems interest AI researchers studying automated planning reinforcement learning In paper summarize results regarding complexity solving MDPs running time MDP solution algorithms We argue although MDPs solved efficiently theory study needed reveal practical algorithms solving large problems quickly To encourage future research sketch alternative methods analysis rely structure MDPs
Learning reinforcements promising approach creating intelligent agents However reinforcement learning usually requires large number training episodes We present evaluate design addresses shortcoming allowing connectionist Qlearner accept advice given time natural manner external observer In approach advicegiver watches learner occasionally makes suggestions expressed instructions simple imperative programming language Based techniques knowledgebased neural networks insert programs directly agents utility function Subsequent reinforcement learning integrates refines advice We present empirical evidence investigates several aspects approach show given good advice learner achieve statistically significant gains expected reward A second experiment shows advice improves expected reward regardless stage training given another study demonstrates subsequent advice result gains reward Finally present experimental results indicate method powerful naive technique making use advice
This paper presents mathematical foundations Dirichlet mixtures used improve database search results homologous sequences variable number sequences protein family domain known We present method condensing information protein database mixture Dirichlet densities These mixtures designed combined observed amino acid frequencies form estimates expected amino acid probabilities position profile hidden Markov model statistical model These estimates give statistical model greater generalization capacity remotely related family members reliably recognized model Dirichlet mixtures shown outperform substitution matrices methods computing expected amino acid distributions database search resulting fewer false positives false negatives families tested This paper corrects previously published formula estimating expected probabilities contains complete derivations Dirichlet mixture formulas methods optimizing mixtures match particular databases suggestions efficient implementation
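The way a Dirichlet mixture combines with observed counts can be written out directly. Assume component weights q_j and parameter vectors alpha_j. The posterior probability of component j given a column's counts n is proportional to q_j B(alpha_j + n) / B(alpha_j), where B is the multivariate Beta function. The estimate for residue i is then the posterior-weighted average of (n_i + alpha_{j,i}) / (|n| + |alpha_j|). The sketch below computes this in log space. The names are hypothetical, and the toy test uses a four-letter alphabet rather than the 20 amino acids.

```python
from math import exp, lgamma, log

def log_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))

def dirichlet_mixture_estimate(counts, weights, alphas):
    """Posterior-mean residue probabilities for one column under a Dirichlet mixture.
    counts: observed counts n_i; weights: mixture coefficients q_j; alphas: vectors alpha_j."""
    # log q_j + log P(n | alpha_j), dropping the multinomial coefficient (same for every j).
    log_post = [log(q) + log_beta([a + n for a, n in zip(alpha, counts)]) - log_beta(alpha)
                for q, alpha in zip(weights, alphas)]
    top = max(log_post)
    post = [exp(lp - top) for lp in log_post]   # subtract the max to avoid underflow
    z = sum(post)
    post = [p / z for p in post]                # P(j | n)
    total = sum(counts)
    return [sum(pj * (counts[i] + alpha[i]) / (total + sum(alpha))
                for pj, alpha in zip(post, alphas))
            for i in range(len(counts))]
```

With a single component this reduces to the familiar pseudocount estimate (n_i + alpha_i) / (|n| + |alpha|). With several components, the counts decide which prior shapes the estimate.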
Derivational analogy technique reusing problem solving experience improve problem solving performance This research addresses issue common problem solvers use derivational analogy overcoming mismatches past experiences new problems impede reuse First research describes variety mismatches arise proposes new approach derivational analogy uses appropriate adaptation strategies Second compares approach seven others common domain This empirical study shows derivational analogy almost always efficient problem solving scratch amount contributes depends ability overcome mismatches
Pollack demonstrated secondorder recurrent neural networks act dynamical recognizers formal languages trained positive negative examples observed phase transitions learning IFSlike fractal state sets Followon work focused mainly extraction minimization finite state automaton FSA trained network However networks capable inducing languages regular therefore equivalent FSA Indeed may simpler small network fit training data inducing nonregular language But networks language regular In paper using low dimensional network capable learning Tomita data sets present empirical method testing whether language induced network regular We also provide detailed machine analysis trained networks regular nonregular languages
COINS Technical Report January Abstract This article presents algorithm inducing multiclass decision trees multivariate tests internal decision nodes Each test constructed training linear machine eliminating variables controlled manner Empirical results demonstrate algorithm builds small accurate trees across variety tasks
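A single linear machine of the kind used at each internal node can be sketched as one weight vector per class, predicting the class with the highest score. The training rule below is plain fixed-increment error correction, an assumption made for brevity. The paper's own training procedure and its controlled variable elimination are not reproduced. The names are hypothetical.

```python
def train_linear_machine(data, classes, n_features, epochs=50):
    """Fixed-increment training of a linear machine: on a mistake, move the true
    class's weights toward x and the predicted class's weights away from x."""
    W = {c: [0.0] * (n_features + 1) for c in classes}  # last weight acts as a bias
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            xb = list(x) + [1.0]
            pred = max(classes, key=lambda c: sum(w * v for w, v in zip(W[c], xb)))
            if pred != y:
                mistakes += 1
                W[y] = [w + v for w, v in zip(W[y], xb)]
                W[pred] = [w - v for w, v in zip(W[pred], xb)]
        if mistakes == 0:  # every training example is now classified correctly
            break
    return W

def predict(W, x):
    """Class whose weight vector gives the largest score on x."""
    xb = list(x) + [1.0]
    return max(W, key=lambda c: sum(w * v for w, v in zip(W[c], xb)))
```

Because the test at a node can separate several classes at once, a tree built from such nodes can stay small on multiclass tasks.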
Funes P Pollack J Computer Evolution Buildable Objects Fourth European Conference Artificial Life P Husbands I Harvey eds MIT Press pp Abstract The idea coevolution bodies brains becoming popular little work done evolution physical structure lack general framework Evolution creatures simulation constrained reality gap implies resultant objects usually buildable The work present takes step problem body evolution applying evolutionary techniques design structures assembled parts Evolution takes place simulator designed computes forces stresses predicts failure twodimensional Lego structures The final printout program schematic assembly built physically We demonstrate functionality several different evolved entities In order evolve morphology behavior autonomous mechanical devices manufactured one must simulator operates several constraints resultant controller adaptive enough cover gap simulated real world eral space mechanisms Conservative simulation never perfect preserve margin safety Efficient quicker test simulation physical production test Buildable results convertible simulation real object knowledge program would result familiar structures provided algorithm model physical reality purely utilitarian fitness function thus supplying measures feasibility functionality In way evolutionary process runs environment unnecessarily constrained We added however requirement computability reject overly complex structures took long simulations evaluate The results encouraging The evolved structures surprisingly alien look based common knowledge build brick toys instead computer found ways evolutionary search process We able assemble final designs manually confirm accomplish objectives introduced fitness functions After background related problems describe physical simulation model twodimensional Lego structures representation encoding applying evolution We demonstrate feasibility work photos actual objects result particular optimizations Finally discuss future work draw conclusions
In paper concerned problem acquiring knowledge integration Our aim construct integrated knowledge base several separate sources The need merge knowledge bases arise example knowledge bases acquired independently interactions several domain experts As opinions different domain experts may differ knowledge bases constructed way normally differ A similar problem also arise whenever separate knowledge bases generated learning algorithms The objective integration construct one system exploits knowledge available good performance The aim paper discuss methodology knowledge integration describe implemented system INTEG present concrete results demonstrate advantages method
Many arthropods particularly insects exhibit sophisticated visually guided behaviours Yet cases behaviours guided input hundreds thousands pixels ie ommatidia compound eye Inspired observation several years exploring possibilities visually guided robots lowbandwidth vision Rather design robot controllers hand use artificial evolution form extended genetic algorithm automatically generate architectures artificial neural networks generate effective sensorymotor coordination controlling mobile robots Analytic techniques drawn neuroethology dynamical systems theory allow us understand evolved robot controllers function predict behaviour environments used evolutionary process Initial experiments performed simulation techniques successfully transferred work variety real physical robot platforms This chapter reviews past work concentrating analysis evolved controllers gives overview current research We conclude discussion application evolutionary techniques problems biological vision
The major implementational problem reversible jump MCMC commonly natural way choose jump proposals since Euclidean structure guide choice In paper consider mechanism guiding proposal choice analysis acceptance probabilities jumps Essentially method involves approximation acceptance probability around certain canonical jumps We illustrate procedure using example reversible jump MCMC application involving Bayesian analysis graphical Gaussian models
Instancebased learning methods explicitly remember data receive They usually training phase prediction time perform computation Then take query search database similar datapoints build online local model local average local regression predict output value In paper review advantages instance based methods autonomous systems also note ensuing cost hopelessly slow computation database grows large We present evaluate new way structuring database new algorithm accessing maintains advantages instancebased learning Earlier attempts combat cost instancebased learning sacrificed explicit retention data applicable instancebased predictions based small number near neighbors reintroduce explicit training phase form interpolative data structure Our approach builds multiresolution data structure summarize database experiences resolutions interest simultaneously This permits us query database flexibility conventional linear search greatly reduced computational cost
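The query-time computation described above (search the stored datapoints, then form a local average) can be sketched as a Gaussian-kernel-weighted average over all stored examples. This is a minimal illustration under stated assumptions: the kernel choice, the `bandwidth` parameter and the function name `local_average` are hypothetical. The sketch deliberately uses the plain linear scan whose cost the abstract criticises. It does not show the multiresolution data structure.

```python
import math

def local_average(query, xs, ys, bandwidth=1.0):
    """Predict the output at `query` as a kernel-weighted average of stored outputs.

    Hypothetical sketch: xs are stored input vectors (tuples of floats) and ys
    are their outputs. Every stored point is kept and scanned at query time
    (linear cost in the size of the database).
    """
    weights = []
    for x in xs:
        d2 = sum((a - b) ** 2 for a, b in zip(query, x))  # squared Euclidean distance
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))  # Gaussian kernel weight
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

For example, `local_average((1.0,), [(0.0,), (1.0,), (2.0,)], [0.0, 1.0, 2.0])` returns a value equal to 1.0 up to rounding, because the neighbours on either side contribute symmetrically.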
In standard online model learning algorithm tries minimize total number mistakes made series trials On trial learner sees instance either accepts rejects instance told appropriate response We define natural variant model apple tasting learner gets feedback instance accepted We use two transformations relate apple tasting model enhanced standard model false acceptances counted separately false rejections We present strategy trading false acceptances false rejections standard model From one perspective strategy exactly optimal including constants We apply results obtain good general purpose apple tasting algorithm well nearly optimal apple tasting algorithms variety standard classes conjunctions disjunctions n boolean variables We also present analyze simpler transformation useful instances drawn random rather selected adversary
In paper describe algorithm exploits error distribution generated learning algorithm order break domain approximated piecewise learnable partitions Traditionally error distribution neglected favor lump error measure RMS By however lose lot important information The error distribution tells us algorithm badly exists ridge errors also tells us partition space one part space interfere learning another The algorithm builds variable arity kd tree whose leaves contain partitions Using tree new points predicted using correct partition traversing tree We instantiate algorithm using memory based learners crossvalidation
PREENS Parallel Research Execution Environment Neural Systems distributed neurosimulator targeted networks workstations transputer systems As current applications neural networks often contain large amounts data neural networks involved tasks vision large high requirements memory computational resources imposed target execution platforms PREENS executed distributed environment ie tools neural network simulation programs running machine connectable via TCPIP Using approach larger tasks data examined using efficient coarse grained parallelism Furthermore design PREENS allows neural networks running high performance MIMD machine transputer system In paper different features design concepts PREENS discussed These also used applications like image processing
It well known standard learning classifier systems applied many different domains exhibit number problems payoff oscillation difficult regulate interplay reward system background genetic algorithm GA rule chains instability default hierarchies instability ALECSYS parallel version standard learning classifier system CS suffers problems In paper propose innovative solutions problems We introduce following original features Mutespec new genetic operator used specialize potentially useful classifiers Energy quantity introduced measure global convergence order apply genetic algorithm system close steady state Dynamical adjustment classifiers set cardinality order speed performance phase algorithm We present simulation results experiments run simulated twodimensional world simple agent learns follow light source
Supervised neural networks generalize well much less information weights output vectors training cases So learning important keep weights simple penalizing amount information contain The amount information weight controlled adding Gaussian noise noise level adapted learning optimize tradeoff expected squared error network amount information weights We describe method computing derivatives expected squared error amount information noisy weights network contains layer nonlinear hidden units Provided output units linear exact derivatives computed efficiently without timeconsuming Monte Carlo simulations The idea minimizing amount information required communicate weights neural network leads number interesting schemes encoding weights
There many applications desirable order rather classify instances Here consider problem learning order given feedback form preference judgments ie statements effect one instance ranked ahead another We outline twostage approach one first learns conventional means preference function form PREF(u, v) indicates whether advisable rank u v New instances ordered maximize agreements learned preference function We show problem finding ordering agrees best preference function NPcomplete even restrictive assumptions Nevertheless describe simple greedy algorithm guaranteed find good approximation We discuss online learning algorithm based Hedge algorithm finding good linear combination ranking experts We use ordering algorithm combined online learning algorithm find combination search experts domainspecific query expansion strategy WWW search engine present experimental results demonstrate merits approach
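A greedy ordering step of the kind mentioned above can be sketched with a "potential" for each item. The potential is the total preference for ranking the item ahead of the others minus the total preference for ranking the others ahead of it. The highest-potential item is ranked next and removed, and the remaining potentials are updated. This is an illustrative sketch, not necessarily the exact algorithm or guarantee from the abstract. `pref(u, v)` is an assumed Python stand-in for PREF(u, v), returning a value in [0, 1] read as "u should be ranked ahead of v".

```python
def greedy_order(items, pref):
    """Greedily order items using a pairwise preference function (illustrative sketch).

    pref(u, v) in [0, 1] is assumed to mean "u should be ranked ahead of v".
    """
    remaining = list(items)
    # potential[v] = sum over other items u of pref(v, u) - pref(u, v)
    potential = {v: sum(pref(v, u) - pref(u, v) for u in remaining if u != v)
                 for v in remaining}
    order = []
    while remaining:
        t = max(remaining, key=lambda v: potential[v])  # highest net preference
        order.append(t)
        remaining.remove(t)
        del potential[t]
        for v in remaining:
            # Removing t drops the term pref(v, t) - pref(t, v) from v's potential.
            potential[v] -= pref(v, t) - pref(t, v)
    return order
```

With `pref = lambda u, v: 1.0 if u > v else 0.0`, the call `greedy_order([3, 1, 2], pref)` yields `[3, 2, 1]`.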
Technical Report CSRP March Abstract Evolutionary algorithms powerful techniques optimisation whose operation principles inspired natural selection genetics In paper discuss relation evolutionary techniques numerical classical search methods show methods instances single general search strategy call evolutionary computation cookbook By combining features classical evolutionary methods different ways new instances general strategy generated ie new evolutionary classical algorithms designed One algorithm GA fl described
We present neural net architecture discover hierarchical recursive structure symbol strings To detect structure multiple levels architecture capability reducing symbols substrings single symbols makes use external stack memory In terms formal languages architecture learn parse strings LR contextfree grammar Given training sets positive negative exemplars architecture trained recognize many different grammars The architecture one layer modifiable weights allowing Many cognitive domains involve complex sequences contain hierarchical recursive structure eg music natural language parsing event perception To illustrate spider ate hairy fly noun phrase containing embedded noun phrase hairy fly Understanding multilevel structures requires forming reduced descriptions Hinton string symbols states hairy fly reduced single symbolic entity noun phrase We present neural net architecture learns encode structure symbol strings via reduction transformations The difficult problem extracting multilevel structure complex extended sequences studied Mozer Ring Rohwer Schmidhuber among others While previous efforts made straightforward interpretation behavior
Selforganizing feature maps usually implemented abstracting lowlevel neural parallel distributed processes An external supervisor finds unit whose weight vector closest Euclidean distance input vector determines neighborhood weight adaptation The weights changed proportional Euclidean distance In biologically plausible implementation similarity measured scalar product neighborhood selected lateral inhibition weights changed redistributing synaptic resources The resulting selforganizing process quite similar abstract case However process somewhat hampered boundary effects parameters need carefully evolved It also necessary add redundant dimension input vectors
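The abstract (non-biological) update described above can be sketched as one step of a self-organizing map. The winner is the unit whose weight vector is closest to the input in Euclidean distance. A neighborhood around the winner on the map grid is then moved toward the input. This is a minimal sketch with several assumptions: the Gaussian neighborhood, the learning rate `lr`, the width `sigma` and the function name `som_step` are all hypothetical. It does not implement the scalar-product and lateral-inhibition version the abstract proposes.

```python
import math

def som_step(weights, coords, x, lr=0.1, sigma=1.0):
    """One abstract SOM update (hypothetical sketch); modifies `weights` in place.

    weights: list of weight vectors (lists of floats), one per unit.
    coords:  grid position of each unit, used to define the neighborhood.
    Returns the index of the winning unit.
    """
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # "External supervisor": pick the unit closest to x in Euclidean distance.
    winner = min(range(len(weights)), key=lambda i: dist2(weights[i], x))
    for i, w in enumerate(weights):
        # Gaussian neighborhood on the map grid, centred on the winner.
        h = math.exp(-dist2(coords[i], coords[winner]) / (2.0 * sigma ** 2))
        for k in range(len(w)):
            w[k] += lr * h * (x[k] - w[k])  # move the weight vector toward the input
    return winner
```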
The application decision making learning algorithms multiagent systems presents many interesting research challenges opportunities Among ability agents learn act observing imitating agents We describe algorithm IQalgorithm integrates imitation Qlearning Roughly Qlearner uses observations made expert agent bias exploration promising directions This algorithm goes beyond previous work direction relaxing oftmade assumptions learner observer expert observed agent share objectives abilities Our preliminary experiments demonstrate significant transfer agents using IQmodel many cases reductions training time
Faces represent complex multidimensional meaningful visual stimuli developing computational model face recognition difficult We present hybrid neural network solution compares favorably methods The system combines local image sampling selforganizing map neural network convolutional neural network The selforganizing map provides quantization image samples topological space inputs nearby original space also nearby output space thereby providing dimensionality reduction invariance minor changes image sample convolutional neural network provides partial invariance translation rotation scale deformation The convolutional network extracts successively larger features hierarchical set layers We present results using KarhunenLoeve transform place selforganizing map multilayer perceptron place convolutional network The KarhunenLoeve transform performs almost well error versus The multilayer perceptron performs poorly error versus The method capable rapid classification requires fast approximate normalization preprocessing consistently exhibits better classification performance eigenfaces approach database considered number images per person training database varied With images per person proposed method eigenfaces result error respectively The recognizer provides measure confidence output classification error approaches zero rejecting examples We use database images individuals contains quite high degree variability expression pose facial details We analyze computational complexity discuss new classes could added trained recognizer
We present new algorithm solving Markov decision problems extends modified policy iteration algorithm Puterman Shin two important ways The new algorithm asynchronous allows values states updated arbitrary order need consider actions state updating policy The new algorithm converges general initial conditions required modified policy iteration Specifically set initial policyvalue function pairs algorithm guarantees convergence strict superset set modified policy iteration converges This generalization obtained making simple easily implementable change policy evaluation operator used updating value function Both asynchronous nature algorithm convergence general conditions expand range problems algorithm applied
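As a baseline, the standard synchronous modified policy iteration that the abstract extends can be sketched as follows. Each outer iteration improves the policy greedily with respect to the current values. It then applies m sweeps of evaluation under that fixed policy. This is an assumed baseline only, not the new asynchronous algorithm or its modified evaluation operator. The MDP encoding (`P[s][a]` as a list of `(prob, next_state, reward)` triples) and all parameter names are hypothetical.

```python
def modified_policy_iteration(P, gamma=0.9, m=5, iters=100):
    """Standard synchronous modified policy iteration (baseline sketch).

    P[s][a] is a list of (prob, next_state, reward) triples.
    m = 0 reduces to value iteration; large m approaches policy iteration.
    """
    n = len(P)
    V = [0.0] * n

    def q(s, a):
        # One-step lookahead value of action a in state s under the current V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    policy = [0] * n
    for _ in range(iters):
        # Policy improvement: act greedily with respect to V.
        policy = [max(range(len(P[s])), key=lambda a: q(s, a)) for s in range(n)]
        V = [q(s, policy[s]) for s in range(n)]  # backup using the improved policy
        # Partial policy evaluation: m further sweeps with the policy held fixed.
        for _ in range(m):
            V = [q(s, policy[s]) for s in range(n)]
    return policy, V
```

In a one-state MDP where action 0 earns reward 1 and action 1 earns 0, both looping back to the same state, this sketch returns policy `[0]` with a value close to 1 / (1 - 0.9) = 10.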
Recently Markov chain Monte Carlo MCMC sampling methods become widely used determining properties posterior distribution Alternative Gibbs sampler elaborate HitandRun sampler generalization blackbox sampling scheme generate timereversible Markov chain posterior distribution The proof convergence applications Bayesian computation constrained parameter spaces provided comparisons MCMC samplers made In addition propose importance weighted marginal density estimation IWMDE method An IWMDE obtained averaging many dependent observations ratio full joint posterior densities multiplied weighting conditional density w The asymptotic properties IWMDE guidelines choosing weighting conditional density w also considered The generalized version IWMDE estimating marginal posterior densities full joint posterior density contains analytically intractable normalizing constants developed Furthermore develop Monte Carlo methods based KullbackLeibler divergences comparing marginal posterior density estimators This article summary authors PhD thesis presented Savage Award session
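The importance weighted marginal density estimator described above can be written out as follows. This is our reading of the abstract's description ("averaging many dependent observations of the ratio of full joint posterior densities multiplied by a weighting conditional density w"), so treat the notation as assumed. Here (θ_{1,i}, θ_{2,i}), i = 1..n, is an MCMC sample from the joint posterior π(θ_1, θ_2). θ_1^* is the point at which the marginal of θ_1 is wanted. w(· | θ_2) is any normalized conditional density over θ_1. The second line shows why the estimator targets the marginal. Any overall normalizing constant of π cancels in the ratio, so π only needs to be known up to such a constant.

```latex
\hat{\pi}(\theta_1^{*}) = \frac{1}{n}\sum_{i=1}^{n}
  w(\theta_{1,i} \mid \theta_{2,i})\,
  \frac{\pi(\theta_1^{*}, \theta_{2,i})}{\pi(\theta_{1,i}, \theta_{2,i})}

\mathbb{E}_{\pi}\!\left[ w(\theta_1 \mid \theta_2)\,
  \frac{\pi(\theta_1^{*}, \theta_2)}{\pi(\theta_1, \theta_2)} \right]
  = \iint w(\theta_1 \mid \theta_2)\, \pi(\theta_1^{*}, \theta_2)\, d\theta_1\, d\theta_2
  = \int \pi(\theta_1^{*}, \theta_2)\, d\theta_2
  = \pi(\theta_1^{*})
```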
In paper characterize complexity noisetolerant learning PAC model Specifically show general lower bound $\log(1/\delta)$ number examples required PAC learning presence classification noise Combined result Simon effectively show sample complexity PAC learning presence classification noise $VC(F)$ Furthermore demonstrate optimality general lower bound providing noisetolerant learning algorithm class symmetric Boolean functions uses sample size within constant factor bound Finally note general lower bound compares favorably various general upper bounds PAC learning presence classification noise
It recently realized parasite virulence harm caused parasites hosts adaptive trait Selection particular level virulence happen either level betweenhost tradeoffs result shortsighted withinhost competition This paper describes simulations study effect modifier genes changes mutation rate suppressing shortsighted development virulence investigates interaction simplified model immune clearance
Much work qualitative physics involves constructing models physical systems using functional descriptions flow monotonically increases pressure Semiquantitative methods improve model precision adding numerical envelopes monotonic functions Ad hoc methods normally used determine envelopes This paper describes systematic method computing bounding envelope multivariate monotonic function given stream data The derived envelope computed determining simultaneous confidence band special neural network guaranteed produce monotonic functions By composing envelopes complex systems simulated using semiquantitative methods
In paper describe application MemoryBased Learning problem Prepositional Phrase attachment disambiguation We compare MemoryBased Learning stores examples memory generalizes using intelligent similarity metrics number recently proposed statistical methods well suited large numbers features We evaluate methods common benchmark dataset show method compares favorably previous methods wellsuited incorporating various unconventional representations word patterns value difference metrics Lexical Space
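The abstract mentions value difference metrics without giving a formula. A commonly used form is the modified value difference metric (MVDM), assumed here. It compares two symbolic feature values (for example, two prepositions) by how differently they distribute over the classes: δ(v1, v2) = Σ_c |P(c | v1) − P(c | v2)|. The sketch estimates these conditional probabilities from counts. The function names and the toy attachment labels 'V' (verb attachment) and 'N' (noun attachment) are hypothetical.

```python
from collections import Counter, defaultdict

def mvdm_table(values, labels):
    """Estimate P(class | value) for one symbolic feature from training data."""
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    classes = set(labels)
    # Counter returns 0 for classes never seen with a value.
    return {v: {c: cnt[c] / sum(cnt.values()) for c in classes}
            for v, cnt in counts.items()}

def mvdm(table, v1, v2):
    """Modified value difference: sum over classes of |P(c|v1) - P(c|v2)|."""
    p1, p2 = table[v1], table[v2]
    return sum(abs(p1[c] - p2[c]) for c in p1)
```

Two values that always co-occur with different classes get the maximum distance of 2. A value compared with itself gets 0.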
Hierarchically structured mixture models studied context data analysis inference neural synaptic transmission characteristics mammalian central nervous systems Mixture structures arise due uncertainties stochastic mechanisms governing responses electrochemical stimulation individual neurotransmitter release sites nerve junctions Models attempt capture scientific features sensitivity individual synaptic transmission sites electrochemical stimuli extent electrochemical responses stimulated This done via suitably structured classes prior distributions parameters describing features Such priors may structured permit assessment currently topical scientific hypotheses fundamental neural function Posterior analysis implemented via stochastic simulation Several data analyses described illustrate approach resulting neurophysiological insights recently generated experimental contexts Further developments open questions neurophysiological statistical noted Research partially supported NSF grants DMS DMS DMS This work represents part collaborative project Dr Dennis A Turner Duke University Medical Center Durham VA Data provided Dr Turner Dr Howard V Wheal Southampton University A slightly revised version paper published Journal American Statistical Association vol pp modified title Hierarchical Mixture Models Neurological Transmission Analysis The author recipient Mitchell Prize Bayesian analysis substantive concrete problem based work reported paper
The need software modules performing natural language processing NLP tasks growing These modules perform efficiently accurately time rapid development often mandatory Recent work indicated machine learning techniques general memorybased learning MBL particular offer tools meet ends We present examples modules trained MBL three NLP tasks texttospeech conversion ii partofspeech tagging iii phrase chunking We demonstrate three modules display high generalization accuracy argue MBL applicable similarly well large class NLP tasks
We present membership query ie interpolation algorithm exactly identifying class readonce formulas basis boolean threshold functions Using generic transformation Angluin Hellerstein Karpinski gives algorithm using membership equivalence queries exactly identifying class readonce formulas basis boolean threshold functions negation We also present series generic transformations used convert algorithm one learning model algorithm different model
We study time series model viewed decision tree Markov temporal structure The model intractable exact calculations thus utilize variational approximations We consider three different distributions approximation one Markov calculations performed exactly layers decision tree decoupled one decision tree calculations performed exactly time steps Markov chain decoupled one Viterbilike assumption made pick single likely state sequence We present simulation results artificial data Bach chorales Accepted oral presentation NIPS
Stochastic simulation algorithms likelihood weighting often give fast accurate approximations posterior probabilities probabilistic networks methods choice large networks Unfortunately special characteristics dynamic probabilistic networks DPNs used represent stochastic temporal processes mean standard simulation algorithms perform poorly In essence simulation trials diverge reality process observed time In paper present simulation algorithms use evidence observed time step push set trials back towards reality The first algorithm evidence reversal ER restructures time slice DPN evidence nodes slice become ancestors state variables The second algorithm called survival fittest sampling SOF repopulates set trials time step using stochastic reproduction rate weighted likelihood evidence according trial We compare performance algorithm likelihood weighting original network also investigate benefits combining ER SOF methods The ERSOF combination appears maintain bounded error independent number time steps simulation
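The core of the survival-of-the-fittest idea described above is likelihood-weighted resampling of the trial population at each time step. This is the same kind of step used in particle filters. The sketch shows only that resampling step. It assumes a user-supplied `likelihood(state)` giving P(evidence at this time step | state). Sampling each trial's next state through the DPN, and the evidence-reversal restructuring, are not shown. All names are hypothetical.

```python
import random

def survival_of_fittest(trials, likelihood, rng=random):
    """Resample trials in proportion to the likelihood of the current evidence.

    trials: list of sampled states for the current time slice.
    likelihood(state): assumed to return P(evidence_t | state).
    Returns a new population of the same size (sampling with replacement).
    """
    weights = [likelihood(s) for s in trials]
    if sum(weights) == 0:
        raise ValueError("no trial is consistent with the evidence")
    # Trials that explain the evidence well are more likely to be copied forward.
    return rng.choices(trials, weights=weights, k=len(trials))
```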
Simulated Annealing Search technique single trial solution modified random An energy defined represents good solution The goal find best solution minimising energy Changes lead lower energy always accepted increase probabilistically accepted The probability given $\exp(-\Delta E / (k_B T))$ Where $\Delta E$ change energy $k_B$ constant $T$ Temperature Initially temperature high corresponding liquid molten state large changes possible progressively reduced using cooling schedule allowing smaller changes system solidifies low energy solution
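The procedure described above can be sketched in a few lines. Downhill moves are always accepted. An uphill move with energy change ΔE is accepted with probability exp(−ΔE / (k_B T)), and T is gradually lowered. The geometric cooling schedule T ← αT and all parameter values are assumptions for illustration. The text does not specify a schedule. The energy and neighbour functions are supplied by the caller.

```python
import math
import random

def anneal(energy, neighbour, x0, t0=10.0, alpha=0.95, steps=2000, k_b=1.0, seed=0):
    """Simulated annealing with an assumed geometric cooling schedule T <- alpha * T."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best, best_e = x, e
    t = t0
    for _ in range(steps):
        y = neighbour(x, rng)          # random modification of the current solution
        ey = energy(y)
        d = ey - e                     # change in energy, Delta E
        # Accept downhill moves always; uphill ones with probability exp(-dE / (k_B T)).
        if d <= 0 or rng.random() < math.exp(-d / (k_b * t)):
            x, e = y, ey
        if e < best_e:
            best, best_e = x, e
        t = max(alpha * t, 1e-12)      # cool the system, never reaching exactly zero
    return best, best_e
```

For example, `anneal(lambda x: (x - 3) ** 2, lambda x, rng: x + rng.choice((-1, 1)), 20)` searches the integers for the minimum of (x − 3)² starting from 20. Large uphill moves are likely only while T is still high.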
Systems learn examples often create disjunctive concept definition The disjuncts concept definition cover training examples referred small disjuncts The problem small disjuncts error prone large disjuncts may necessary achieve high level predictive accuracy Holte Acker Porter This paper extends previous work done problem small disjuncts investigating reasons small disjuncts error prone large disjuncts evaluating impact small disjuncts inductive learning This paper shows attribute noise missing attributes class noise training set size cause small disjuncts error prone large disjuncts This paper also evaluates impact factors learning small disjuncts ie error rate It shows two artificial domains low levels attribute noise applied training set ability learn correct noisefree concept evaluated small disjuncts primarily responsible making learning difficult
A number efficient learning algorithms achieve exact identification unknown function class using membership equivalence queries Using standard transformation algorithms easily converted online learning algorithms use membership queries Under transformation number equivalence queries made query algorithm directly corresponds number mistakes made online algorithm In paper consider several natural classes known learnable setting investigate minimum number equivalence queries accompanying counterexamples equivalently minimum number mistakes online model made learning algorithm makes polynomial number membership queries uses polynomial computation time We able reduce number equivalence queries used previous algorithms often prove matching lower bounds As example consider class DNF formulas n variables $k = O(\log n)$ terms Previously algorithm Blum Rudich BR provided best known upper bound $2^{O(k)} \log n$ minimum number equivalence queries needed exact identification We greatly improve upper bound showing exactly $k$ counterexamples needed learner knows $k$ priori exactly $k + 1$ counterexamples needed learner know $k$ priori This exactly matches known lower bounds BC For many results obtain complete characterization tradeoff number membership equivalence queries needed exact identification The classes consider monotone DNF formulas Horn sentences $O(\log n)$-term DNF formulas readk satj DNF formulas readonce formulas various bases deterministic finite automata
We present learning algorithm rulebased concept representations called rippledown rule sets Rippledown rule sets allow us deal exceptions rule separately introducing exception rules exception rules exception rule etc constant depth These local exception rules contrast decision lists exception rules must placed global ordering rules The localization exceptions makes possible represent concepts decision list representation On hand decision lists constant number alternations rules different classes represented constant depth rippledown rule sets polynomial increase size Our algorithm Occam algorithm constant depth rippledown rule sets hence PAC learning algorithm It based repeatedly applying greedy approximation method weighted set cover problem find good exception rule sets
We present algorithm learning sets rules organized k levels Each level contain arbitrary number rules $c \to l$ $l$ class associated level $c$ concept given class basic concepts The rules higher levels precedence rules lower levels used represent exceptions As basic concepts use Boolean attributes infinite attribute space model certain concepts defined terms substrings Given sample examples algorithm runs polynomial time produces consistent concept representation size Olog k n k n size smallest consistent representation k levels rules This implies algorithm learns PAC model The algorithm repeatedly applies greedy heuristics weighted set cover The weights obtained approximate solutions previous set cover problems
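Both rule-learning abstracts above rely on a greedy approximation for weighted set cover. The textbook greedy heuristic repeatedly picks the set with the lowest cost per newly covered element. This sketch shows that generic heuristic only. How the papers map exception rules to candidate sets and derive their weights is not shown, and the function and argument names are hypothetical.

```python
def greedy_weighted_set_cover(universe, sets, cost):
    """Textbook greedy heuristic for weighted set cover (generic sketch).

    sets: dict name -> set of elements; cost: dict name -> positive weight.
    Returns the names of the chosen sets, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        candidates = [s for s in sets if sets[s] & uncovered]
        if not candidates:
            raise ValueError("universe cannot be covered by the given sets")
        # Pick the set with the lowest cost per newly covered element.
        best = min(candidates, key=lambda s: cost[s] / len(sets[s] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen
```

For example, with A = {1, 2, 3, 4} costing 5 and B = {1, 2} and C = {3, 4} each costing 1, the heuristic picks B and then C (total cost 2) instead of A.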
In paper investigate representational methodological issues attractor network model mapping orthography semantics based Plaut We find contrary psycholinguistic studies response time concrete words represented bits output pattern slower abstract words This model also predicts response times words dense semantic neighborhood faster words semantically similar neighbors language This conceptually consistent neighborhood effect seen mapping orthography phonology Seidenberg McClelland Plaut et al patterns many neighbors faster pathways since regularity random mapping used clear cause effect different previous experiments We also report rather distressing finding Reaction time model measured time takes network settle presented new input When criterion used determine network settled changed include testing hidden units results reported change direction effect abstract words slower words dense semantic neighborhoods Since independent reasons exclude hidden units stopping criterion done common practice believe phenomenon interest mostly neural network practitioners However provide insight interaction hidden output units settling
Casebased reasoning CBR used form caching solved problems speedup later problem solving Using cached cases brings additional costs due retrieval time case adaptation time also storage space Simply storing cases result situation retrieving trying adapt old cases take time average caching This means caching must applied selectively build case memory actually useful This form utility problem The approach taken construct cost model system used predict effect changes system In paper describe utility problem associated caching cases construction cost model We present experimental results demonstrate model used predict effect certain changes case memory
A Genetic Algorithmic GA approach vector quantizer design combines conventional Generalized Lloyd Algorithm GLA presented We refer hybrid Genetic Generalized Lloyd Algorithm GGLA It works briefly follows A finite number codebooks called chromosomes selected Each codebook undergoes iterative cycles reproduction We perform experiments various alternative design choices using GaussianMarkov processes speech image source data signaltonoise ratio SNR performance measure In cases GGLA showed performance improvements respect GLA We also compare results ZadorGersho formula
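The Generalized Lloyd Algorithm that the hybrid builds on alternates two steps. First, each training vector is assigned to its nearest codeword. Second, each codeword is moved to the centroid of its assigned vectors. The sketch below shows only this GLA refinement under squared-error distortion, which is assumed here. The genetic part of GGLA (selection and reproduction of codebooks treated as chromosomes) is not shown. Names and the fixed iteration count are hypothetical.

```python
def gla(training, codebook, iters=20):
    """Generalized Lloyd Algorithm with squared-error distortion (sketch)."""
    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    codebook = [list(c) for c in codebook]
    for _ in range(iters):
        # Nearest-neighbour condition: partition training vectors by closest codeword.
        cells = [[] for _ in codebook]
        for x in training:
            j = min(range(len(codebook)), key=lambda k: d2(x, codebook[k]))
            cells[j].append(x)
        # Centroid condition: replace each codeword by the mean of its cell.
        for j, cell in enumerate(cells):
            if cell:  # empty cells keep their previous codeword
                codebook[j] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook
```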
In casebased planning CBP previously generated plans stored cases memory reused solve similar planning problems future CBP save considerable time planning scratch generative planning thus offering potential heuristic mechanism handling intractable problems One drawback CBP systems need highly structured memory requires significant domain engineering complex memory indexing schemes enable efficient case retrieval In contrast CBP system CaPER based massively parallel framebased AI language extremely fast retrieval complex cases large unindexed memory The ability fast frequent retrievals many advantages indexing unnecessary large casebases used memory probed numerous alternate ways allowing specific retrieval stored plans better fit target problem less adaptation Preliminary version article appearing IEEE Expert February pp This paper extended version
We present efficient method assigning number processors tasks associated cells rectangular uniform grid Load balancing equipartition constraints observed approximately minimizing total perimeter partition corresponds amount interprocessor communication This method based upon decomposition grid stripes optimal height We prove mild assumptions problem size grows large parameters error bound associated feasible solution approaches zero We also present computational results high level parallel Genetic Algorithm utilizes method make comparisons methods On network workstations algorithm solves within minutes instances problem would require one billion binary variables Quadratic Assignment formulation
Finding Bayesian balance exploration exploitation adaptive optimal control general intractable This paper shows compute suboptimal estimates based certainty equivalence approximation arising form dual control This systematizes extends existing uses exploration bonuses reinforcement learning Sutton The approach two components statistical model uncertainty world way turning exploratory behaviour
This paper deals nonlinear leastsquares problems involving fitting data parameterized analytic functions For generic regression data general result establishes countability stronger assumptions finiteness set functions giving rise critical points quadratic loss function In special case usually called singlehidden layer neural networks built upon standard sigmoidal activation $\tanh(x)$ equivalently $1/(1 + e^{-x})$ rough upper bound cardinality provided well
Todd W Griffith Nancy J Nersessian Ashok Goel Abstract We hypothesize generic models central conceptual change science This hypothesis origins two theoretical sources The first source constructive modeling derives philosophical theory synthesizes analyses historical conceptual changes science investigations reasoning representation cognitive psychology The theory constructive modeling posits generic mental models productive conceptual change The second source adaptive modeling derives computational theory creative design Both theories posit situation independent domain abstractions ie generic models Using constructive modeling interpretation reasoning exhibited protocols collected John Clement problem solving session involving conceptual change employ resources theory adaptive modeling develop new computational model ToRQUE Here describe piece analysis protocol illustrate synthesis two theories used develop system articulating testing ToRQUE The results research show generic modeling plays central role conceptual change They also demonstrate interdisciplinary synthesis provide significant insights scientific reasoning This research funded part NSF Grant No IRI part ONR Grant No NJ We thank John Clement use protocol transcript James Greeno contribution developing constructive modeling interpretation Ryan Tweney helpful comments
This paper discusses design neural networks solve specific problems adaptive control In particular investigates influence typical problems arising realworld control tasks well techniques solution exist framework neurocontrol Based investigation systematic design method developed The method exemplified development adaptive force controller robot manipulator
We present informationtheoretic derivation learning algorithm clusters unlabelled data linear discriminants In contrast methods try preserve information input patterns maximize information gained observing output robust binary discriminators implemented sigmoid nodes We derive local weight adaptation rule via gradient ascent objective demonstrate dynamics simple data sets relate approach previous work suggest directions may extended
This paper presents ASOCS Adaptive SelfOrganizing Concurrent System model massively parallel processing incrementally defined rule systems areas adaptive logic robotics logical inference dynamic control An ASOCS adaptive network composed many simple computing elements operating asynchronously parallel This paper focuses Adaptive Algorithm AA details architecture learning algorithm AA significant memory knowledge maintenance advantages previous ASOCS models An ASOCS operate either data processing mode learning mode During learning mode ASOCS given new rule expressed boolean conjunction The AA learning algorithm incorporates new rule distributed fashion short bounded time During data processing mode ASOCS acts parallel hardware circuit
This paper presents method analyzing coupled time series using Markov models domain state space immense To make parameter estimation tractable large state space represented Cartesian product smaller state spaces paradigm known factorial Markov models The transition matrix model represented mixture transition matrices underlying dynamical processes This formulation known mixed memory Markov models Using framework analyze daily exchange rates five currencies British pound Canadian dollar Deutsche mark Japanese yen Swiss franc measured US dollar
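One way to read "transition matrix represented as a mixture of transition matrices" for coupled chains is shown in the sketch below. The next state of chain ν is drawn from a convex combination, over chains μ, of pairwise transition matrices conditioned on each chain's previous state:

P(x_ν' = j | x) = Σ_μ ψ[ν][μ] · A[ν][μ][x_μ][j]

This is a sketch under that assumption. The parameter layout (`psi`, `A`) and the function name are hypothetical. Estimating the parameters (for example by EM) is not shown. Each of the small matrices is much cheaper to store than one transition matrix over the full Cartesian-product state space.

```python
def mixed_memory_transition(psi, A, prev_states, nu):
    """Next-state distribution of chain `nu` in a mixed-memory Markov model (sketch).

    prev_states[mu]:  previous state of chain mu.
    psi[nu][mu]:      mixing weight of chain mu's influence on chain nu (sums to 1 over mu).
    A[nu][mu][i][j]:  P(chain nu moves to j | chain mu was in state i).
    Assumes every A[nu][mu] has the same number of columns.
    """
    n_next = len(A[nu][0][0])
    return [sum(psi[nu][mu] * A[nu][mu][prev_states[mu]][j] for mu in range(len(psi[nu])))
            for j in range(n_next)]
```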
This work explores use machine learning methods extracting knowledge simulations complex systems In particular use genetic algorithms learn rulebased strategies used autonomous robots The evaluation given strategy may require several executions simulation produce meaningful estimate quality strategy As consequence evaluation single individual genetic algorithm requires fairly substantial amount computation Such system suggests sort largegrained parallelism available network workstations We describe implementation parallel genetic algorithm present case studies resulting speedup two robot learning tasks
Most Artificial Neural Networks ANNs fixed topology learning often suffer number shortcomings result Variations ANNs use dynamic topologies shown ability overcome many problems This paper introduces LocationIndependent Transformations LITs general strategy implementing distributed feedforward networks use dynamic topologies dynamic ANNs efficiently parallel hardware A LIT creates set locationindependent nodes node computes part network output independent nodes using local information This type transformation allows efficient support adding deleting nodes dynamically learning In particular paper presents LIT dynamic Backpropagation networks single hidden layer The complexity learning execution algorithms $O(n + p + \log m)$ single pattern $n$ number inputs $p$ number outputs $m$ number hidden nodes original network Keywords Neural Networks Backpropagation Implementation Design Dynamic Topologies Reconfigurable Architectures
Hard combinatorial problems sequencing scheduling led recently research genetic algorithms Canonical coding symmetric TSP modified coding njob mmachine flowshop problem configures solution space different way We show well known genetic operators act intelligently coding scheme They implicitly prefer subset solutions contain probably best solutions respect objective We conjecture every new problem needs determination necessary condition genetic algorithm work ie proof experiment We implemented asynchronous parallel genetic algorithm UNIXbased computer network Computational results new heuristic discussed
This paper presents VLSI implementation Priority Adaptive SelfOrganizing Concurrent System PASOCS learning model built using multichip module MCM substrate Many current hardware implementations neural network learning models direct implementations classical neural network structures large number simple computing nodes connected dense number weighted links PASOCS one class ASOCS Adaptive SelfOrganizing Concurrent System connectionist models whose overall goal classical neural networks models whose functional mechanisms differ significantly This model potential application areas pattern recognition robotics logical inference dynamic control
The application adaptive optimization strategies scheduling manufacturing systems recently become research topic broad interest Population based approaches scheduling predominantly treat static data models whereas realworld scheduling tends dynamic problem This paper briefly outlines application genetic algorithm dynamic job shop problem arising production scheduling First sketch genetic algorithm handle release times jobs In second step preceding simulation method used improve performance algorithm Finally job shop regarded nondeterministic optimization problem arising occurrence job releases Temporal Decomposition leads scheduling control interweaves simulation time genetic search
Neural network pruning methods level individual network parameters eg connection weights improve generalization shown empirical study However open problem pruning methods known today OBD OBS autoprune epsiprune selection number parameters removed pruning step pruning strength This work presents pruning method lprune automatically adapts pruning strength evolution weights loss generalization training The method requires algorithm parameter adjustment user Results statistical significance tests comparing autoprune lprune static networks early stopping given based extensive experimentation different problems The results indicate training pruning often significantly better rarely significantly worse training early stopping without pruning Furthermore lprune often superior autoprune superior OBD diagnosis tasks unless severe pruning early training process required
Casebased problemsolving systems rely similarity assessment select stored cases whose solutions easily adaptable fit current problems However widelyused similarity assessment strategies evaluation semantic similarity poor predictors adaptability As result systems may select cases difficult impossible adapt even easily adaptable cases available memory This paper presents new similarity assessment approach couples similarity judgments directly case library containing systems adaptation knowledge It examines approach context casebased planning system learns new plans new adaptations Empirical tests alternative similarity assessment strategies show approach enables better case selection increases benefits accrued learned adaptations
The casebased reasoning process depends multiple overlapping knowledge sources provides opportunity learning Exploiting opportunities requires determining learning mechanisms use individual knowledge source also different learning mechanisms interact combined utility This paper presents case study examining relative contributions costs involved learning processes three different knowledge sources cases case adaptation knowledge similarity information casebased planner It demonstrates importance interactions different learning processes identifies promising method integrating multiple learning methods improve casebased reasoning
Casebased reasoning depends multiple knowledge sources beyond case library including knowledge case adaptation criteria similarity assessment Because hand coding knowledge accounts large part knowledge acquisition burden developing CBR systems appealing acquire learning CBR promising learning method apply This observation suggests developing casebased CBR systems CBR systems whose components use CBR However despite early interest casebased approaches CBR method received comparatively little attention Open questions include casebased components CBR system designed amount knowledge acquisition effort require effectiveness This paper investigates questions case study issues addressed methods used results achieved casebased planning system uses CBR guide case adaptation similarity assessment The paper discusses design considerations presents empirical results support usefulness casebased CBR point potential problems tradeoffs directly demonstrate overlapping roles different CBR knowledge sources The paper closes general lessons casebased CBR areas future research
A linear support vector machine formulation used generate fast finitelyterminating linearprogramming algorithm discriminating two massive sets ndimensional space number points orders magnitude larger n The algorithm creates succession sufficiently small linear programs separate chunks data time The key idea small number support vectors corresponding linear programming constraints positive dual variables carried successive small linear programs containing chunk data We prove procedure monotonic terminates finite number steps exact solution leads globally optimal separating plane entire dataset Numerical results fully dense publicly available datasets numbering million points dimensional space confirm theoretical results demonstrate ability handle large problems
In Sejnowski Rosenberg developed famous NETtalk system English texttospeech This chapter describes machine learning approach texttospeech builds upon extends initial NETtalk work Among many extensions NETtalk system following different learning algorithm wider input window errorcorrecting output coding righttoleft scan word pronounced results decision influencing subsequent decisions addition several useful input features These changes yielded system performs much better original NETtalk system After training words system achieves correct pronunciation individual phonemes correct pronunciation whole words pronunciation must exactly match dictionary pronunciation correct Based judgements three human participants blind assessment study system estimated serious error rate whole words compared error rate DECtalk rule base
The problem minimizing number misclassified points plane attempting separate two point sets intersecting convex hulls ndimensional real space formulated linear program equilibrium constraints LPEC This general LPEC converted exact penalty problem quadratic objective linear constraints A FrankWolfetype algorithm proposed penalty problem terminates stationary point global solution Novel aspects approach include i A linear complementarity formulation step function counts misclassifications ii Exact penalty formulation without boundedness nondegeneracy constraint qualification assumptions iii An exact solution extraction sequence minimizers penalty function finite value penalty parameter general LPEC explicitly exact solution LPEC uncoupled constraints iv A parametric quadratic programming formulation LPEC associated misclassification minimization problem
Planning analogical reasoning learning method consists storage retrieval replay planning episodes Planning performance improves accumulation reuse library planning cases Retrieval driven domaindependent similarity metrics based planning goals scenarios In complex situations multiple goals retrieval may find multiple past planning cases jointly similar new planning situation This paper presents issues implications involved replay multiple planning cases opposed single one Multiple case plan replay involves adaptation merging annotated derivations planning cases Several merge strategies replay introduced process various forms eagerness differences past new situations annotated justifications planning cases In particular introduce effective merging strategy considers plan step choices especially appropriate interleaving planning plan execution We illustrate discuss effectiveness merging strategies specific domains
Mixedinitiative planning envisions framework automated human planners interact jointly construct plans satisfy specific objectives In paper report work engineering robust mixedinitiative planning system Human planners rely strongly past planning experience generate new plans ForMAT casebased system supports human planning accumulation userbuilt plans querydriven browsing past plans several plan functionality analysis primitives ProdigyAnalogy automated AI planner combines generative casebased planning Stored plans annotated plan rationale reuse involves adaptation driven rationale Our system MICBP integrates ForMAT ProdigyAnalogy realtime messagepassing mixedinitiative planning system The main technical approach consists allowing user specify link objectives enable system capture reuse plan rationale We present MICBP concrete application domain military force deployment planning This synergistic system increases planning efficiency human planners automated suggestion similar past plans plausible plan modifications
The primary goal inductive learning generalize well induce function accurately produces correct output future inputs Hansen Salamon showed certain assumptions combining predictions several separately trained neural networks improve generalization One key assumptions individual networks independent errors produce In standard way performing backpropagation assumption may violated standard procedure initialize network weights region weight space near origin This means backpropagations gradientdescent search may reach small subset possible local minima In paper present approach initializing neural networks uses competitive learning intelligently create networks originally located far origin weight space thereby potentially increasing set reachable local minima We report experiments two realworld datasets combinations networks initialized method generalize better combinations networks initialized traditional way
We present two algorithms inducing structural equation models data Assuming latent variables models causal interpretation parameters may estimated linear multiple regression Our algorithms comparable PC IC rely conditional independence We present algorithms empirical comparisons PC IC
We investigate neural network based approximation methods These methods depend locality basis functions After discussing local global basis functions propose multiresolution hierarchical method The various resolutions stored various levels tree At root tree global approximation kept leaves store learning samples Intermediate nodes store intermediate representations In order find optimal partitioning input space selforganising maps SOMs used The proposed method implementational problems reminiscent encountered manyparticle simulations We investigate parallel implementation method using parallel hierarchical methods manyparticle simulations starting point
Orthogonal incremental learning OIL new approach incremental training feedforward network single hidden layer OIL based idea describe output weights hidden nodes set orthogonal basis functions Hidden nodes treated orthogonal representation network output weights domain We proved separate training hidden nodes conflict previously optimized nodes described special relationship orthogonal backpropagation OBP rule An advantage OIL existing algorithms extremely fast learning This approach also easily extended buildup incrementally arbitrary function linear composition adjustable functions necessarily orthogonal OIL tested twospirals NETtalk benchmark problems
Todays potential users machine learning technology faced nontrivial problem choosing large everincreasing number available tools one appropriate particular task To assist often noninitiated users desirable model selection process automated Using experience base level learning researchers proposed metalearning possible solution Historically predictive accuracy de facto criterion work metalearning focusing discovery rules match applications models based accuracy Although predictive accuracy clearly important criterion also case number criteria could often ought considered learning model selection This paper presents number criteria discusses impact metalevel approaches model selection
One approach invariant object recognition employs recurrent neural network associative memory In standard depiction networks state space memories objects stored attractive fixed points dynamics I argue modification picture object continuous family instantiations represented continuous attractor This idea illustrated network learns complete patterns To perform task filling missing information network develops continuous attractor models manifold patterns drawn From statistical viewpoint pattern completion task allows formulation unsupervised A classic approach invariant object recognition use recurrent neural network associative memory In spite intuitive appeal biological plausibility approach largely abandoned practical applications This paper introduces two new concepts could help resurrect object representation continuous attractors learning attractors pattern completion In models associative memory memories stored attractive fixed points discrete locations state space Discrete attractors may appropriate patterns continuous variability like images threedimensional object different viewpoints When instantiations object lie continuous pattern manifold appropriate represent objects attractive manifolds fixed points continuous attractors To make idea practical important find methods learning attractors examples A naive method train network retain examples shortterm memory This method deficient prevent network storing spurious fixed points unrelated examples A superior method train network restore examples corrupted learns complete patterns filling missing information learning terms regression rather density estimation
In paper consider problem independent constraint handling mechanism Stepwise Adaptation Weights SAW show working graph coloring problems SAWing technically belongs penalty function based approaches amounts modifying penalty function search We show twofold benefit First proves rather insensitive technical parameters thereby providing general problem independent way handle constrained problems Second leads superior EA performance In extensive series comparative experiments show SAWing EA outperforms powerful graph coloring heuristic algorithm DSatur hardest graph instances linear scaleup behaviour
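The SAW mechanism can be illustrated with a small sketch: every edge constraint carries a weight, fitness is the summed weight of violated edges, and at fixed intervals the weights of edges the current solution still violates are raised. This is an assumption-laden illustration, not the paper's EA: it uses a simple (1+1)-style mutate-and-keep-if-not-worse loop, and the parameter names (`saw_interval`, `max_evals`) and defaults are hypothetical choices.

```python
import random

def saw_coloring(edges, n_nodes, n_colors, max_evals=20000, saw_interval=25, seed=0):
    """Graph coloring with Stepwise Adaptation of Weights (illustrative sketch)."""
    rng = random.Random(seed)
    weights = {e: 1 for e in edges}  # one adaptive penalty weight per edge constraint

    def penalty(col):
        # Weighted count of violated constraints (both endpoints share a color).
        return sum(weights[(u, v)] for (u, v) in edges if col[u] == col[v])

    coloring = [rng.randrange(n_colors) for _ in range(n_nodes)]
    for t in range(1, max_evals + 1):
        if penalty(coloring) == 0:
            return coloring, True
        # Mutate one node's color; keep the child if it is not worse.
        child = coloring[:]
        child[rng.randrange(n_nodes)] = rng.randrange(n_colors)
        if penalty(child) <= penalty(coloring):
            coloring = child
        # SAW step: raise the weight of every constraint still violated.
        if t % saw_interval == 0:
            for (u, v) in edges:
                if coloring[u] == coloring[v]:
                    weights[(u, v)] += 1
    return coloring, penalty(coloring) == 0

# Usage: 3-color a 5-cycle.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
coloring, solved = saw_coloring(cycle5, n_nodes=5, n_colors=3)
```

Because all weights stay positive, a zero penalty always means a valid coloring, so the weights only reshape the search landscape and never change which solutions count as feasible.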
Recently several neural algorithms introduced Independent Component Analysis Here approach problem point view single neuron First simple Hebbianlike learning rules introduced estimating one independent components sphered data Some learning rules used estimate independent component negative kurtosis others estimate component positive kurtosis Next twounit system introduced estimate independent component kurtosis The results generalized estimate independent components nonsphered raw mixtures To separate several independent components system several neurons linear negative feedback used The convergence learning rules rigorously proven without unnecessary hypotheses distributions independent components
In paper define task place learning describe one approach problem The framework represents distinct places using evidence grids probabilistic description occupancy Place recognition relies casebased classification augmented registration process correct translations The learning mechanism also similar casebased systems involving simple storage inferred evidence grids Experimental studies physical simulated robots suggest approach improves place recognition experience handle significant sensor noise scales well increasing numbers places Previous researchers studied evidence grids place learning combined two powerful concepts used experimental methods machine learning evaluate methods abilities
In constructive induction CI learners problem representation modified normal part learning process This useful initial representation inadequate inappropriate In paper I argue distinction constructive nonconstructive methods unclear I propose theoretical model allows clean distinction made process CI properly motivated I also show although constructive induction used almost exclusively context supervised learning reason form part unsupervised regime
When designing deductive database designer decide predicate relation whether defined extensionally intensionally definition look like An intelligent system presented assist designer task It starts example database predicates defined extensionally It tries compact database transforming extensionally defined predicates intensionally defined ones The intelligent system employs techniques area inductive logic programming
When work information multiple sources formalism employs handle uncertainty may uniform In order able combine knowledge bases different formats need first establish common basis characterizing evaluating different formalisms provide semantics combined mechanism A common framework provide infrastructure building integrated system essential understand behavior We present unifying framework based ordered partition possible worlds called partition sequences corresponds intuitive notion biasing towards certain possible scenarios uncertain actual situation We show existing formalisms namely default logic autoepistemic logic probabilistic conditioning thresholding generalized conditioning possibility theory incorporated general framework
This paper investigates technique creating sparsely connected feedforward neural networks may capable producing networks large input output layers The architecture appears particularly suited tasks involve sparse training data able take advantage sparseness reduce training time Some initial results presented based tests bit compression problem
In paper propose method calculate posterior probability nondecomposable graphical Gaussian model Our proposal based new device sample Wishart distributions conditional graphical constraints As result methodology allows Bayesian model selection within whole class graphical Gaussian models including nondecomposable ones
For absorbing Markov chain reinforcement transition Bertsekas gives simple example function learned TD depends Bertsekas showed approximation optimal respect leastsquares error value function approximation obtained TD method poor respect metric With respect error values TD approximates function better TD However respect error differences values TD approximates function better TD TD better TD respect former metric rather latter In addition direct TD weights errors unequally residual gradient methods Baird Harmon Baird Klopf weight errors equally For case control simple Markov decision process presented direct TD residual gradient TD learn optimal policy TD learns suboptimal policy These results suggest example differences state values significant state values TD preferable TD
Lazy learning methods provide useful representations training algorithms learning complex phenomena autonomous adaptive control complex systems This paper surveys ways locally weighted learning type lazy learning applied us control tasks We explain various forms control tasks take affects choice learning paradigm The discussion section explores interesting impact explicitly remembering previous experiences problem learning control
Genetic programming GP variant genetic algorithms data structures handled trees This makes GP especially useful evolving functional relationships computer programs represented trees Symbolic regression determination function dependence g(x) approximates set data points x In paper feasibility symbolic regression GP demonstrated two examples taken different domains Furthermore several suggested methods literature compared intended improve GP performance readability solutions taking account introns redundancy occurs trees keeping size trees small The experiments show GP elegant useful tool derive complex functional dependencies numerical data
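To make the symbolic regression setting concrete, the sketch below shows how a GP individual (an expression tree) can be evaluated against data points and scored, plus a node count of the kind used to keep trees small. The tuple-based tree encoding, the protected division convention and all function names are assumptions for illustration; the evolutionary loop (selection, crossover, mutation) and the paper's treatment of introns are not shown.

```python
import math
import operator

# Hypothetical encoding: a tree is the variable "x", a numeric constant,
# or a tuple (op_name, left_subtree, right_subtree).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if b != 0 else 1.0,  # protected division (a common GP convention)
}

def evaluate(tree, x):
    """Recursively evaluate an expression tree at input x."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, points):
    """Root-mean-square error over (x, y) data points; lower is better."""
    errs = [(evaluate(tree, x) - y) ** 2 for x, y in points]
    return math.sqrt(sum(errs) / len(errs))

def size(tree):
    """Number of nodes; usable as a parsimony term to keep trees small."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1

# Usage: data sampled from y = x^2 + 1, scored against two candidate trees.
points = [(x, x * x + 1) for x in range(5)]
exact = ("+", ("*", "x", "x"), 1)
rough = ("*", "x", 2)
```

A GP run would combine `fitness` and, optionally, `size` into the selection criterion; how to weigh the two is a design choice the sketch leaves open.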
We report study mixture modeling problems arising assessment chemical structureactivity relationships drug design discovery Pharmaceutical research laboratories developing test compounds screening synthesize many related candidate compounds linking together collections basic molecular building blocks known monomers These compounds tested biological activity feeding screening analysis drug design The tests also provide data relating compound activity chemical properties aspects structure associated monomers focus studying relationships aid future monomer selection The level chemical activity compounds based geometry chemical binding test compounds target binding sites receptor compounds screening tests unable identify binding configurations Hence potentially critical covariate information missing natural latent variable Resulting statistical models mixed respect missing information complicating data analysis inference This paper reports study twomonomer twobinding site framework associated data We build structured mixture models mix linear regression models predicting chemical effectiveness respect sitebinding selection mechanisms We discuss aspects modeling analysis including problems pitfalls describe results analyses simulated real data set In modeling real data led critical model extensions introduce hierarchical random effects components adequately capture heterogeneities site binding mechanisms resulting levels effectiveness compounds bound Comments current potential future directions conclude report
We give analysis generalization error cross validation terms two natural measures difficulty problem consideration approximation rate accuracy target function ideally approximated function number hypothesis parameters estimation rate deviation training generalization errors function number hypothesis parameters The approximation rate captures complexity target function respect hypothesis model estimation rate captures extent hypothesis model suffers overfitting Using two measures give rigorous general bound error cross validation The bound clearly shows tradeoffs involved making γ fraction data saved testing large small By optimizing bound respect γ argue combination formal analysis plotting controlled experimentation following qualitative properties cross validation behavior quite robust significant changes underlying model selection problem
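The trade-off described above concerns how large a fraction of the data is saved for testing (written `gamma` here). A minimal holdout sketch follows: for each model class (polynomial degree, standing in for "number of hypothesis parameters"), it picks the least-squares fit on the training part and then selects the class with the smallest error on the held-out part. Using polynomial degrees, the random split and the function name `holdout_select` are all assumptions for illustration, not the paper's experimental setup.

```python
import random
import numpy as np

def holdout_select(xs, ys, degrees, gamma, seed=0):
    """Select a polynomial degree by saving a fraction gamma of the data for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(round(gamma * len(xs))))
    test, train = idx[:n_test], idx[n_test:]
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    best = None
    for d in degrees:
        # Least-squares fit = training-error minimizer within this model class.
        coeffs = np.polyfit(xs[train], ys[train], d)
        err = float(np.mean((np.polyval(coeffs, xs[test]) - ys[test]) ** 2))
        if best is None or err < best[1]:
            best = (d, err)
    return best

# Usage: noiseless linear data, choosing between degree 0 and degree 1.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
degree, test_err = holdout_select(xs, ys, degrees=[0, 1], gamma=0.25)
```

A larger `gamma` gives a more reliable test estimate but leaves less data for fitting each model; that tension is the one the bound in the abstract quantifies.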
Model selection eg considered problem choosing hypothesis language provides optimal balance low empirical error high structural complexity In Abstract discuss intuition new efficient approach model selection Our approach inherently Bayesian eg instead using priors target functions hypotheses talk priors error values leads us new mathematical characterization expected true error In setting classification learning learner given sample drawn according unknown distribution labeled instances returns empirical minimizer hypothesis least empirical error certain unknown true error If process carried repeatedly true error empirical minimizer vary run run empirical minimizer depends randomly drawn sample This induces distribution true errors empirical minimizers possible samples drawn according unknown distribution If distribution would known one could easily derive expected true error empirical minimizer model integrating distribution This would immediately lead optimal model selection algorithm Enumerate models calculate expected error model integrating error distribution select model least expected error PAC theory VC framework provide worstcase bounds chance drawing sample true error minimizer exceeds worstcase meaning hold distribution instances concept given class By contrast focus determine distribution fixed given learning problem specified assumptions Unlike worstcase bound depends size VCdimension hypothesis space actual error distribution depends hypothesis space unknown distribution labeled instances However prove certain assumption independence hypotheses distribution true errors hence expected true error expressed function distribution empirical errors uniformly drawn hypotheses thought prior error values The latter distribution always onedimensional estimated fixedsized initial portion training data fixedsized set randomly drawn hypotheses This estimate distribution leads us estimate expected true error empirical minimizer model turn leads highly efficient model selection algorithm We study behavior approach several controlled experiments Our results show accuracy error estimate least comparable accuracy estimate obtained fold crossvalidation provided prior error values estimated using least examples But CV requires ten invocations learner per model time algorithm requires assess model constant size model We also study robustness algorithm violations independence assumptions We observe bias predictions hypotheses space size four less When hypothesis space size dependencies diluted violations assumptions negligible incur significant error The full paper available httpkicstuberlindeschefferpaperseedreportps
In area inductive learning generalization main operation usual definition induction based logical implication Recently rising interest clausal representation knowledge machine learning Almost inductive learning systems perform generalization clauses use relation subsumption instead implication The main reason wellknown simple technique compute least general generalizations subsumption implication However generalization subsumption inappropriate learning recursive clauses crucial problem since recursion basic program structure logic programs We note implication clauses undecidable therefore introduce stronger form implication called Timplication decidable clauses We show every finite set clauses exists least general generalization Timplication We describe technique reduce generalizations implication clause generalizations subsumption call expansion original clause Moreover show every nontautological clause exists Tcomplete expansion means every generalization Timplication clause reduced generalization subsumption expansion
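The well-known technique for least general generalizations under subsumption is Plotkin-style anti-unification. A minimal sketch for terms follows. The term encoding (strings for constants, tuples `(functor, *args)` for compound terms) and the fresh-variable names `X0`, `X1`, ... are assumptions, and the sketch covers only the subsumption case, not the paper's T-implication or expansion.

```python
def lgg(t1, t2, table=None):
    """Least general generalization (anti-unification) of two terms.

    A string is a constant; a tuple (functor, *args) is a compound term.
    Each distinct pair of mismatched subterms is replaced by one fresh
    variable, reused wherever the same pair occurs again.
    """
    if table is None:
        table = {}
    if t1 == t2:
        return t1
    same_functor = (
        isinstance(t1, tuple) and isinstance(t2, tuple)
        and t1[0] == t2[0] and len(t1) == len(t2)
    )
    if same_functor:
        # Generalize argument-wise, sharing the variable table.
        return (t1[0],) + tuple(lgg(a, b, table) for a, b in zip(t1[1:], t2[1:]))
    if (t1, t2) not in table:
        table[(t1, t2)] = f"X{len(table)}"
    return table[(t1, t2)]

# Usage: p(f(a), a) and p(f(b), b) generalize to p(f(X0), X0).
g = lgg(("p", ("f", "a"), "a"), ("p", ("f", "b"), "b"))
```

For two clauses, the same idea applies literal by literal: every compatible pair of literals (same predicate and sign) is generalized while sharing one variable table, and the resulting literals form the generalized clause.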
This paper argues Bayesian probability theory general method machine learning From two wellfounded axioms theory capable accomplishing learning tasks incremental nonincremental supervised unsupervised It learn different types data regardless whether noisy perfect independent facts behaviors unknown machine These capabilities partially demonstrated paper uniform application theory two typical types machine learning incremental concept learning unsupervised data classification The generality theory suggests process learning may many different types currently held method oldest may best
This paper focuses bias variance decomposition analysis local learning algorithm nearest neighbor classifier extended error correcting output codes This extended algorithm often considerably reduces ie classification error comparison nearest neighbor Ricci Aha The analysis presented reveals performance improvement obtained drastically reducing bias cost increasing variance We also show even classification problems classes extending codeword length beyond limit assures column separation yields error reduction This error reduction variance due voting mechanism used errorcorrecting output codes also bias
We integrated distributed search genetic programming GP based systems collective memory form collective adaptation search method Such system significantly improves search problem complexity increased Since pure GP approach scale well problem complexity natural question two components actually contributing search process We investigate collective memory search utilizes random search engine find significantly outperforms GP based search engine We examine solution space show problem complexity search space grow collective adaptive system perform better collective memory search employing random search engine
The document presents approach judging relevance retrieved information based novel approach similarity assessment Contrary systems define relevance measures context similarity query time This necessary since without context similarity one guarantee similar items also relevant
This paper presents selfimproving reactive control system autonomous robotic navigation The navigation module uses schemabased reactive control system perform navigation task The learning module combines casebased reasoning reinforcement learning continuously tune navigation system experience The casebased reasoning component perceives characterizes systems environment retrieves appropriate case uses recommendations case tune parameters reactive control system The reinforcement learning component refines content cases based current experience Together learning components perform online adaptation resulting improved performance reactive control system tunes environment well online learning resulting improved library cases capture environmental regularities necessary perform online adaptation The system extensively evaluated simulation studies using several performance metrics system configurations
A model onsite learning presented The system learns querying hard patterns classifying easy ones This model related querybased filtering methods takes account addition labelling filtering data cost Simple policies introduced analyzed simple problem D high low game In addition QuerybyCommittee algorithm Seung et al suggested good approximator model space realworld domains Results using algorithm synthesized problem realworld OCR task using backpropagation network nearest neighbor classifier show onsite learner perform well classifier trained offsite achieving significant cost reduction
The standard method obtaining response treebased genetic programming take value returned root node In nontree representations alternate methods explored One alternative treat specific location indexed memory response value program terminates The purpose paper explore applicability technique treestructured programs explore intron effects studies bring light This papers experimental results support finding memorybased program response technique improvement problems In addition papers experimental results support finding contrary past research speculation addition even facilitation introns seriously degrade search performance genetic programming
We discuss implications Holtes recentlypublished article demonstrated commonly used data simple classification rules almost accurate decision trees produced Quinlans C We consider particular significance Holtes results future topdown induction decision trees To extent Holte questioned sense research multilevel decision tree learning We go detail parts Holtes study We try put results perspective We argue absolute terms small difference accuracy R C witnessed Holte still significant We claim C possesses additional accuracyrelated advantages R In addition discuss representativeness databases used Holte We compare empirically optimal accuracies multilevel onelevel decision trees observe significant differences We point several deficiencies limitedcomplexity classifiers
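For readers unfamiliar with the one-level rules discussed above, here is a minimal sketch of a Holte-style 1R learner. It assumes nominal attributes only; Holte's method also discretizes numeric attributes, which this sketch omits, and the toy data and names are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """One-level rule learner (1R-style sketch) for nominal attributes.

    For each attribute, map every value to its majority class and count
    training errors; keep the attribute whose rule makes the fewest errors.
    Returns (attribute_index, {value: predicted_class}, n_errors).
    """
    best = None
    for a in range(len(rows[0])):
        counts = defaultdict(Counter)  # value -> class frequencies
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

# Usage on a tiny hypothetical dataset: (outlook, temperature) -> play.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
        ("rain", "hot"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
attr, rule, errors = one_r(rows, labels)
```

A decision tree learner such as C4.5 can split repeatedly on different attributes, whereas 1R commits to a single attribute; that restriction is the limited complexity the abstract compares against multilevel trees.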
We describe approach graphemetophoneme conversion languageindependent dataoriented Given set examples spelling words associated phonetic representation language graphemetophoneme conversion system automatically produced language takes input spelling words produces output phonetic transcription according rules implicit training data We describe design system compare performance knowledgebased alternative dataoriented approaches
No finite sample sufficient determine density therefore entropy signal directly Some assumption either functional form density smoothness necessary Both amount prior space possible density functions By far common approach assume density parametric form By contrast derive differential learning rule called EMMA optimizes entropy way kernel density estimation Entropy derivative calculated sampling density estimate The resulting parameter update rule surprisingly simple efficient We show EMMA used detect correct corruption magnetic resonance images MRI This application beyond scope existing parametric entropy models
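EMMA optimizes an entropy that is estimated from a kernel (Parzen) density estimate. The sketch below is minimal and assumes one-dimensional data and a Gaussian kernel of fixed width sigma. One sample builds the density estimate, and the entropy is approximated by the average negative log-density at the points of a second, disjoint sample. It shows only the entropy estimate and not the parameter-update rule derived in the paper. The function name emma_entropy and the chosen kernel width are assumptions for illustration.

```python
import numpy as np


def emma_entropy(sample_a, sample_b, sigma):
    """Kernel-density (Parzen) estimate of differential entropy, 1-D data.

    Sample A builds the density estimate; the entropy is approximated by the
    average negative log-density evaluated at the points of sample B.
    """
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    diffs = b[:, None] - a[None, :]  # pairwise differences, shape (|B|, |A|)
    kernel = np.exp(-0.5 * (diffs / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    density_at_b = kernel.mean(axis=1)  # Parzen density estimate at each point of B
    return float(-np.mean(np.log(density_at_b)))


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = emma_entropy(x[:500], x[500:], sigma=0.2)
```

For standard normal draws the estimate should come out close to the true entropy 0.5 ln(2πe), which is about 1.42, and slightly inflated by the kernel smoothing. Because the estimate is a smooth function of the samples, it can be differentiated with respect to model parameters, which is the property that a gradient-based rule like EMMA's relies on.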
A satisficing search problem consists set probabilistic experiments performed order without repetitions satisfying configuration successes failures reached The cost performing experiments depends order chosen Earlier work concentrated finding optimal search strategies special cases model search trees andor graphs cost function success probabilities experiments given In contrast study complexity learning approximately optimal search strategy success probabilities known outset Working fully general model show n number unknown probabilities C maximum cost performing experiments
We present method calculating phase diagrams highdimensional variant SelfOrganizing Map SOM The method requires ansatz tesselation data space induced map explicit state map Using method analyze two recently proposed models development orientation ocular dominance column maps The phase transition condition orientation map turns different form corresponding lowdimensional map
We study process multiagent reinforcement learning context load balancing distributed system without use either central coordination explicit communication We first define precise framework study adaptive load balancing important features stochastic nature purely local information available individual agents Given framework show illuminating results interplay basic adaptive behavior parameters effect system efficiency We investigate properties adaptive load balancing heterogeneous populations address issue exploration vs exploitation context Finally show naive use communication may improve might even harm system efficiency
Brendan J Frey Geoffrey E Hinton Efficient stochastic source coding application Bayesian network source model The Computer Journal In paper introduce new algorithm called bitsback coding makes stochastic source codes efficient For given onetomany source code show algorithm actually efficient algorithm always picks shortest codeword Optimal efficiency achieved codewords chosen according Boltzmann distribution based codeword lengths It turns commonly used technique determining parameters maximum likelihood estimation actually minimizes bitsback coding cost codewords chosen according Boltzmann distribution A tractable approximation maximum likelihood estimation generalized expectation maximization algorithm minimizes bitsback coding cost After presenting binary Bayesian network model assigns exponentially many codewords symbol show tractable approximation Boltzmann distribution used bitsback coding We illustrate performance bitsback coding using nonsynthetic data binary Bayesian network source model produces possible codewords input symbol The rate bitsback coding nearly one half obtained picking shortest codeword symbol
Agents learn agents exploit information possess distinct advantage competitive situations Games provide stylized adversarial environments study agent learning strategies Researchers developed game playing programs learn play better experience We developed learning program learn play better learns identify exploit weaknesses particular opponent repeatedly playing several games We propose scheme learning opponent action probabilities utility maximization framework exploits learned opponent model We show proposed expected utility maximization strategy generalizes traditional maximin strategy allows players benefit taking calculated risks avoided maximin strategy Experiments popular board game Connect show learning player consistently outperforms nonlearning player pitted another automated player using weaker heuristic Though proposed mechanism improve skill level computer player improve ability play effectively weaker opponent
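The contrast between maximin play and expected-utility play against a learned opponent model can be seen in a single-move matrix game. This is a minimal sketch under stated assumptions. The payoff matrix is hypothetical, with rows as our actions and columns as opponent actions. The opponent probabilities stand in for a learned model. The paper applies the idea inside game-tree search for a board game, which is not reproduced here.

```python
import numpy as np


def maximin_action(payoff):
    """Row that maximizes the worst-case payoff over opponent actions."""
    return int(np.argmax(payoff.min(axis=1)))


def expected_utility_action(payoff, opponent_probs):
    """Row that maximizes expected payoff under a (learned) opponent model."""
    return int(np.argmax(payoff @ opponent_probs))


# Hypothetical payoffs: row 0 is risky, row 1 is safe.
payoff = np.array([[3.0, -1.0],
                   [1.0, 1.0]])
safe = maximin_action(payoff)                                     # -> 1
risky = expected_utility_action(payoff, np.array([0.9, 0.1]))     # -> 0
```

Against an opponent that is believed to play column 0 most of the time, the expected-utility player takes the "calculated risk" of row 0, which maximin avoids. If the model puts all probability on the opponent's most damaging reply, both rules pick the same action.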
Many real world learning problems best characterized interaction multiple independent causes factors Discovering causal structure data focus paper Based Zemel Hintons cooperative vector quantizer CVQ architecture unsupervised learning algorithm derived ExpectationMaximization EM framework Due combinatorial nature data generation process exact Estep computationally intractable Two alternative methods computing Estep proposed Gibbs sampling meanfield approximation promising empirical results presented
This paper deals problem blind identification source separation consists estimation mixing matrix andor separation mixture stochastically independent sources without priori knowledge mixing matrix The method propose estimates mixture matrix recurrent InputOutput IO Identification using inputs nonlinear transformation estimated sources Herein nonlinear transformation distortion consists constraining modulus inputs IOIdentification device constant In contrast existing approaches covariance additive noise need modeled estimated regular parameter needed The proposed approach implemented using multilayer neural networks order improve performance separation New associated online unsupervised adaptive learning rules also developed The effectiveness proposed method illustrated computer simulations
Source separation consists recovering set n independent signals n observed instantaneous mixtures signals possibly corrupted additive noise Many source separation algorithms use second order information whitening operation reduces non trivial part separation determining unitary matrix Most show kind invariance property exploited predict general results performance Our first contribution exhibit lower bound performance terms accuracy separation This bound independent algorithm iid case distribution source signals Second show performance invariant algorithms depends mixing matrix noise level specific way A consequence low noise levels performance depend mixture distribution sources via function characteristic given source separation algorithm
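The whitening step mentioned in the abstract uses only second-order information to decorrelate the observations and scale them to unit variance. Whatever separation remains is then a rotation, meaning an orthogonal matrix for real signals and a unitary one in the complex case. This is a minimal sketch assuming real-valued, noise-free instantaneous mixtures. The function name whiten and the toy mixing matrix are hypothetical.

```python
import numpy as np


def whiten(x):
    """Whiten observations x of shape (n_sensors, n_samples).

    Returns the whitened data z and the whitening matrix w, such that the
    sample covariance of z is the identity.
    """
    xc = x - x.mean(axis=1, keepdims=True)        # remove the mean
    cov = xc @ xc.T / xc.shape[1]                 # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # cov = E diag(lambda) E^T
    w = np.diag(eigvals ** -0.5) @ eigvecs.T      # W = diag(lambda)^(-1/2) E^T
    return w @ xc, w


rng = np.random.default_rng(1)
s = rng.uniform(-1.0, 1.0, size=(2, 5000))        # independent sources
a = np.array([[1.0, 0.6],
              [0.4, 1.0]])                        # hypothetical mixing matrix
z, w = whiten(a @ s)
```

After this step a separation algorithm only needs to search over rotations of z. This reduced search is the "non-trivial part" of separation that the abstract refers to.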
In paper neural network approach reconstruction natural highly correlated images linear additive mixture proposed A multilayer architecture local online learning rules developed solve problem blind separation sources The main motivation using multilayer network instead singlelayer one improve performance robustness separation applying simple local learning rule biologically plausible Moreover architecture onchip learning relatively easy implementable using VLSI electronic circuits Furthermore enables extraction source signals sequentially one starting strongest signal finishing weakest one The experimental part focuses separating highly correlated human faces mixture additive noise unknown number sources
We study online learning algorithms predict combining predictions several subordinate prediction algorithms sometimes called experts These simple algorithms belong multiplicative weights family algorithms The performance algorithms degrades logarithmically number experts making particularly useful applications number experts large However applications text categorization often natural experts abstain making predictions instances We show transform algorithms assume experts always awake algorithms require assumption We also show derive corresponding loss bounds Our method general applied large family online learning algorithms We also give applications various prediction models including decision graphs switching experts
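One common way to let multiplicative-weights experts abstain is to predict with the awake experts only. Only their weights are then updated, and those weights are renormalized so that the awake set keeps its total weight, while sleeping experts keep their weights unchanged. This is a minimal sketch under those assumptions, using square loss, predictions in [0, 1], and an exponential update with learning rate eta. The paper's exact transformation and loss bounds may differ, and the function names are hypothetical.

```python
import numpy as np


def predict(weights, preds, awake):
    """Weighted average of the predictions of the awake experts only."""
    w = weights * awake
    if w.sum() == 0:
        raise ValueError("no expert is awake")
    return float(w @ preds / w.sum())


def update(weights, preds, awake, outcome, eta=0.5):
    """Multiplicative update restricted to awake experts.

    Awake weights are scaled by exp(-eta * loss), then renormalized so the
    total weight of the awake set is unchanged; sleeping experts are untouched.
    """
    new = weights.astype(float).copy()
    mask = np.asarray(awake).astype(bool)
    if not mask.any():
        return new
    losses = (preds - outcome) ** 2               # square loss
    awake_total = new[mask].sum()
    scaled = new[mask] * np.exp(-eta * losses[mask])
    new[mask] = scaled * (awake_total / scaled.sum())
    return new


weights = np.ones(3)
preds = np.array([0.0, 1.0, 0.5])
awake = np.array([1, 1, 0])                       # expert 2 abstains this round
p = predict(weights, preds, awake)                # -> 0.5
weights = update(weights, preds, awake, outcome=1.0)
```

Renormalizing inside the awake set means an abstaining expert neither gains nor loses relative weight on rounds where it makes no prediction.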
When dealing classification problems current ILP systems often lag behind stateoftheart attributional learners Part blame ascribed much larger hypothesis space therefore thoroughly explored However sometimes due fact ILP systems take account probabilistic aspects hypotheses classifying unseen examples This paper proposes We developed naive Bayesian classifier within ILPR first order learner The learner uses clever RELIEF based heuristic able detect strong dependencies within literal space dependencies exist We conducted series experiments artificial realworld data sets The results show combination ILPR together naive Bayesian classifier sometimes significantly improves classification unseen instances measured classification accuracy average information score
Process simulation emerged valuable tool process design analysis operation In work extend capabilities iterated linear programming LP dealing problems encountered dynamic nonsmooth process simulation A previously developed LP method refined addition new descent strategy combines line search trust region approach This adds stability efficiency method The LP method advantage naturally dealing profile bounds well This demonstrated avoid computational difficulties arise iterates going physically unrealistic regions A new method treatment discontinuities occurring dynamic simulation problems also presented paper The method ensures event occurred within time interval consideration detected one event occurs detected one indeed earliest one A specific class implicitly discontinuous process simulation problems phase equilibrium calculations also looked A new formulation introduced solve multiphase problems To correspondence addressed email bieglercmuedu
A frequently observed difficulty application genetic algorithms domain optimization arises premature convergence In order preserve genotype diversity develop new model autoadaptive behavior individuals In model population member active individual assumes sociallike behavior patterns Different individuals living population assume different patterns By moving hierarchy social states individuals change behavior Changes social state controlled arguments plausibility These arguments implemented rule set massivelyparallel genetic algorithm Computational experiments largescale job shop benchmark problems show results new approach dominate ordinary genetic algorithm significantly
Proben collection problems neural network learning realm pattern classification function approximation plus set rules conventions carrying benchmark tests similar problems Proben contains data sets different domains All datasets represent realistic problems could called diagnosis tasks one consist real world data The datasets presented simple format using attribute representation directly used neural network training Along datasets Proben defines set rules conduct document neural network benchmarking The purpose problem rule collection give researchers easy access data evaluation algorithms networks make direct comparison published results feasible This report describes datasets benchmarking rules It also gives basic performance measures indicating difficulty various problems These measures used baselines comparison
This paper presents NeuroChess program learns play chess final outcome games NeuroChess learns chess board evaluation functions represented artificial neural networks It integrates inductive neural network learning temporal differencing variant explanationbased learning Performance results illustrate strengths weaknesses approach
Some important factors play major role determining performances CBR CaseBased Reasoning system complexity accuracy retrieval phase Both flat memory inductive approaches suffer serious drawbacks In first approach search time increases dealing large scale memory base second one modification case memory becomes complex sophisticated architecture In paper show construct simple efficient indexing system structure The idea construct case hierarchy two levels memory lower level contains cases organised groups similar cases upper level contains prototypes prototype represents one group cases This smaller memory used retrieval phase Prototype construction achieved means incremental prototypebased NN Neural Network We show mode CBRNN coupling preprocessing one neural network serves indexing system
Developing ability recognize landmark visual image robots current location fundamental problem robotics We consider problem PAClearning concept class geometric patterns target geometric pattern configuration k points real line Each instance configuration n points real line labeled according whether visually resembles target pattern To capture notion visual resemblance use Hausdorff metric Informally two geometric patterns P Q resemble Hausdorff metric every point one pattern close point pattern We relate concept class geometric patterns landmark recognition problem present polynomialtime algorithm PAClearns class onedimensional geometric patterns We also present experimental results algorithm performs
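The Hausdorff metric that the abstract uses to formalize visual resemblance can be computed directly for finite point sets on the real line. It is the larger of the two directed distances, where each directed distance is the farthest that any point of one pattern lies from its nearest point in the other. This is a minimal sketch. The threshold eps in resembles is a hypothetical parameter, and the sketch ignores any translation or other transformation that the paper's concept class may allow.

```python
def hausdorff(p, q):
    """Hausdorff distance between two finite point sets on the real line."""
    d_pq = max(min(abs(x - y) for y in q) for x in p)  # farthest point of p from q
    d_qp = max(min(abs(x - y) for x in p) for y in q)  # farthest point of q from p
    return max(d_pq, d_qp)


def resembles(p, q, eps):
    """Hypothetical resemblance test: patterns match if within eps."""
    return hausdorff(p, q) <= eps


print(hausdorff([0, 1], [0, 1.5]))    # -> 0.5
print(hausdorff([0, 1, 2], [0, 2]))   # -> 1
```

The second example shows why both directions are needed: every point of [0, 2] is close to [0, 1, 2], but the point 1 is at distance 1 from [0, 2].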
The concept measure functions generalization performance suggested This concept provides alternative way selecting evaluating learned models classifiers In addition makes possible state learning problem computational problem The known prior metaknowledge problem domain captured measure function possible combination training set classifier assigns value describing good classifier The computational problem find classifier maximizing measure function We argue measure functions great value practical applications Besides tool model selection force us make explicit relevant prior knowledge learning problem hand ii provide deeper understanding existing algorithms iii help us construction problemspecific algorithms We illustrate last point suggesting novel algorithm based incremental search classifier optimizes given measure function
Recurrent attractor networks offer many advantages feedforward networks modeling psychological phenomena Their dynamic nature allows capture time course cognitive processing learned weights may often easily interpreted soft constraints representational components Perhaps significant feature networks however ability facilitate generalization enforcing well formedness constraints intermediate output representations Attractor networks learn systematic regularities well formed representations exposure small number examples said possess articulated attractors This paper investigates conditions articulated attractors arise recurrent networks trained using variants backpropagation The results computational experiments demonstrate structured attractors spontaneously appear emergence systematicity appropriate error signal presented directly recurrent processing elements We show however distal error signals backpropagated intervening weights pose serious problems networks kind We present simulation results discuss reasons difficulty suggest directions future attempts surmount
Induced decision trees extensivelyresearched solution classification tasks For many practical tasks trees produced treegeneration algorithms comprehensible users due size complexity Although many tree induction algorithms shown produce simpler comprehensible trees data structures derived trees good classification accuracy tree simplification usually secondary concern relative accuracy attempt made survey literature perspective simplification We present framework organizes approaches tree simplification summarize critique approaches within framework The purpose survey provide researchers practitioners concise overview treesimplification approaches insight relative capabilities In final discussion briefly describe empirical findings discuss application tree induction algorithms case retrieval casebased reasoning systems
In paper propose monitor Markov chain sampler using cusum path plot chosen dimensional summary statistic We argue cusum path plot bring effectively sequential plot aspects Markov sampler tell user quickly slowly sampler moving around sample space direction summary statistic The proposal illustrated four examples represent situations cusum path plot works well well Moreover rigorous analysis given one examples We conclude cusum path plot effective tool convergence diagnostics Markov sampler comparing different Markov samplers
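The cusum path plotted in this approach is a running sum of deviations of a scalar summary statistic from its mean over the retained draws. This is a minimal sketch assuming the draws of the summary statistic are already available as a sequence. The burn_in argument and the convention of starting the path at 0 are assumptions. A smooth path with long excursions is usually read as slow movement in that direction, and a rapidly oscillating ("hairy") path as faster mixing.

```python
import numpy as np


def cusum_path(samples, burn_in=0):
    """Cusum path S_t = sum_{i <= t} (theta_i - mean) of a scalar MCMC summary.

    The mean is taken over the post-burn-in draws, so the path starts and
    ends at 0; plot it against t to inspect mixing.
    """
    theta = np.asarray(samples[burn_in:], dtype=float)
    return np.concatenate(([0.0], np.cumsum(theta - theta.mean())))


print(cusum_path([1.0, 2.0, 3.0]))  # -> [ 0. -1. -1.  0.]
```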
This paper gives precise easy compute bounds convergence time Gibbs sampler used Bayesian image reconstruction For sampling Gibbs distribution without presence external field bounds N number pixels obtained proportionality constant easy calculate Some key words Bayesian image restoration Convergence Gibbs sampler Ising model Markov chain Monte Carlo
MIT Media Lab Perceptual Computing Learning Common Sense Technical Report nov revised jun We present methods coupling hidden Markov models hmms model systems multiple interacting processes The resulting models multiple state variables temporally coupled via matrices conditional probabilities We introduce deterministic O(T(CN)^2) approximation maximum posterior MAP state estimation enables fast classification parameter estimation via expectation maximization An Nheads dynamic programming algorithm samples highest probability paths compact state trellis minimizing upper bound cross entropy full combinatoric dynamic programming problem The complexity O(T(CN)^2) C chains N states apiece observing T data points compared O(TN^{2C}) naive Cartesian product exact state clustering stochastic Monte Carlo methods applied inference problem In several experiments examining training time model likelihoods classification accuracy robustness initial conditions coupled hmms compared favorably conventional hmms energybased approaches coupled inference chains We demonstrate compare algorithms synthetic real data including interpretation video
The paper describes Bayesian analysis agricultural field experiments topic received little previous attention despite vast frequentist literature Adoption Bayesian paradigm simplifies interpretation results especially ranking selection Also complex formulations analyzed comparative ease using Markov chain Monte Carlo methods A key ingredient approach need spatial representations unobserved fertility patterns This discussed detail Problems caused outliers jumps fertility tackled via hierarchicalt formulations may find use contexts The paper includes three analyses variety trials yield one example involving binary data none entirely straightforward Some comparisons frequentist analyses made The datasets available httpwwwstatdukeeduhigdontrialsdatahtml
We show paper continuous state space Markov chains rigorously discretized finite Markov chains The idea subsample continuous chain renewal times related small sets control discretization Once finite Markov chain derived MCMC output general convergence properties finite state spaces exploited convergence assessment several directions Our choice based divergence criterion derived Kemeny Snell first evaluated parallel chains stopping time implemented efficiently two parallel chains using Birkhoffs pointwise ergodic theorem stopping rules The performance criterion illustrated three standard examples
Markov chain Monte Carlo MCMC samplers proved remarkably popular tools Bayesian computation However problems arise application density interest high dimensional strongly correlated In circumstances sampler may slow traverse state space mixing poor In article offer partial solution problem The state space Markov chain augmented accommodate multiple chains parallel Updates individual chains based around genetic style crossover operator acting parent states drawn population chains This process makes efficient use gradient information implicitly encoded within distribution states across population Empirical studies support claim crossover operator acting parallel population chains improves mixing This illustrated example sampling high dimensional posterior probability density complex predictive model By adopting latent variable approach methodology extended deal variable selection model averaging high dimensions This illustrated example knot selection spline interpolant
MIT Computational Cognitive Science Technical Report We describe variational approximation methods efficient probabilistic reasoning applying methods problem diagnostic inference QMRDT database The QMRDT database largescale belief network based statistical expert knowledge internal medicine The size complexity network render exact probabilistic diagnosis infeasible small set cases This hindered development QMRDT network practical diagnostic tool hindered researchers exploring critiquing diagnostic behavior QMR In paper describe variational approximation methods applied QMR network resulting fast diagnostic inference We evaluate accuracy methods set standard diagnostic cases compare stochastic sampling methods
The effects neural networks topology performance well known yet question finding optimal configurations automatically remains largely open This paper proposes solution problem RBF networks A self optimising approach driven evolutionary strategy taken The algorithm uses output information computationally efficient approximation RBF networks optimise Kmeans clustering process coevolving two determinant parameters networks layout number centroids centroids positions Empirical results demonstrate promise
This paper describes hybrid methodology integrates genetic algorithms decision tree learning order evolve useful subsets discriminatory features recognizing complex visual concepts A genetic algorithm GA used search space possible subsets large set candidate discrimination features Candidate feature subsets evaluated using C decisiontree learning algorithm produce decision tree based given features using limited amount training data The classification performance resulting decision tree unseen testing data used fitness underlying feature subset Experimental results presented show increasing amount learning significantly improves feature set evolution difficult visual recognition problems involving satellite facial image data In addition also report extent subtle aspects Baldwin effect exhibited system
In paper examine behavior humancomputer system crisis response As one instance crisis management describe task responding spills fires involving hazardous materials We describe INCA intelligent assistant planning scheduling domain relation human users We focus INCAs strategy retrieving case case library seeding initial schedule helping user adapt seed We also present three hypotheses behavior mixedinitiative system experiments designed test The results suggest approach leads faster response development usergenerated automaticallygenerated schedules without sacrificing solution quality
Given adequate simulation model task environment payoff function measures quality partially successful plans competitionbased heuristics genetic algorithms develop high performance reactive rules interesting sequential decision tasks We previously described implemented system called SAMUEL learning reactive plans shown system successfully learn rules laboratory scale tactical problem In paper describe method deriving explanations justify success empirically derived rule sets The method consists inferring plausible subgoals explaining reactive rules trigger sequence actions ie strategy satisfy subgoals
Machine learning valuable tool improving flexibility efficiency robot applications Many approaches applying machine learning robotics known Some approaches enhance robots highlevel processing planning capabilities Other approaches enhance lowlevel processing control basic actions In contrast approach presented paper uses machine learning enhancing link lowlevel representations sensing action highlevel representation planning The aim facilitate communication robot human user A hierarchy concepts learned route records mobile robot Perception action combined every level ie concepts perceptually anchored The relational learning algorithm grdt developed completely searches hypothesis space restricted rule schemata user defines terms grammars
We motivate use convergence diagnostic techniques Markov Chain Monte Carlo algorithms review various methods proposed MCMC literature A common notation established method discussed particular emphasis implementational issues possible extensions The methods compared terms interpretability applicability recommendations provided particular classes problems
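As a concrete instance of the diagnostics such reviews compare, the sketch below computes Gelman and Rubin's potential scale reduction factor (R-hat) from several parallel chains of one scalar quantity. Values near 1 are consistent with convergence, and values well above 1 indicate that the chains have not yet mixed over a common distribution. This is a minimal sketch of the basic formula and does not include the degrees-of-freedom correction or the split-chain variants. Whether and in what form this review covers it is not stated here, and the function name is hypothetical.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for m chains of length n.

    chains: array of shape (m, n) holding one scalar quantity per draw.
    """
    x = np.asarray(chains, dtype=float)
    m, n = x.shape
    chain_means = x.mean(axis=1)
    b = n * chain_means.var(ddof=1)         # between-chain variance
    w = x.var(axis=1, ddof=1).mean()        # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n       # pooled estimate of the target variance
    return float(np.sqrt(var_hat / w))


rng = np.random.default_rng(2)
same = rng.standard_normal((4, 2000))                    # four well-mixed chains
shifted = same + np.array([[0.0], [0.0], [3.0], [3.0]])  # two chains stuck elsewhere
```

With these inputs, gelman_rubin(same) is close to 1, and gelman_rubin(shifted) is far above 1 because the between-chain variance dominates.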
For target tracking task handheld camera anthropomorphic OSCARrobot manipulator track object moves arbitrarily table The desired camerajoint mapping approximated feedforward neural network Through use time derivatives position object manipulator controller inherently predict next position moving target object In paper several anticipative controllers described successfully applied track moving object
Covariance information help algorithm search predictive causal models estimate strengths causal relationships This information discarded conditional independence constraints identified usual contemporary causal induction algorithms Our fbd algorithm combines covariance information effective heuristic build predictive causal models We demonstrate fbd accurate efficient In one experiment assess fbds ability find best predictors variables another compare performance using many measures Pearl Vermas ic algorithm And although fbd based multiple linear regression cite evidence performs well problems difficult regression algorithms
The problem learning decision rules sequential tasks addressed focusing problem learning tactical decision rules simple flight simulator The learning method relies notion competition employs genetic algorithms search space decision policies Several experiments presented address issues arising differences simulation model learning occurs target environment decision rules ultimately tested
The inductive learning problem consists learning concept given examples nonexamples concept To perform learning task inductive learning algorithms bias learning method Here discuss biasing learning method use previously learned concepts domain These learned concepts highlight useful information concepts domain We describe transference bias present MFOCL Horn clause relational learning algorithm utilizes bias learn multiple concepts We provide preliminary empirical evaluation show effects biasing previous information noisefree noisy data
Choosing architecture neural network one important problems making neural networks practically useful accounts applications usually sweep details carpet How many hidden units needed Should weight decay used much What type output units chosen And We address issues within framework statistical theory model This paper principally concerned architecture selection issues feedforward neural networks also known multilayer perceptrons Many issues arise selecting radial basis function networks recurrent networks widely These problems occur much wider context within statistics applied statisticians selecting combining models decades Two recent discussions References discuss neural networks statistical perspective choice provides number workable approximate answers
We propose modeltheoretic definition causation show contrary common folklore genuine causal influences distinguished spurious covariations following standard norms inductive reasoning We also establish complete characterization conditions distinction possible Finally provide prooftheoretical procedure inductive causation show large class data structures effective algorithms exist uncover direction causal influences defined
This study deals alltoall broadcast CNS We determine lower bound run time present algorithm meeting bound Since study points bottleneck network interface also analyze performance alternative interface designs Our analyses based run time model network
Automated decision making often complicated complexity knowledge involved Much complexity arises contextsensitive variations underlying phenomena We propose framework representing descriptive contextsensitive knowledge Our approach attempts integrate categorical uncertain knowledge network formalism This paper outlines basic representation constructs examines expressiveness efficiency discusses potential applications framework
We discuss number methods estimating standard error predicted values multilayer perceptron These methods include delta method based Hessian bootstrap estimators sandwich estimator The methods described compared number examples We find bootstrap methods perform best partly capture variability due choice starting weights
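The bootstrap estimator of the standard error of a predicted value refits the model on resampled cases and takes the spread of the resulting predictions. This is a minimal sketch under stated assumptions. The names fit and predict are hypothetical callables supplied by the user. To keep the example light and runnable, a straight-line least-squares fit (numpy.polyfit) stands in for the multilayer perceptron. With a real network, each refit would also start from fresh random initial weights, which is how the bootstrap picks up the variability due to starting weights that the abstract highlights.

```python
import numpy as np


def bootstrap_se(x, y, x_new, fit, predict, n_boot=200, seed=0):
    """Bootstrap standard error of predictions at x_new (case resampling)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample training cases with replacement
        model = fit(x[idx], y[idx])        # refit on the bootstrap sample
        preds.append(predict(model, x_new))
    return np.std(preds, axis=0, ddof=1)   # spread of predictions across refits


# Stand-in model (assumption): a degree-1 polynomial instead of an MLP.
fit = lambda xs, ys: np.polyfit(xs, ys, 1)
pred = lambda coef, xn: np.polyval(coef, xn)

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + rng.normal(0.0, 0.1, size=50)
se = bootstrap_se(x, y, np.array([0.0, 0.5, 1.0]), fit, pred)
```

For this linear stand-in, the standard error is smallest near the middle of the training inputs and grows toward the edges.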
Discrete mixtures normal distributions widely used modeling amplitude fluctuations electrical potentials synapses human animal nervous systems The usual framework independent data values y_j arising y_j = μ_j + x_{n+j} means μ_j come discrete prior G unknown x_{n+j} observed x_j j = 1, ..., n gaussian noise terms A practically important development associated statistical methods issue nonnormality noise terms often norm rather exception neurological context We recently developed models based convolutions Dirichlet process mixtures problems Explicitly model noise data values x j arising Dirichlet process mixture normals addition modeling location prior G Dirichlet process This induces Dirichlet mixture mixtures normals whose analysis may developed using Gibbs sampling techniques We discuss models analysis illustrate context neurological response analysis
Neural controllers able position handheld camera DOF anthropomorphic OSCARrobot manipulator object arbitrary placed table The desired camerajoint mapping approximated feedforward neural networks However object moving manipulator lags behind required time preprocess visual information move manipulator Through use time derivatives position object manipulator controller inherently predict next position object In paper several predictive controllers proposed successfully applied track moving object
This paper overviews AA Adaptive Algorithm model ASOCS Adaptive Self Organizing Concurrent Systems approach It also presents promising empirical generalization results AA actual data AA topologically dynamic network grows fit problem learned AA generalizes selforganizing fashion network seeks find features discriminate concepts Convergence training set guaranteed bounded linearly time
This communication deals source separation problem consists separation noisy mixture independent sources without priori knowledge mixture coefficients In paper consider maximum likelihood ML approach discrete source signals known probability distributions An important feature ML approach Gaussian noise covariance matrix additive noise treated parameter Hence necessary know model spatial structure noise Another striking feature offered case discrete sources mild assumptions possible separate sources sensors In paper consider maximization likelihood via ExpectationMaximization EM algorithm
If robust statistical model developed classify health system wellknown Taylor series approximation technique forms basis diagnosticrecovery procedure initiated systems health degrades fails altogether This procedure determines ranked set probable causes degraded health state used prioritized checklist isolating system anomalies quantifying corrective action The diagnosticrecovery procedure applicable classifier known robust applied neural network traditional parametric pattern classifiers generated supervised learning procedure empirical riskbenefit measure optimized We describe procedure mathematically demonstrate ability detect diagnose causes faults NASAs Deep Space Communications Complex Goldstone California
Case combination difficult problem Case Based Reasoning subcases often exhibit conflicts merged together In previous work formalized case combination representing case constraint satisfaction problem used minimum conflicts algorithm systematically synthesize global solution However also found instances problem minimum conflicts algorithm perform case combination efficiently In paper describe situations initially retrieved cases easily adaptable propose method improve case adaptability genetic algorithm We introduce fitness function maintains much retrieved case information possible also perturbing subsolution allow subsequent case combination proceed efficiently
The Dynamic Constraint Satisfaction Problem DCSP formalism gaining attention valuable often necessary extension static CSP framework Dynamic Constraint Satisfaction enables CSP techniques applied extensively since applied domains set constraints variables involved problem evolves time At time CaseBased Reasoning CBR community working techniques reuse existing solutions solving new problems We observed dynamic constraint satisfaction matches closely casebased reasoning process case adaptation These observations emerged previous work combining CBR CSP achieve constraintbased adaptation This paper summarizes previous results describes similarity challenges facing DCSP case adaptation shows CSP CBR together begin address challenges
Prior knowledge bias regarding concept speed task learning Probably Approximately Correct PAC learning mathematical model concept learning used quantify speed due different forms bias learning Thus far PAC learning mostly used analyze syntactic bias limiting concepts conjunctions boolean propositions This paper demonstrates PAC learning also used analyze semantic bias domain theory concept learned The key idea view hypothesis space PAC learning consistent prior knowledge syntactic semantic In particular paper presents PAC analysis determinations type relevance knowledge The results analysis reveal crisp distinctions relations among different determinations illustrate usefulness analysis based PAC model
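A standard result makes the link between bias and learning speed concrete. Take a learner that outputs a hypothesis consistent with the training data, drawn from a finite hypothesis space H. Then m >= (1/epsilon)(ln|H| + ln(1/delta)) examples suffice for true error at most epsilon with probability at least 1 - delta. The sketch below evaluates that bound (the function name is hypothetical). Any prior knowledge that shrinks H lowers the bound. This is the textbook finite-class bound, not the determination-based analysis of the paper.

```python
import math


def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Examples sufficient so that, with probability >= 1 - delta, every hypothesis
    consistent with them has true error <= epsilon (finite hypothesis space)."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


# Conjunctions of literals over n boolean variables: each variable appears
# positively, negated, or not at all, giving 3**n candidate conjunctions.
n = 10
print(pac_sample_bound(3 ** n, epsilon=0.1, delta=0.05))  # 140
```

Restricting attention to fewer relevant variables, for example through a determination, replaces 3**n with a smaller count. The bound then falls logarithmically in the size of the hypothesis space.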
Computational models natural systems often contain free parameters must set optimize predictive accuracy models This process called calibration can viewed form supervised learning presence prior knowledge In view fixed aspects model constitute prior knowledge goal learn values free parameters We report series attempts learn parameter values global vegetation model called MAPSS Mapped AtmospherePlantSoil System developed collaborator Ron Neilson Standard machine learning methods do not work MAPSS constraints introduced structure model create difficult nonlinear optimization problem We developed new divideandconquer approach subsets parameters calibrated others held constant This approach succeeds possible select training examples exercise portions model
The paper considers situation learners testing set contains close approximations cases appear training set Such cases considered virtual seens since approximately seen learner Generalisation measures take account frequency virtual seens may misleading The paper shows NN algorithm used derive normalising baseline generalisation statistics The normalisation process demonstrated though application Holtes study generalisation performance R algorithm tested C commonly used datasets
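One simple way to see the effect of virtual seens is to measure two things: how close each test case lies to the training set, and what a plain nearest-neighbour classifier achieves. The sketch below is hypothetical. Euclidean distance, the `radius` threshold and the function names are assumptions, and it does not reproduce the paper's normalisation formula.

```python
import math


def one_nn_predict(x, train_X, train_y):
    """Label of the closest training case (Euclidean distance)."""
    nearest = min(range(len(train_X)), key=lambda j: math.dist(x, train_X[j]))
    return train_y[nearest]


def one_nn_accuracy(train_X, train_y, test_X, test_y):
    """Accuracy of the 1-NN baseline on a test set."""
    hits = sum(one_nn_predict(x, train_X, train_y) == y for x, y in zip(test_X, test_y))
    return hits / len(test_X)


def virtual_seen_fraction(train_X, test_X, radius):
    """Share of test cases lying within `radius` of some training case."""
    near = sum(min(math.dist(x, t) for t in train_X) <= radius for x in test_X)
    return near / len(test_X)


train_X, train_y = [(0, 0), (10, 10)], ["a", "b"]
test_X, test_y = [(0, 0.1), (5, 4), (10, 10)], ["a", "b", "b"]
print(one_nn_accuracy(train_X, train_y, test_X, test_y))
print(virtual_seen_fraction(train_X, test_X, radius=0.5))
```

A high virtual-seen fraction together with a high 1-NN accuracy suggests that much of a learner's apparent generalisation may come from near-copies of training cases.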
Conversational casebased reasoning CBR systems incrementally extract query description userdirected conversation advertised ease use However designing large case libraries good performance ie precision querying efficiency difficult CBR vendors provide guidelines designing libraries manually guidelines difficult apply We describe automated inductive approach revises conversational case libraries increase conformance design guidelines Revision increased performance three conversational case libraries
Diagnosis process identifying disorders machine patient considering history symptoms signs Starting possible initial information new information requested sequential manner diagnosis made precise It thus missing data problem since not everything known We model joint probability distribution data case database mixture models Model parameters estimated EM algorithm gives additional benefit missing data database also handled correctly Request new information refine diagnosis performed using maximum utility principle decision theory Since system based machine learning domain independent An example using heart disease database presented
We give example neural net without hidden layers sigmoid transfer function together training set binary vectors sum squared errors regarded function weights local minimum not global minimum The example consists set training instances four weights threshold learnt We know substantially smaller binary examples exist
The multiple extension problem arises default theory use different subsets defaults propose different mutually incompatible answers queries This paper presents algorithm uses set observations learn credulous version default theory essentially optimally accurate In detail associate given default theory set related credulous theories R fR g R uses total ordering defaults determine single answer return query Our goal select credulous theory highest expected accuracy R expected accuracy probability answer produces query correspond correctly world Unfortunately theorys expected accuracy depends distribution queries usually known Moreover task identifying optimal R opt R even given distribution information intractable This paper presents method OptAcc sidesteps problems using set samples estimate unknown distribution hillclimbing local optimum In particular given parameters ffi gt OptAcc produces R oa R whose expected accuracy probability least ffi within local optimum Appeared ECAI Workshop Theoretical Foundations Knowledge Representation Reasoning
Compression information important concept theory learning We argue hypothesis inherent compression pressure towards short elegant general solutions genetic programming system variable length evolutionary algorithms This pressure becomes visible size complexity solutions measured without noneffective code segments called introns The built parsimony pressure effects complex fitness functions crossover probability generality maximum depth length solutions explicit parsimony granularity fitness function initialization depth length modularization Some effects positive negative In work provide basis analysis effects suggestions overcome negative implications order obtain balance needed successful evolution An empirical investigation supports hypothesis also presented
Wilsons recent XCS classifier system forms complete mappings payoff environment reinforcement learning tradition thanks accuracy based fitness According Wilsons Generalization Hypothesis XCS tendency towards generalization With XCS Optimality Hypothesis I suggest XCS systems evolve optimal populations representations populations accurately map inputaction pairs payoff predictions using smallest possible set nonoverlapping classifiers The ability XCS evolve optimal populations boolean multiplexer problems demonstrated using condensation technique evolutionary search suspended setting crossover mutation rates zero Condensation automatically triggered selfmonitoring performance statistics entire learning process terminated autotermination Combined techniques allow classifier system evolve optimal representations boolean functions without form supervision
Current rule induction systems eg CN typically rely separate conquer strategy learning rule stilluncovered examples This results dwindling number examples available learning successive rules adversely affecting systems accuracy An alternative learn rules simultaneously using entire training set This approach implemented Rise system Empirical comparison Rise CN suggests conquering without separating performs similarly counterpart simple domains achieves increasingly substantial gains accuracy domain difficulty grows
A genetic programming method investigated optimizing architecture connection weights multilayer feedforward neural networks The genotype network represented tree whose depth width dynamically adapted particular application specifically defined genetic operators The weights trained nextascent hillclimbing search A new fitness function proposed quantifies principle Occams razor It makes optimal tradeoff error fitting ability parsimony network We discuss results two problems differing complexity study convergence scaling properties algorithm
The performance neural network categorizes facial expressions compared human subjects set experiments using interpolated imagery The experiments human subjects neural networks make use interpolations facial expressions Pictures Facial Affect Database Ekman Friesen The difference materials used human subjects experiments Young et al materials manner interpolated images constructed imagequality morphs versus pixel averages Nevertheless neural network accurately captures categorical nature human responses showing sharp transitions labeling images along interpolated sequence Crucially demonstration categorical perception Harnad model shows highest discrimination transition images crossover point The model also captures shape reaction time curves human subjects along sequences Finally network matches human subjects judgements expressions mixed images The main failing model intrusions neutral responses transitions not seen human subjects We attribute difference difference pixel average stimuli image quality morph stimuli These results show simple neural network classifier no access biological constraints presumably imposed human emotion processor whose access surrounding culture category labels placed American subjects facial expressions nevertheless simulate fairly well human responses emotional expressions
The article hand discusses tool automatic generation structured models complex dynamic processes means genetic programming In contrast techniques use genetic programming find appropriate arithmetic expression order describe inputoutput behaviour process tool based block oriented approach transparent description signal paths A short survey techniques computer based system identification given basic concept SMOG Structured MOdel Generator described Furthermore latest extensions system presented detail including automatically defined submodels qualitative fitness criteria
We examine role hyperplane ranking search performed simple genetic algorithm We also develop metric measuring degree ranking exists respect static measurements taken directly function well measurement dynamic ranking hyperplanes genetic search We show degree dynamic ranking induced simple genetic algorithm highly correlated degree static ranking inherent function especially initial generations search
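For short strings, static hyperplane averages can be computed by exhaustive enumeration. The sketch below computes the mean fitness of every order-1 hyperplane, that is, every set of strings with one bit fixed. Comparing the two competing hyperplanes at each position gives their static ranking. The names are hypothetical, and OneMax is chosen only as an easy test function. The paper's ranking metric itself is not reproduced.

```python
from itertools import product


def order1_hyperplane_means(f, length):
    """Mean of f over each order-1 hyperplane (bit i fixed to value b), by enumeration."""
    sums = {(i, b): 0.0 for i in range(length) for b in (0, 1)}
    count = 2 ** (length - 1)  # number of strings in each order-1 hyperplane
    for bits in product((0, 1), repeat=length):
        value = f(bits)
        for i, b in enumerate(bits):
            sums[(i, b)] += value  # each string belongs to one hyperplane per position
    return {key: s / count for key, s in sums.items()}


def onemax(bits):
    """Toy fitness: number of ones in the string."""
    return sum(bits)


means = order1_hyperplane_means(onemax, 4)
print(means[(0, 1)], means[(0, 0)])
```

On OneMax every "bit = 1" hyperplane outranks its "bit = 0" competitor. A dynamic measurement would compare this ordering with the ordering implied by the genetic algorithm's population during search.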
Genetic algorithms rely two genetic operators crossover mutation Although exists large body conventional wisdom concerning roles crossover mutation roles not captured theoretical fashion For example never theoretically shown mutation sense less powerful crossover vice versa This paper provides answers questions theoretically demonstrating important characteristics operator captured
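For reference, the two operators can be sketched in their most common bit-string forms, one-point crossover and per-bit mutation. These are standard textbook variants assumed for illustration, not necessarily the exact operators analysed in the paper. The function names are hypothetical.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length parents after a random cut point."""
    point = rng.randrange(1, len(a))  # cut strictly inside the string
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_flip_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - x if rng.random() < rate else x for x in bits]


rng = random.Random(1)
print(one_point_crossover([0] * 8, [1] * 8, rng))
print(bit_flip_mutation([0, 1, 1, 0], 0.25, rng))
```

Crossover only recombines values already present in the parents. Mutation can introduce values absent from both, which is one intuition behind comparing what each operator can and cannot achieve.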
In recent paper Friedman Geiger Goldszmidt introduced classifier based Bayesian networks called Tree Augmented Naive Bayes TAN outperforms naive Bayes performs competitively C stateoftheart methods This classifier several advantages including robustness polynomial computational complexity One limitation TAN classifier applies only discrete attributes thus continuous attributes must prediscretized In paper extend TAN deal continuous attributes directly via parametric eg Gaussians semiparametric eg mixture Gaussians conditional probabilities The result classifier represent combine discrete continuous attributes In addition propose new method takes advantage modeling language Bayesian networks order represent attributes discrete continuous form simultaneously use versions classification This automates process deciding form attribute relevant classification task It also avoids commitment either discretized semiparametric form since different attributes may correlate better one version Our empirical results show latter method usually achieves classification performance good better either purely discrete purely continuous TAN models
This paper considers problem representing complex systems evolve stochastically time Dynamic Bayesian networks provide compact representation stochastic processes Unfortunately often unwieldy since cannot explicitly model complex organizational structure many real life systems fact processes typically composed several interacting subprocesses turn decomposed We propose hierarchically structured representation language extends dynamic Bayesian networks objectoriented Bayesian network framework show language allows us describe systems natural modular way Our language supports natural representation certain system characteristics hard capture using traditional frameworks For example allows us represent systems processes evolve different rate others systems processes interact intermittently We provide simple inference mechanism representation via translation Bayesian networks suggest ways inference algorithm exploit additional structure encoded representation
It often difficult predict optimal neural network size particular application Constructive destructive methods add subtract neurons layers connections etc might offer solution problem We prove one method Recurrent Cascade Correlation due topology fundamental limitations representation thus learning capabilities It cannot represent monotone ie sigmoid hardthreshold activation functions certain finite state automata We give preliminary approach get around limitations devising simple constructive training method adds neurons training still preserving powerful fullyrecurrent structure We illustrate approach simulations learn many examples regular grammars
Indexing cases important topic MemoryBased Reasoning MBR One key problem assign weights attributes cases Although several weighting methods proposed methods cannot handle numeric attributes directly necessary discretize numeric values classification Furthermore existing methods no theoretical background little said optimality We propose new weighting method based statistical technique called Quantification Method II It handle numeric symbolic attributes framework Generated attribute weights optimal sense maximize ratio variance classes variance cases Experiments several benchmark tests show many cases method obtains higher accuracies weighting methods The results also indicate distinguish relevant attributes irrelevant ones tolerate noisy data
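The weighting criterion is the ratio of between-class variance to total variance. For a single numeric attribute it can be illustrated with the correlation ratio (eta squared). This is a simplified per-attribute sketch under that assumption, and the function name is hypothetical. Quantification Method II itself assigns scores to attributes jointly, including symbolic ones, which is not shown here.

```python
def correlation_ratio(values, labels):
    """Between-class sum of squares divided by total sum of squares (eta squared)."""
    n = len(values)
    mean = sum(values) / n
    total = sum((v - mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # a constant attribute carries no class information
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    return between / total


labels = ["a", "a", "b", "b"]
print(correlation_ratio([1, 1, 5, 5], labels))  # separates the classes perfectly
print(correlation_ratio([1, 5, 1, 5], labels))  # carries no class information
```

A value near 1 marks an attribute whose variation is almost entirely explained by the class, a natural candidate for a large weight. A value near 0 suggests an irrelevant attribute.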
A General Result Stabilization Linear Systems Using Bounded Controls We present two constructions controllers globally stabilize linear systems subject control saturation We allow essentially arbitrary saturation functions The conditions imposed system obvious necessary ones namely no eigenvalues uncontrolled system positive real part standard stabilizability rank condition hold One constructions terms neuralnetwork type onehidden layer architecture one terms cascades linear maps saturations
This paper proposes classification scheme based integration multiple Ensembles ANNs It demonstrated classification problem seismic signals Natural Earthquakes must distinguished seismic signals Artificial Explosions A Redundant Classification Environment consists several Ensembles Neural Networks created trained Bootstrap Sample Sets using various data representations architectures The ANNs within Ensembles aggregated Bagging Ensembles integrated nonlinearly signal adaptive manner using posterior confidence measure based agreement variance within Ensembles The proposed Integrated Classification Machine achieved correct classifications seismic test data Cross Validation evaluations comparisons indicate integration collection ANNs Ensembles robust way handling high dimensional problems complex nonstationary signal space current Seismic Classification problem
This first draft chapter Bayesian Biostatistics edited Donald A Berry Dalene K Stangl Adrian E Raftery Professor Statistics Sociology Department Statistics GN University Washington Seattle WA USA Sylvia Richardson Directeur de Recherche INSERM Unite avenue Paul Vaillant Couturier Villejuif CEDEX France Rafterys research supported ONR contract NJ Ministere de la Recherche et de lEspace Paris Universite de Paris VI INRIA Rocquencourt France Raftery thanks latter two institutions Paul Deheuvels Gilles Celeux hearty hospitality Paris sabbatical part chapter written The authors grateful Christine Montfort excellent research assistance Mariette Gerber Michel Chavance David Madigan helpful discussions
ProductionManufacturing scheduling typically involves acquisition user optimization preferences The illstructuredness problem space desired objectives make practical scheduling problems difficult formalize costly solve especially problem configurations user optimization preferences change time This paper advocates incremental revision framework improving schedule quality incorporating user dynamically changing preferences CaseBased Reasoning Our implemented system called CABINS records situationdependent tradeoffs consequences result schedule revision guide schedule improvement The preliminary experimental results show CABINS able effectively capture user static dynamic preferences not known system exist implicitly extensional manner case base
Realization autonomous behavior mobile robots using fuzzy logic control requires formulation rules collectively responsible necessary levels intelligence Such collection rules conveniently decomposed efficiently implemented hierarchy fuzzybehaviors This article describes done using behaviorbased architecture A behavior hierarchy mechanisms control decisionmaking described In addition approach behavior coordination described emphasis evolution fuzzy coordination rules using genetic programming GP paradigm Both conventional GP steadystate GP applied evolve fuzzybehavior sensorbased goalseeking The usefulness behavior hierarchy partial design GP evident performance results simulated autonomous navigation
We present distribution model binary vectors called influence combination model show model used basis unsupervised learning algorithms feature selection The model closely related Harmonium model defined Smolensky RMCh In first part paper analyze properties distribution representation scheme We show arbitrary distributions binary vectors approximated combination model We show weight vectors model interpreted high order correlation patterns among input bits We compare combination model mixture model principal component analysis In second part paper present two algorithms learning combination model examples The first algorithm based gradient ascent Here give closed form gradient significantly easier compute corresponding gradient general Boltzmann machine The second learning algorithm greedy method creates hidden units computes weights one time This method variant projection pursuit density estimation In third part paper give experimental results learning methods synthetic data natural data handwritten digit images
Complex group behavior arises social insects colonies integration actions simple redundant individual insects Adler Gordon Oster Wilson Furthermore colony act information center expedite foraging Brown We apply lessons natural systems model collective action memory computational agent society Collective action expedite search combinatorial optimization problems Dorigo et al Collective memory improve learning multiagent systems Garland Alterman Our collective adaptation integrates simplicity collective action pattern detection collective memory significantly improve gathering processing knowledge As test role society information center examine ability society distribute task allocation without omnipotent centralized control
We study annealed theories learning boolean functions using concept class finite cardinality The naive annealed theory used derive universal learning curve bound zero temperature learning similar inverse square root bound VapnikChervonenkis theory Tighter nonuniversal learning curve bounds also derived A refined annealed theory leads still tighter bounds cases similar results previously obtained using onestep replica symmetry breaking
This article describes numerical method may used efficiently locate track underwater sonar targets nearfield bearing range estimation case large passive arrays The approach used requirement priori knowledge source uses limited information receiver array shape The role sensor position uncertainty consequence targets always nearfield analysed problems associated manipulation large matrices inherent conventional eigenvalue type algorithms noted A simpler numerical approach presented reduces problem search optimization When using method location target corresponds finding position maximum weighted sum output sensors Since search procedure dealt using modern stochastic optimization methods genetic algorithm operational requirement acceptable accuracy achieved real time usually met The array studied consists elements positioned along flexible cable towed behind ship sensors giving effective aperture For long array far field assumption used beamforming algorithms longer appropriate The waves emitted targets considered curved rather plane It shown simulated data significant noise
This paper introduces new type intelligent agent called constructive inductionbased learning agent CILA This agent differs adaptive agents ability learn assist user task also incrementally adapt knowledge representation space better fit given learning task The agents ability autonomously make problemoriented modifications originally given representation space due constructive induction CI learning method Selective induction SI learning methods agents based methods rely good representation space A good representation space no misclassification noise intercorrelated attributes irrelevant attributes Our proposed CILA methods overcoming problems In agent domains poor representations CIbased learning agent learn accurate rules useful SIbased learning agent This paper gives architecture CIbased learning agent gives empirical comparison CI SI set six abstract domains involving DNFtype disjunctive normal form descriptions
We propose method decreasing computational complexity selforganising maps The method uses partitioning neurons disjoint clusters Teaching neurons occurs clusterbasis instead neuronbasis For teaching Nneuron network N samples computational complexity decreases ON N ON log N Furthermore introduce measure amount order selforganising map show introduced algorithm behaves well original algorithm
Inductive learning relational domains shown intractable general Many approaches task suggested nevertheless way restrict hypothesis space searched They roughly divided two groups datadriven restriction encoded algorithm modelbased restrictions made less explicit form declarative bias This paper describes Incy inductive learner seeks combine aspects approaches Incy initially datadriven using examples background knowledge put forth specialize hypotheses based connectivity data hand It modeldriven hypotheses abstracted rule models used control decisions datadriven phase modelguided induction Key Words Inductive learning relational domains cooperation datadriven modelguided methods implicit declarative bias
The problem learning decision rules sequential tasks addressed focusing problem learning tactical plans simple flight simulator plane must avoid missile The learning method relies notion competition employs genetic algorithms search space decision policies Experiments presented address issues arising differences simulation model learning occurs target environment decision rules ultimately tested Specifically either model target environment may contain noise These experiments examine effect learning tactical plans without noise testing plans noisy environment effect learning plans noisy simulator testing plans noisefree environment Empirical results show best result obtained training model closely matches target environment using training environment noisy target environment better using using training environment less noise target environment
Navigation obstacles mine fields important capability autonomous underwater vehicles One way produce robust behavior perform projective planning However realtime performance critical requirement navigation What needed truly autonomous vehicle robust reactive rules perform well wide variety situations also achieve realtime performance In work SAMUEL learning system based genetic algorithms used learn highperformance reactive strategies navigation collision avoidance
In paper introduce investigate mathematically rigorous theory learning curves based ideas statistical mechanics The advantage theory wellestablished VapnikChervonenkis theory bounds considerably tighter many cases also reflective true behavior functional form learning curves This behavior often exhibit dramatic properties phase transitions well power law asymptotics not explained VC theory The disadvantages theory application requires knowledge input distribution limited far finite cardinality function classes We illustrate results many concrete examples learning curve bounds derived theory
Although considerable interest shown language inference automata induction using recurrent neural networks success models mostly limited regular languages We previously demonstrated Neural Network Pushdown Automaton NNPDA model capable learning deterministic contextfree languages eg a^n b^n parenthesis languages examples However learning task computationally intensive In paper discuss ways priori knowledge task data could used efficient learning We also observe knowledge often experimental prerequisite learning nontrivial languages eg n b n cb
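The languages mentioned are recognised by simple deterministic pushdown automata, and that stack behaviour is what an NNPDA has to learn. Below is a minimal classical recogniser for a^n b^n and for balanced parentheses. It is a symbolic reference point, not a neural model, and the function names are hypothetical.

```python
def accepts_anbn(s):
    """Pushdown-style check for a^n b^n (n >= 0): push on 'a', pop on 'b'."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False  # an 'a' after a 'b' is not of the form a...ab...b
            stack.append(ch)
        elif ch == "b":
            seen_b = True
            if not stack:
                return False  # more b's than a's so far
            stack.pop()
        else:
            return False  # symbol outside the alphabet {a, b}
    return not stack  # accept only if every 'a' was matched


def accepts_balanced(s):
    """Pushdown-style check for balanced parentheses."""
    depth = 0  # a stack holding one symbol type can be represented by its height
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing parenthesis with nothing to match
        else:
            return False
    return depth == 0


print([w for w in ["ab", "aabb", "aab", "abab"] if accepts_anbn(w)])
```

Strings of both kinds, labelled by these recognisers, are the sort of positive and negative examples such a model would be trained on.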
Connectionist learning procedures presented sigmoid noisyOR varieties stochastic feedforward network These networks class belief networks used expert systems They represent probability distribution set visible variables using hidden variables express correlations Conditional probability distributions exhibited stochastic simulation use tasks classification Learning empirical data done via gradientascent method analogous used Boltzmann machines due feedforward nature connections negative phase Boltzmann machine learning unnecessary Experimental results show result learning sigmoid feedforward network faster Boltzmann machine These networks advantages Boltzmann machines pattern classification decision making applications provide link work connectionist learning work representation expert knowledge
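For a small sigmoid belief network, the distribution it represents can be computed exactly by enumeration. This makes the model definition concrete: each unit turns on with probability sigmoid(bias + weighted sum of earlier units). The sketch below assumes hidden units come first in the ordering and uses hypothetical names. It is exponential in the number of hidden units, which is why stochastic simulation is needed for realistic networks.

```python
import math
from itertools import product


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def joint_probability(states, weights, biases):
    """P(states) for a sigmoid belief net; weights[i][j] links earlier unit j to unit i."""
    p = 1.0
    for i, s_i in enumerate(states):
        # feedforward: unit i depends only on units 0..i-1
        p_on = sigmoid(biases[i] + sum(weights[i][j] * states[j] for j in range(i)))
        p *= p_on if s_i == 1 else 1.0 - p_on
    return p


def marginal_visible(visible, n_hidden, weights, biases):
    """P(visible) by summing over all hidden configurations (hidden units first)."""
    return sum(
        joint_probability(list(hidden) + list(visible), weights, biases)
        for hidden in product((0, 1), repeat=n_hidden)
    )


W = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [-2.0, 0.7, 0.0]]
b = [0.2, -0.3, 0.1]
print(marginal_visible([1], 2, W, b))
```

Because the network is feedforward, the joint factorises into one conditional per unit and needs no normalising constant. This contrasts with the Boltzmann machine, whose learning rule requires a negative phase.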
Genetic Programming GP uses variable size representations programs Size becomes important interesting emergent property structures evolved GP The size programs controlling controlled factor GP search Size influences efficiency search process related generality solutions This paper analyzes size generality issues standard GP GP using subroutines addresses question whether analysis help control search process We relate size generalization modularity issues programs evolved control agent dynamic nondeterministic environment exemplified PacMan game
We present definition cause effect terms decisiontheoretic primitives thereby provide principled foundation causal reasoning Our definition departs traditional view causation causal assertions may vary set decisions available We argue approach provides added clarity notion cause Also paper examine encoding causal relationships directed acyclic graphs We describe special class influence diagrams canonical form show relationship Pearls representation cause effect Finally show canonical form facilitates counterfactual reasoning
Fuzzy logic evolutionary computation proven convenient tools handling realworld uncertainty designing control systems respectively An approach presented combines attributes paradigms purpose developing intelligent control systems The potential genetic programming paradigm GP learning rules use fuzzy logic controllers FLCs evaluated focussing problem discovering controller mobile robot path tracking Performance results incomplete rulebases compare favorably complete FLC designed usual trialanderror approach A constrained syntactic representation supported structurepreserving genetic operators also introduced
We review estimation interval censoring models including nonparametric estimation distribution function estimation regression models In nonparametric setting describe computational procedures asymptotic properties nonparametric maximum likelihood estimators In regression setting focus proportional hazards proportional odds accelerated failure time semiparametric regression models Particular emphasis given calculation Fisher information regression parameters We also discuss computation regression parameter estimators via profile likelihood maximization semiparametric likelihood distributional results maximum likelihood estimators estimation asymptotic variances Some problems open questions also reviewed
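In the nonparametric setting, one widely used computational procedure is Turnbull's self-consistency (EM) iteration for the NPMLE. The sketch below assumes the candidate support intervals are already given; computing Turnbull's innermost intervals is omitted. It also assumes every observation interval contains at least one support interval. The names are hypothetical.

```python
def turnbull_self_consistency(observations, support, iterations=500):
    """EM / self-consistency iteration for probability masses on given support intervals.

    observations: closed intervals (left, right), each known to contain an event time.
    support: candidate intervals (q, p), assumed given rather than computed here.
    """
    # alpha[i][j] = 1 when support interval j lies inside observation interval i
    alpha = [[1.0 if left <= q and p <= right else 0.0 for (q, p) in support]
             for (left, right) in observations]
    m, n = len(support), len(observations)
    mass = [1.0 / m] * m  # start from uniform masses
    for _ in range(iterations):
        new = [0.0] * m
        for row in alpha:
            denom = sum(a * w for a, w in zip(row, mass))  # assumed > 0
            for j in range(m):
                new[j] += row[j] * mass[j] / denom  # expected share of this observation
        mass = [x / n for x in new]
    return mass


obs = [(0, 2), (1, 3), (2, 4)]
print(turnbull_self_consistency(obs, [(1, 1), (2, 2), (3, 3)]))
```

With exactly observed times each row has a single nonzero entry, and the iteration reduces to the empirical distribution after one step.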
Genetic programming distinguished evolutionary algorithms uses tree representations variable size instead linear strings fixed length The flexible representation scheme important allows underlying structure data discovered automatically One primary difficulty however solutions may grow big without improvement generalization ability In paper investigate fundamental relationship performance complexity evolved structures The essence parsimony problem demonstrated empirically analyzing error landscapes programs evolved neural network synthesis We consider genetic programming statistical inference problem apply Bayesian modelcomparison framework introduce class fitness functions error complexity terms An adaptive learning method presented automatically balances modelcomplexity factor evolve parsimonious programs without losing diversity population needed achieving desired training accuracy The effectiveness approach empirically shown induction sigmapi neural networks solving realworld medical diagnosis problem well benchmark tasks
Dynamic probabilistic networks DPNs useful tool modeling complex stochastic processes The simplest inference task DPNs monitoring computing posterior distribution state variables time step given observations time Recursive constantspace algorithms wellknown monitoring DPNs models This paper concerned hindsight computing posterior distribution given past future observations Hindsight essential subtask learning DPN models data Existing algorithms hindsight DPNs use O(SN) space time N total length observation sequence S state space size time step They therefore impractical hindsight complex models long observation sequences This paper presents O(S log N) space O(SN log N) time hindsight algorithm We demonstrate effectiveness algorithm two realworld DPN learning problems We also discuss possibility O(S) space O(SN) time algorithm
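A hidden Markov model is the simplest DPN, with a single discrete state variable per time slice. Standard forward-backward smoothing on it illustrates the O(SN)-space hindsight baseline, because it stores every forward and backward vector. The sketch below implements that baseline under this simplifying assumption, with hypothetical names. The paper's O(S log N)-space algorithm is not reproduced.

```python
def forward_backward(prior, transition, emission, observations):
    """Smoothed posteriors P(X_t | y_1..y_N) for a discrete hidden Markov model.

    prior[i] = P(X_1 = i); transition[i][j] = P(X_{t+1} = j | X_t = i);
    emission[i][o] = P(y_t = o | X_t = i). All N forward and backward vectors
    are kept, so memory grows as O(S N).
    """
    S, N = len(prior), len(observations)

    def normalize(v):
        z = sum(v)
        return [x / z for x in v]

    # Forward pass: filtered distributions P(X_t | y_1..y_t).
    forward = [normalize([prior[i] * emission[i][observations[0]] for i in range(S)])]
    for t in range(1, N):
        prev = forward[-1]
        forward.append(normalize([
            emission[j][observations[t]] * sum(prev[i] * transition[i][j] for i in range(S))
            for j in range(S)
        ]))

    # Backward pass: messages proportional to P(y_{t+1}..y_N | X_t).
    backward = [[1.0] * S for _ in range(N)]
    for t in range(N - 2, -1, -1):
        nxt = backward[t + 1]
        backward[t] = normalize([
            sum(transition[i][j] * emission[j][observations[t + 1]] * nxt[j] for j in range(S))
            for i in range(S)
        ])

    # Combine: P(X_t | y_1..y_N) is proportional to forward * backward.
    return [normalize([f * b for f, b in zip(fw, bw)]) for fw, bw in zip(forward, backward)]


print(forward_backward([0.6, 0.4], [[0.7, 0.3], [0.2, 0.8]], [[0.9, 0.1], [0.3, 0.7]], [0, 1, 1]))
```

The `forward` list is what makes memory linear in N. Space-efficient schemes keep only a few checkpoints of it and recompute the rest, trading extra time for memory.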
Learning methods vary optimism pessimism regard informativeness learned knowledge Pessimism implicit hypothesis testing wish draw cautious conclusions experimental evidence However paper demonstrates optimism utility derived rules may preferred bias learning systems We examine continuum naive pessimism naive optimism context decision tree learner prunes rules based stringent ie pessimistic weak ie optimistic tests significance Our experimental results indicate cases optimism preferred particularly cases sparse training data high noise This work generalizes earlier findings Fisher Schlimmer Schaffer discuss relevance unsupervised learning small disjuncts issues
We already shown extracting longterm dependencies sequential data difficult deterministic dynamical systems recurrent networks probabilistic models hidden Markov models HMMs inputoutput hidden Markov models IOHMMs In practice avoid problem researchers used domain specific apriori knowledge give meaning hidden state variables representing past context In paper propose use general type apriori knowledge namely temporal dependencies structured hierarchically This implies longterm dependencies represented variables long time scale This principle applied recurrent network includes delays multiple time scales Experiments confirm advantages structures A similar approach proposed HMMs IOHMMs
We present new algorithm finding low complexity neural networks high generalization capability The algorithm searches flat minimum error function A flat minimum large connected region weightspace error remains approximately constant An MDLbased Bayesian argument suggests flat minima correspond simple networks low expected overfitting The argument based Gibbs algorithm variant novel way splitting generalization error underfitting overfitting error Unlike many previous approaches does not require Gaussian assumptions does not depend good weight prior instead prior inputoutput functions thus taking account net architecture training set Although algorithm requires computation second order derivatives backprops order complexity Automatically effectively prunes units weights input lines Various experiments feedforward recurrent nets described In application stock market prediction flat minimum search outperforms conventional backprop weight decay optimal brain surgeon optimal brain damage We also provide pseudo code algorithm omitted NCversion
This paper describes method improving comprehensibility accuracy generality reactive plans A reactive plan set reactive rules Our method involves two phases formulate explanations execution traces generate new reactive rules explanations Since explanation phase previously described primary focus paper rule generation phase This latter phase consists taking subset explanations using explanations generate set new reactive rules add original set The particular subset explanations chosen yields rules provide new domain knowledge handling knowledge gaps original rule set The original rule set complementary manner provides expertise fill gaps domain knowledge provided new rules incomplete
Technical Report AI May Abstract A new method developing good valueordering strategies constraint satisfaction search presented Using evolutionary technique called SANE individual neurons evolve cooperate form neural network problemspecific knowledge discovered results better valueordering decisions based problemgeneral heuristics A neural network evolved chronological backtrack search decide ordering cars resourcelimited assembly line The network required backtracks random ordering backtracks maximization future options heuristic The SANE approach extend well domains heuristic information either difficult discover problemspecific
Conversational casebased reasoning CBR shells eg Inferences CBR Express commercially successful tools supporting development help desk related applications In contrast rulebased expert systems capture knowledge cases rather problematic rules incrementally extended However rather eliminate knowledge engineering bottleneck refocus case engineering task carefully authoring cases according library design guidelines ensure good performance Designing complex libraries according guidelines difficult software needed assist users case authoring We describe approach revising case libraries according design guidelines implementation Clire empirical results showing conditions approach improve conversational CBR performance
We present computational model movement skill learning The types skills addressed class trajectory following movements involving multiple accelerations decelerations changes direction lasting seconds These skills acquired observation improved practice We also review speedaccuracy tradeoff one robust phenomena human motor behavior We present two speedaccuracy tradeoff experiments models performance fits human behavior quite well
Reinforcement Learning class problems autonomous agent acting given environment improves behavior progressively maximizing function calculated basis succession scalar responses received environment Qlearning classifier systems CS two methods among used solve reinforcement learning problems Notwithstanding popularity shared goal past often considered two different models In paper first show classifier system restricted sharp simplification called discounted max simple classifier system D MAX VSCS boils tabular Qlearning It follows D MAX VSCS converges optimal policy proved Watkins Dayan draw profit results experimental theoretical works dedicated improve Qlearning facilitate use concrete applications In second part paper show three restrictions need impose CS deriving equivalence Qlearning internal states dont care symbols structural changes turn essential recently rediscovered reprogrammed Qlearning adepts Eventually sketch similarities among ongoing work within research contexts The main contribution paper therefore make explicit strong similarities existing Qlearning classifier systems show experience gained research within one domain useful direct future research one
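The abstract's central claim is that D MAX VSCS reduces to tabular Q-learning. For reference, here is a minimal sketch of the tabular Q-learning update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)], run on a hypothetical four-state corridor. The environment, the function names and the parameter values are illustrative assumptions, not taken from the paper, and no classifier-system machinery is shown.

```python
import random


def q_learning(n_states=4, episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical toy corridor.

    States are 0..n_states-1. Action 0 moves left (bounded at 0) and action 1
    moves right. Entering the last state ends the episode with reward 1.
    The behaviour policy is uniformly random, which is allowed because
    Q-learning is off-policy.
    """
    rng = random.Random(seed)
    terminal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != terminal:
            a = rng.randrange(2)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0
            # One-step target r + gamma * max_a' Q(s', a'); the terminal state has value 0.
            future = 0.0 if s_next == terminal else max(Q[s_next])
            Q[s][a] += alpha * (r + gamma * future - Q[s][a])
            s = s_next
    return Q


def greedy_policy(Q):
    # Greedy action in every non-terminal state.
    return [max(range(2), key=lambda a: Q[s][a]) for s in range(len(Q) - 1)]


Q = q_learning()
print(greedy_policy(Q))  # → [1, 1, 1]
```

The corridor is deterministic, and the random behaviour policy keeps visiting every state-action pair. The table should therefore settle near the optimal values, for example Q(0, right) ≈ γ² = 0.81, and the greedy policy moves right everywhere.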
Some recent work investigated dichotomy compact coding using dimensionality reduction sparse distributed coding context understanding biological information processing We introduce artificial neural network self organises basis simple Hebbian learning negative feedback activation show capable forming compact codings data distributions also identifying filters sensitive sparse distributed codes The network extremely simple biological relevance investigated via response set images typical everyday life However analysis networks identification filter sparse coding reveals coding may not be globally optimal exists innate limiting factor cannot be transcended
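As a rough illustration of Hebbian learning with negative feedback of activation, here is a single-output sketch that assumes a linear unit. The output is fed back and subtracted from the input, and the weights are then updated Hebbian-style from the output and the residual. With one output this update is algebraically the same as Oja's rule, Δw = η(yx − y²w), which converges to the first principal direction of the data, i.e. a compact, PCA-like code. The function name, learning rate and synthetic data are assumptions. This is a generic negative-feedback formulation, not necessarily the exact network of the paper, and its sparse-coding behaviour is not reproduced.

```python
import random


def negative_feedback_hebbian(data, eta=0.002, w0=(0.5, 0.5)):
    """Single linear output trained Hebbian-style on the negative-feedback residual.

    Forward:  y = w . x
    Feedback: e = x - w * y   (input minus the fed-back reconstruction)
    Update:   w += eta * y * e
    """
    w = list(w0)
    for x in data:
        y = sum(wi * xi for wi, xi in zip(w, x))
        e = [xi - wi * y for wi, xi in zip(w, x)]
        w = [wi + eta * y * ei for wi, ei in zip(w, e)]
    return w


rng = random.Random(1)
# Hypothetical zero-mean 2-D data whose variance lies mostly along axis 0.
data = [(3.0 * rng.gauss(0, 1), 0.5 * rng.gauss(0, 1)) for _ in range(20000)]
w = negative_feedback_hebbian(data)
print([round(v, 2) for v in w])
```

The learned weight vector should end up with roughly unit length and pointing along axis 0, close to ±(1, 0). That is the dominant principal component of this synthetic data.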
For classes concepts defined certain classes analytic functions depending n parameters nonempty open sets samples length n shattered A slightly weaker result also proved piecewiseanalytic functions The special case neural networks discussed
Two important goals evaluation AI theory model assess merit design decisions performance implemented computer system analyze impact performance system faces problem domains different characteristics This particularly difficult casebased reasoning systems systems typically complex tasks domains operate We present methodology evaluation casebased reasoning systems systematic empirical experimentation range system configurations environmental conditions coupled rigorous statistical analysis results experiments This methodology enables us understand behavior system terms theory design computational model select best system configuration given domain predict system behave response changing domain problem characteristics A case study multistrategy casebased reinforcement learning system performs autonomous robotic navigation presented example
For casebased reasoner use knowledge flexibly must equipped powerful case adapter A casebased reasoner cope variation form problems given extent cases memory efficiently adapted fit wide range new situations In paper address task adapting abstract knowledge planning fit specific planning situations First show adapting abstract cases requires reconciling incommensurate representations planning situations Next describe representation system memory organization adaptation process tailored requirement Our approach implemented brainstormer planner takes abstract advice
The maximum likelihood estimator MLE proportional hazards model current status data studied It shown MLE regression parameter asymptotically normal √n convergence rate achieves information bound even though MLE baseline cumulative hazard function converges n rate Estimation asymptotic variance matrix MLE regression parameter also considered To prove main results also establish general theorem showing MLE finite dimensional parameter class semiparametric models asymptotically efficient even though MLE infinite dimensional parameter converges rate slower The results illustrated applying data set tumorigenicity study Introduction In many survival analysis problems interested p
PO Box Wellington New Zealand Tel Fax Internet TechReportscompvuwacnz Technical Report CSTR October Abstract People often give advice telling stories Stories recommend course action exemplify general conditions recommendation appropriate A computational model advice taking using stories must address two related problems determining storys recommendations appropriateness conditions showing obtain new situation In paper present efficient solution second problem based caching results first Our proposal implemented brainstormer planner takes abstract advice
There increasing need efficient estimation mixture distributions especially following explosion use modelling tools many applied fields We propose paper Bayesian noninformative approach estimation normal mixtures relies reparameterisation secondary components mixture terms divergence main component As well providing intuitively appealing representation modelling stage reparameterisation important bearing prior distribution performance MCMC algorithms We compare two possible reparameterisations extending Mengersen Robert show reparameterisation link secondary components together associated poor convergence properties MCMC algorithms
A WorldWide Web WWW server implemented Common LISP order facilitate exploratory programming global hypermedia domain provide access complex research programs particularly artificial intelligence systems The server initially used provide interfaces document retrieval email servers More advanced applications include interfaces systems inductive rule learning naturallanguage question answering Continuing research seeks fully generalize automatic formprocessing techniques developed email servers operate seamlessly Web The conclusions argue presentationbased interfaces sophisticated form processing moved clients order reduce load servers provide advanced interaction models users
Survival analysis concerned finding models predict survival patients assess efficacy clinical treatment A key part modelbuilding process selection predictor variables It standard use stepwise procedure guided series significance tests select single model make inference conditionally selected model However ignores model uncertainty substantial We review standard Bayesian model averaging solution problem extend survival analysis introducing partial Bayes factors Cox proportional hazards model In two examples taking account model uncertainty enhances predictive performance extent could clinically useful
We propose bootstrapbased method model averaging selection focuses training points left individual bootstrap samples This information used estimate optimal weighting factors combining estimates different bootstrap samples also finding best subsets linear model setting These proposals provide alternatives Bayesian approaches model averaging selection requiring less computation fewer subjective choices
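The proposal rests on the training points that each bootstrap sample leaves out. Here is a minimal sketch of how those out-of-bag (OOB) index sets arise. The names and sizes are illustrative assumptions, and the paper's procedure for turning OOB information into combination weights or subset choices is not reproduced.

```python
import random


def bootstrap_with_oob(n, n_boot, seed=0):
    """Draw n_boot bootstrap samples of the indices 0..n-1 and, for each one,
    the out-of-bag (OOB) indices that the sample left out."""
    rng = random.Random(seed)
    result = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]
        result.append((sample, oob))
    return result


pairs = bootstrap_with_oob(n=1000, n_boot=50)
mean_oob = sum(len(oob) for _, oob in pairs) / (50 * 1000)
print(mean_oob)  # close to e^-1 ≈ 0.368
```

A given point is left out of a bootstrap sample with probability (1 − 1/n)^n ≈ e^(−1) ≈ 0.368. Roughly a third of the data is therefore available as a held-out set for every sample, at no extra computational cost.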
Technical Report December Statistics Department University California Berkeley CA Abstract The theory behind success adaptive reweighting combining algorithms arcing Adaboost Freund Schapire others reducing generalization error not well understood By formulating prediction classification regression game one player makes selection instances training set convex linear combination predictors finite set existing arcing algorithms shown algorithms finding good game strategies An optimal game strategy finds combined predictor minimizes maximum error training set A bound generalization error combined predictors terms maximum error proven sharper bounds date Arcing algorithms described converge optimal strategy Schapire et al offered explanation Adaboost works terms ability reduce margin Comparing Adaboost optimal arcing algorithm shows explanation not valid answer lies elsewhere In situation VCtype bounds misleading Some empirical results given explore situation
Conversational casebased reasoning CCBR form interactive casebased reasoning users input partial problem description text The CCBR system responds ranked solution display lists solutions stored cases whose problem descriptions best match users ranked question display lists unanswered questions cases Users interact displays either refining problem description answering selected questions selecting solution apply CCBR systems support dialogue inferencing infer answers questions implied problem description Otherwise questions listed user believes already answered The standard approach dialogue inferencing allows case library designers insert rules define implications problem description unanswered questions However approach imposes substantial knowledge engineering requirements We introduce alternative approach whereby intelligent assistant guides designer defining model case library implication rules derived We detail approach benefits explain supported integration ParkaDB fast relational database system We evaluate approach context CCBR system named NaCoDAE This paper appeared AAAI Spring Symposium Multimodal Reasoning NCARAI TR AIC We introduce integrated reasoning approach modelbased reasoning component performs important inferencing role conversational casebased reasoning CCBR system named NaCoDAE Breslow Aha Figure CCBR form casebased reasoning users enter text queries describing problem system assists eliciting refinements Aha Breslow Cases three components
A readonce formula boolean formula variable occurs at most once Such formulas also called formulas boolean trees This paper treats problem exactly identifying unknown readonce formula using specific kinds queries The main results polynomial time algorithm exact identification monotone readonce formulas using membership queries polynomial time algorithm exact identification general readonce formulas using equivalence membership queries protocol based notion minimally adequate teacher Our results improve Valiants previous results readonce formulas We also show no polynomial time algorithm using only membership queries or only equivalence queries can exactly identify readonce formulas
Recursive AutoAssociative Memory RAAM structures show promise general representation vehicle uses distributed patterns However training often difficult explains least part relatively small networks studied We show technique transforming collection hierarchical structures set training patterns sequential RAAM effectively trained using simple Elmanstyle recurrent network Training produces set distributed patterns corresponding structures
We propose analyze distribution learning algorithm variable memory length Markov processes These processes described subclass probabilistic finite automata name Probabilistic Finite Suffix Automata The learning algorithm motivated real applications manmachine interaction handwriting speech recognition Conventionally used fixed memory Markov hidden Markov models either severe practical theoretical drawbacks Though general hardness results known learning distributions generated sources similar structure prove algorithm indeed efficiently learn distributions generated restricted sources In particular show KLdivergence distribution generated target source distribution generated hypothesis made small high confidence polynomial time sample complexity We demonstrate applicability algorithm learning structure natural English text using hypothesis correction corrupted text
The clausal discovery engine claudien presented claudien discovers regularities data representative inductive logic programming paradigm As represents data regularities means first order clausal theories Because search space clausal theories larger attribute value representation claudien also accepts input declarative specification language bias determines set syntactically wellformed regularities Whereas papers claudien focus semantics logical problem specification claudien discovery algorithm PAClearning aspects paper wants illustrate power resulting technique In order achieve aim show claudien used learn integrity constraints databases functional dependencies determinations properties sequences mixed quantitative qualitative laws reverse engineering classification rules
In context machine learning examples paper deals problem estimating quality attributes without dependencies Greedy search prevents current inductive machine learning algorithms detect significant dependencies attributes Recently Kira Rendell developed RELIEF algorithm estimating quality attributes able detect dependencies attributes We show strong relation RELIEFs estimates impurity functions usually used heuristic guidance inductive learning algorithms We propose use RELIEFF extended version RELIEF instead myopic impurity functions We reimplemented Assistant system top induction decision trees using RELIEFF estimator attributes selection step The algorithm tested several artificial several real world problems Results show advantage presented approach inductive learning open wide range possibilities using RELIEFF
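To show why RELIEF can detect dependencies that myopic measures miss, here is a minimal sketch of the basic two-class RELIEF weight update in the spirit of Kira and Rendell. It makes one simplifying assumption: every instance is used once as the probe, rather than sampling m instances at random, so the result is deterministic. The toy parity data and all names are hypothetical. This is not the RELIEFF extension (k neighbours, noise, multiclass) used in the paper.

```python
import itertools


def relief(X, y):
    """Basic two-class RELIEF, iterating once over every instance (assumption).

    Features are binary, so diff() is 0/1 and the distance is the Hamming distance.
    """
    m, n_feat = len(X), len(X[0])
    w = [0.0] * n_feat

    def dist(a, b):
        return sum(ai != bi for ai, bi in zip(a, b))

    for i, xi in enumerate(X):
        # Nearest hit: closest *other* instance of the same class.
        hit = min((X[j] for j in range(m) if j != i and y[j] == y[i]),
                  key=lambda z: dist(xi, z))
        # Nearest miss: closest instance of a different class.
        miss = min((X[j] for j in range(m) if y[j] != y[i]),
                   key=lambda z: dist(xi, z))
        for a in range(n_feat):
            w[a] -= (xi[a] != hit[a]) / m   # differing from a hit is penalised
            w[a] += (xi[a] != miss[a]) / m  # differing from a miss is rewarded
    return w


# Hypothetical 3-bit data: class = x0 XOR x1, x2 is irrelevant.
X = list(itertools.product([0, 1], repeat=3))
y = [a ^ b for a, b, _ in X]
print(relief(X, y))  # → [0.5, 0.5, -1.0]
```

On this data each attribute on its own leaves a 50/50 class split in both branches. A myopic impurity measure such as information gain therefore scores all three attributes as 0. RELIEF instead ranks the two interacting attributes above the irrelevant one.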
An investigation dynamics Genetic Programming applied chaotic time series prediction reported An interesting characteristic adaptive search techniques ability perform well many problem domains failing others Because Genetic Programmings flexible tree structure particular problem represented myriad forms These representations variegated effects search performance Therefore aspect fundamental engineering significance find representation acted upon Genetic Programming operators optimizes search performance We discover case chaotic time series prediction representation commonly used domain does not yield optimal solutions Instead find population converges onto one accurately replicating tree trees explored To correct premature convergence make simple modification crossover operator In paper review previous work GP time series prediction pointing anomalous result related overlearning report improvement effected modified crossover operator
Current ILP algorithms typically use variants extensions greedy search This prevents detect significant relationships training objects Instead myopic impurity functions propose use heuristic based RELIEF guidance ILP algorithms At step ILPR system heuristic used determine beam candidate literals The beam used exhaustive search potentially good conjunction literals From efficiency point view introduce interesting declarative bias enables us keep growth training set introducing new variables within linear bounds linear respect clause length This bias prohibits crossreferencing variables variable dependency tree The resulting system tested various artificial problems The advantages deficiencies approach discussed
Instead myopic impurity functions propose use ReliefF heuristic guidance inductive learning algorithms The basic algorithm RELIEF developed Kira Rendell Kira Rendell ab able efficiently solve classification problems involving highly dependent attributes parity problems However sensitive noise unable deal incomplete data multiclass regression problems continuous class We extended RELIEF several directions The extended algorithm ReliefF able deal noisy incomplete data used multiclass problems regressional variant RReliefF deal regression problems Another area application inductive logic programming ILP instead myopic measures ReliefF used estimate utility literals theory construction
In paper present TDLeaf variation TD algorithm enables used conjunction minimax search We present experiments chess backgammon demonstrate utility provide comparisons TD another less radical variant TDdirected In particular chess program KnightCap used TDLeaf learn evaluation function playing Free Internet Chess Server FICS ficsonenetnet It improved rating rating games We discuss reasons success relationship results Tesauros results backgammon
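TDLeaf(λ) applies the TD(λ) weight update to the evaluations of the leaves of the principal variations returned by minimax search, rather than to the positions themselves. The sketch below shows only that update, for a hypothetical linear evaluation J(x) = w·φ(x) whose gradient is simply φ(x). The update is w ← w + α Σ_t φ_t Σ_{j≥t} λ^(j−t) d_j, where d_t = J_{t+1} − J_t and the final game outcome stands in for the evaluation after the last position. The leaf feature vectors are taken as given and the search is omitted. The names and step sizes are illustrative, and KnightCap's actual evaluation function is not reproduced.

```python
def td_lambda_update(w, leaf_features, outcome, alpha=0.5, lam=0.7):
    """One offline TD(lambda) update for a linear evaluation J(x) = w . phi(x).

    leaf_features[t] stands for phi of the principal-variation leaf found by
    minimax search from the t-th position (hypothetical input). The game
    outcome replaces the evaluation after the last position.
    """
    J = [sum(wi * fi for wi, fi in zip(w, phi)) for phi in leaf_features]
    J.append(outcome)
    N = len(leaf_features)
    d = [J[t + 1] - J[t] for t in range(N)]  # temporal differences
    new_w = list(w)
    for t, phi in enumerate(leaf_features):
        # lambda-discounted sum of the temporal differences from time t onwards
        trace = sum(lam ** (j - t) * d[j] for j in range(t, N))
        for k, f in enumerate(phi):
            new_w[k] += alpha * f * trace  # gradient of a linear J is phi itself
    return new_w


print(td_lambda_update([0.0, 0.0], [[1, 0], [0, 1]], outcome=1.0, lam=1.0))  # → [0.5, 0.5]
```

With λ = 1 the inner sum telescopes to (outcome − J_t), which gives a Monte Carlo-style update. With λ = 0 only the one-step difference d_t is used.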
This paper deals asymptotic properties MetropolisHastings algorithm distribution interest unknown approximated sequential estimator density We prove simple conditions rate convergence MetropolisHastings algorithm sequential estimator latter introduced reversible measure MetropolisHastings Kernel This problem natural extension previous work new simulated annealing algorithm sequential estimator energy
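For orientation, here is the standard random-walk Metropolis–Hastings kernel in its simplest form. In the paper's setting the exact target density is replaced by a sequential estimator. This sketch instead assumes a known, unnormalised target (a hypothetical standard normal) and a symmetric Gaussian proposal, so the Hastings ratio reduces to π(y)/π(x). The names and step size are illustrative.

```python
import math
import random


def metropolis_hastings(log_density, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target known up to a constant."""
    rng = random.Random(seed)
    x, lx = x0, log_density(x0)
    chain = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)  # symmetric proposal
        ly = log_density(y)
        # Accept with probability min(1, pi(y) / pi(x)).
        if rng.random() < math.exp(min(0.0, ly - lx)):
            x, lx = y, ly
        chain.append(x)
    return chain


chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_steps=50000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(round(mean, 3), round(var, 3))  # should be near 0 and 1
```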
We explored two approaches recognizing faces across changes pose First developed representation face images based independent component analysis ICA compared principal component analysis PCA representation face recognition The ICA basis vectors data set spatially local PCA basis vectors ICA representation greater invariance changes pose Second present model development viewpoint invariant responses faces visual experience biological system The temporal continuity natural visual experience incorporated attractor network model Hebbian learning following lowpass temporal filter unit activities When combined temporal filter basic Hebbian update rule became generalization Griniasty et al associates temporally proximal input patterns basins attraction The system acquired representations faces largely independent pose
Problems regression smoothing curve fitting addressed via predictive inference flexible class mixture models Multidimensional density estimation using Dirichlet mixture models provides theoretical basis semiparametric regression methods fitted regression functions may deduced means conditional predictive distributions These Bayesian regression functions features similar generalised kernel regression estimates formal analysis addresses problems multivariate smoothing parameter estimation assessment uncertainties regression functions naturally Computations based multidimensional versions existing Markov chain simulation analysis univariate Dirichlet mixture models
On basis early theoretical empirical studies genetic algorithms typically used point crossover operators standard mechanisms implementing recombination However number recent studies primarily empirical nature shown benefits crossover operators involving higher number crossover points From traditional theoretical point view surprising new results relate uniform crossover involves average L/2 crossover points strings length L In paper extend existing theoretical results attempt provide broader explanatory predictive theory role multipoint crossover genetic algorithms In particular extend traditional disruption analysis include two general forms multipoint crossover npoint crossover uniform crossover We also analyze two aspects multipoint crossover operators namely recombination potential exploratory power The results analysis provide much clearer view role multipoint crossover genetic algorithms The implications results implementation issues performance discussed several directions research suggested
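The two operator families analysed in the paper can be sketched as follows. The function names, the use of Python lists as bit strings and p_swap = 0.5 are illustrative assumptions. The paper's disruption analysis itself is not reproduced.

```python
import random


def n_point_crossover(p1, p2, n, rng):
    """n-point crossover: choose n distinct cut points in 1..L-1 and switch
    the parent each child copies from at every cut."""
    L = len(p1)
    cuts = sorted(rng.sample(range(1, L), n))
    c1, c2, prev, swap = [], [], 0, False
    for cut in cuts + [L]:
        a, b = (p2, p1) if swap else (p1, p2)
        c1.extend(a[prev:cut])
        c2.extend(b[prev:cut])
        prev, swap = cut, not swap
    return c1, c2


def uniform_crossover(p1, p2, rng, p_swap=0.5):
    """Uniform crossover: each position is exchanged independently with probability p_swap."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < p_swap:
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return c1, c2


rng = random.Random(0)
p1, p2 = [0] * 10, [1] * 10  # complementary parents make the cut points visible
c1, c2 = n_point_crossover(p1, p2, 3, rng)
u1, u2 = uniform_crossover(p1, p2, rng)
print(c1, u1)
```

With complementary parents, the number of 0/1 switches along c1 equals n. For uniform crossover with p_swap = 0.5, two adjacent positions come from different parents with probability 1/2. A string of length L therefore has (L − 1)/2 ≈ L/2 crossover points on average, which is the sense in which uniform crossover sits at the extreme of the multipoint family.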
In paper discuss methodological issues using class neural networks called Mixture Density Networks MDN discriminant analysis MDN models advantage rigorous probabilistic interpretation proven viable alternative classification procedure discrete domains We address classification interpretive aspects discriminant analysis compare approach traditional method linear discriminants implemented standard statistical packages We show MDN approach adopted performs well aspects Many observations made not restricted particular case hand applicable applications discriminant analysis educational research URL httpwwwcsHelsinkiFIresearchcosco
Satisfiability SAT refers task finding truth assignment makes arbitrary boolean expression true This paper compares simulated annealing algorithm SASAT GSAT Selman et al greedy algorithm solving satisfiability problems GSAT solve problem instances extremely difficult traditional satisfiability algorithms Results suggest SASAT scales better number variables increases solving least many hard SAT problems less effort The paper presents ablation study helps explain relative advantage SASAT GSAT Next improvement basic SASAT algorithm examined based random walk implemented GSAT Selman et al Finally examine performance SASAT test suite satisfiability problems produced DIMACS challenge
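Here is a generic simulated-annealing search over truth assignments, sketched under stated assumptions. The move is a single random variable flip, the cost is the number of unsatisfied clauses, and the temperature decays geometrically. This is not the SASAT schedule or move selection from the paper, and the tiny formula below is hypothetical. Literals use the DIMACS convention: k means variable k is true and −k means it is false.

```python
import math
import random


def num_unsat(clauses, assign):
    """Count clauses with no true literal (DIMACS-style literals)."""
    return sum(
        not any((lit > 0) == assign[abs(lit)] for lit in clause)
        for clause in clauses
    )


def sa_sat(clauses, n_vars, max_flips=20000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over truth assignments.

    Returns a satisfying assignment (dict var -> bool) or None.
    """
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    cost = num_unsat(clauses, assign)
    cooling = (t_end / t_start) ** (1.0 / max_flips)  # geometric schedule
    t = t_start
    for _ in range(max_flips):
        if cost == 0:
            return assign
        v = rng.randint(1, n_vars)
        assign[v] = not assign[v]
        delta = num_unsat(clauses, assign) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost += delta  # accept the flip (Metropolis criterion)
        else:
            assign[v] = not assign[v]  # reject: undo the flip
        t *= cooling
    return assign if cost == 0 else None


clauses = [[1, 2], [-1, 3], [-2, -3], [1, -3]]  # hypothetical 3-variable formula
result = sa_sat(clauses, n_vars=3)
print(result)
```

GSAT, by contrast, greedily flips whichever variable gives the largest decrease (or smallest increase) in unsatisfied clauses. It does not accept uphill moves with a temperature-dependent probability.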
We present comparison errorbased entropybased methods discretization continuous features Our study includes extensive empirical comparison well analysis scenarios error minimization may inappropriate discretization criterion We present discretization method based C decision tree algorithm compare existing entropybased discretization algorithm employs Minimum Description Length Principle recently proposed errorbased technique We evaluate discretization methods respect C NaiveBayesian classifiers datasets UCI repository analyze computational complexity method Our results indicate entropybased MDL heuristic outperforms error minimization average We analyze shortcomings errorbased approaches comparison entropybased methods
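The entropy-based baseline in this comparison uses the Minimum Description Length principle as its stopping rule. Below is a compact sketch in the style of Fayyad and Irani's recursive method, which is assumed here to be the entropy/MDL algorithm the abstract refers to. It works on a single numeric feature. A cut is accepted only if its information gain exceeds (log2(N − 1) + Δ)/N, where Δ = log2(3^k − 2) − [k·Ent(S) − k1·Ent(S1) − k2·Ent(S2)] and k, k1, k2 count the classes present in S and its two halves. This is not the paper's C4.5-based discretizer, and the example data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mdl_cuts(values, labels):
    """Recursive entropy-based discretization with an MDL stopping rule.

    Returns the sorted list of accepted cut points.
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    n = len(ys)
    if n < 2:
        return []
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # only cut between distinct values
        left, right = ys[:i], ys[i:]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or e < best[0]:
            best = (e, i)
    if best is None:
        return []
    e, i = best
    ent, left, right = entropy(ys), ys[:i], ys[i:]
    gain = ent - e
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * entropy(left) - k2 * entropy(right))
    if gain <= (math.log2(n - 1) + delta) / n:
        return []  # MDL criterion rejects the cut
    cut = (xs[i - 1] + xs[i]) / 2
    return mdl_cuts(xs[:i], left) + [cut] + mdl_cuts(xs[i:], right)


print(mdl_cuts(list(range(1, 9)), [0, 0, 0, 0, 1, 1, 1, 1]))  # → [4.5]
print(mdl_cuts(list(range(1, 9)), [0, 1, 0, 1, 0, 1, 0, 1]))  # → []
```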
Here show similar construction multipleoutput systems modifications Let A B C discretetime signlinear system state space IR n p outputs Perform change A n fi n invertible A n fi n nilpotent If A B reachable pair A C observable pair minimal sense signlinear system inputoutput behavior dimension least n But n lt n det A observable hence canonical Let us find another system necessarily signlinear inputoutput behavior canonical Let relative degree ith row Markov sequence A minf pg Let initial state x There difference case smallest relative degree greater equal n case lt n Roughly speaking n outputs signlinear system give us information sign Cx sign CAx sign CA x first outputs system After use inputs outputs learn x first n components x When lt n may able use controls learn x last n components x time n nilpotency A finally Lemma Two states x z indistinguishable x z Proof In case n equations x z equality The first output terms exactly terms So equalities satisfied first output terms coincide x z input Equality everything first n components equivalent first n output terms coinciding x z since jth row qth output initial state x example either sign c j A q x j gt q sign c j A q x A j j u q j j q case may use control u q j identify c j A q x using Remark
So applying Corollary second equation conclude From get jgy n k obtain jy n k From see righthand side bounded Since system A k b jyj ev N Now suppose lim sup jytj gt Then jyj ev Since j kyj Ljyj using obtain jyj ev L ffi Note righthand side inequality trivial since know jyj ev From ffi N ffi gt N However see still holds So established cases From get jyj ev Taking lim sup lefthand side N ffi ie N ffi Substituting get jyj ev ffi jyj ev N ffi So take N N L conclusion follows To complete proof need deal general case gt inputs This done induction proof omitted
Gross error detection plays vital role parameter estimation data reconciliation dynamic steady state systems In particular recent advances process optimization allow data reconciliation dynamic systems appropriate problem formulations need considered Data errors due either miscalibrated faulty sensors random events nonrepresentative underlying statistical distribution induce heavy biases parameter estimates reconciled data In paper concentrate robust estimators exploratory statistical methods allow us detect gross errors data reconciliation performed These robust methods property insensitive departures ideal statistical distributions therefore insensitive presence outliers Once regression done outliers detected readily using exploratory statistical techniques An important feature performance optimization algorithm uniqueness reconciled data ability classify variables according observability redundancy properties Here observable variable unmeasured quantity estimated measured variables physical model nonredundant variable measured variable estimated measurements Variable classification used aid design instrumentation schemes In
Many significant realworld classification tasks involve large number categories arranged hierarchical structure example classifying documents subject categories Library of Congress scheme classifying worldwideweb documents topic hierarchies We investigate potential benefits using given hierarchy base classes learn accurate multicategory classifiers domains First consider possibility exploiting class hierarchy prior knowledge help one learn accurate classifier We explore benefits learning categorydiscriminants hard topdown fashion compare soft approach shares training data among sibling categories In verify hierarchies potential improve prediction accuracy But argue reasons subtle Sometimes improvement using hierarchy happens constrain expressiveness hypothesis class appropriate manner However various controlled experiments show cases performance advantage associated using hierarchy does not really seem due prior knowledge encodes
Many algorithms inferring decision tree data involve twophase process First large decision tree grown typically ends overfitting data To reduce overfitting second phase tree pruned using one number available methods The final tree output used classification test data In paper suggest alternative approach pruning phase Using given unpruned decision tree present new method making predictions test data prove algorithms performance not much worse precise technical sense predictions made best reasonably small pruning given decision tree Thus procedure guaranteed competitive terms quality predictions pruning algorithm We prove procedure efficient highly robust Our method viewed synthesis two previously studied techniques First apply CesaBianchi et als results predicting using expert advice view pruning expert obtain algorithm provably low prediction loss computationally infeasible Next generalize apply method developed Buntine Willems Shtarkov Tjalkens derive efficient implementation procedure
We present efficient algorithm PAClearning general class geometric concepts lt fixed More specifically let T set halfspaces Let x x x arbitrary point lt With T associate boolean indicator function I x x halfspace The concept class C study consists concepts formed boolean function I I T This concept class much general geometric concept class known PAClearnable Our results easily extended efficiently learn boolean combination polynomial number concepts selected concept class C lt given VCdimension C dependence thus constant constant polynomial time algorithm determine concept C consistent given set labeled examples We also present statistical query version algorithm tolerate random classification noise noise rate strictly less Finally present generalization standard net result Haussler Welzl apply give alternative noisetolerant algorithm based geometric subdivisions
In work develop new criteria perform pessimistic decision tree pruning Our method theoretically sound based theoretical concepts uniform convergence VapnikChervonenkis dimension We show criteria well motivated theory side performs well practice The accuracy new criteria comparable current method used C
In paper study performance probabilistic networks context protein sequence analysis molecular biology Specifically report results initial experiments applying framework problem protein secondary structure prediction One main advantages probabilistic approach describe ability perform detailed experiments experiment different models We easily perform local substitutions mutations measure probabilistically effect global structure Windowbased methods do not support experimentation readily Our method efficient training prediction important order able perform many experiments different networks We believe probabilistic methods comparable methods prediction quality In addition predictions generated methods precise quantitative semantics not shared classification methods Specifically causal statistical independence assumptions made explicit networks thereby allowing biologists study experiment different causal models convenient manner
In paper prove sanitycheck bounds error leaveoneout crossvalidation estimate generalization error bounds showing worstcase error estimate not much worse training error estimate The name sanitycheck refers fact although often expect leaveoneout estimate perform considerably better training error estimate seeking assurance performance not considerably worse Perhaps surprisingly assurance given only limited cases prior literature crossvalidation Any nontrivial bound error leaveoneout must rely notion algorithmic stability Previous bounds relied rather strong notion hypothesis stability whose application primarily limited nearestneighbor local algorithms Here introduce new weaker notion error stability apply obtain sanitycheck bounds leaveoneout classes learning algorithms including training error minimization procedures Bayesian algorithms We also provide lower bounds demonstrating necessity form error stability proving bounds error leaveoneout estimate fact training error minimization algorithms worst case bounds must still depend VapnikChervonenkis dimension hypothesis class
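To make the two estimators concrete, here is a minimal sketch that computes the training-error estimate and the leave-one-out estimate for a 1-nearest-neighbour rule, one of the local algorithms covered by the earlier hypothesis-stability bounds. The data and function names are hypothetical. For 1-NN the training error is 0 by construction, which shows how optimistic the training-error estimate can be for such algorithms.

```python
def one_nn(X_train, y_train):
    """Fit a 1-nearest-neighbour classifier on 1-D inputs; returns a predictor."""
    def predict(x):
        j = min(range(len(X_train)), key=lambda k: abs(X_train[k] - x))
        return y_train[j]
    return predict


def training_error(fit, X, y):
    predict = fit(X, y)
    return sum(predict(x) != t for x, t in zip(X, y)) / len(X)


def loo_error(fit, X, y):
    """Leave-one-out estimate: train on all points but one, test on that one."""
    errors = 0
    for i in range(len(X)):
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors += predict(X[i]) != y[i]
    return errors / len(X)


# Hypothetical 1-D data set.
X = [0, 1, 2, 3, 10, 11, 12, 13]
y = [0, 0, 0, 1, 1, 1, 1, 1]
print(training_error(one_nn, X, y), loo_error(one_nn, X, y))  # → 0.0 0.125
```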
This paper describes program observes behaviour actors simulated world uses observations guides conducting experiments An experiment sequence actions carried actor order support weaken case generalisation concept A generalisation attempted program observes state world similar previous state A partial matching algorithm used find substitutions enable two states unified The generalisation two states unifier
Widespread adoption Genetic Programming techniques domainindependent problem solving tool depends good underlying software structure A system presented mirrors conceptual makeup GP system Consisting loose collection software components strict interface definitions roles system maximises flexibility minimises effort applied new problem domain
Coevolution competitive species provides interesting testbed study role adaptive behavior provides unpredictable dynamic environments In paper experimentally investigate arguments coevolution different adaptive protean behaviors competing species predators preys Both species implemented simulated mobile robots Kheperas infrared proximity sensors predator additional vision module whereas prey maximum speed set twice predator Different types variability life neurocontrollers architecture genetic length compared It shown simple forms proteanism affect coevolutionary dynamics preys rather exploit noisy controllers generate random trajectories whereas predators benefit directionalchange controllers improve pursuit behavior
The calculation second derivatives required recent training analysis techniques connectionist networks elimination superfluous weights estimation confidence intervals weights network outputs We review develop exact approximate algorithms calculating second derivatives For networks $|w|$ weights simply writing full matrix second derivatives requires $O(|w|^2)$ operations For networks radial basis units sigmoid units exact calculation necessary intermediate terms requires order h backwardforwardpropagation passes h number hidden units network We also review compare three approximations ignoring components second derivative numerical differentiation scoring Our algorithms apply arbitrary activation functions networks error functions instance connections skip layers radial basis functions crossentropy error Softmax units etc
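Numerical differentiation is one of the three approximations the abstract compares. The sketch below estimates the full matrix of second derivatives of any scalar error function by central differences. The function name `numerical_hessian` and the step size `h` are assumptions. Each entry costs four evaluations of the error, and a network with |w| weights has |w|(|w|+1)/2 distinct entries, which matches the quadratic cost of writing the full matrix.

```python
def numerical_hessian(f, w, h=1e-3):
    """Approximate the matrix of second derivatives of f at w by central differences."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]

    def shifted(i, di, j, dj):
        # Evaluate f with coordinate i moved by di and coordinate j moved by dj.
        v = list(w)
        v[i] += di
        v[j] += dj
        return f(v)

    for i in range(n):
        for j in range(i, n):
            value = (shifted(i, h, j, h) - shifted(i, h, j, -h)
                     - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4 * h * h)
            H[i][j] = H[j][i] = value  # the matrix is symmetric
    return H


def quadratic(w):
    # Stand-in for a network error function; its exact Hessian is [[2, 3], [3, 4]].
    return w[0] ** 2 + 3 * w[0] * w[1] + 2 * w[1] ** 2


H = numerical_hessian(quadratic, [0.5, -1.0])
print(H)  # approximately [[2.0, 3.0], [3.0, 4.0]]
```

On the diagonal (i = j) the same formula reduces to the usual second difference with step 2h.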
In paper describe principles problem solving analogy applied domain functional program synthesis For reason treat programs syntactical structures We discuss two different methods handle structures graph metric determining distance two program schemes b Structure Mapping Engine existing system examine analogical processing Furthermore show experimental results discuss
There many ways learning system generalize training set data This paper presents several generalization styles using prototypes attempt provide accurate generalization training set data wide variety applications These generalization styles efficient terms time space lend well massively parallel architectures Empirical results generalizing several realworld applications given results indicate prototype styles generalization presented potential provide accurate generalization many applications
This paper provides exposition recent research regarding systemtheoretic aspects continuoustime recurrent dynamic neural networks sigmoidal activation functions The class systems introduced discussed result cited regarding universal approximation properties Known characterizations controllability observability parameter identifiability reviewed well result minimality Facts regarding computational power recurrent nets also mentioned Supported part US Air Force Grant AFOSR
Most Artificial Neural Networks ANNs fixed topology learning often suffer number shortcomings result ANNs use dynamic topologies shown ability overcome many problems Adaptive Self Organizing Concurrent Systems ASOCS class learning models inherently dynamic topologies This paper introduces LocationIndependent Transformations LITs general strategy implementing learning models use dynamic topologies efficiently parallel hardware A LIT creates set locationindependent nodes node computes part network output independent nodes using local information This type transformation allows efficient support adding deleting nodes dynamically learning In particular paper presents Location Independent ASOCS LIA model LIT ASOCS Adaptive Algorithm The description LIA gives formal definitions LIA algorithms Because LIA implements basic ASOCS mechanisms definitions provide formal description basic ASOCS mechanisms general addition LIA
Reinforcement learning algorithms often work finding functions satisfy Bellman equation This yields optimal solution prediction Markov chains controlling Markov decision process MDP finite number states actions This approach also frequently applied Markov chains MDPs infinite states We show case Bellman equation may multiple solutions many lead erroneous predictions policies Baird Algorithms conditions presented guarantee single optimal solution Bellman equation
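For prediction in a finite Markov chain, the Bellman equation is V = r + γPV. With finitely many states and γ < 1 the right-hand side is a contraction, so this equation has exactly one solution and simple iteration finds it. The abstract's warning about multiple solutions concerns infinite state spaces, which this sketch does not model. The function name `solve_bellman` and the two-state chain are hypothetical.

```python
def solve_bellman(P, r, gamma, tol=1e-12, max_iter=100_000):
    """Iterate V <- r + gamma * P V until successive iterates agree within tol."""
    n = len(r)
    V = [0.0] * n
    for _ in range(max_iter):
        new_V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    raise RuntimeError("value iteration did not converge")


# State 0 pays reward 1 and moves to the absorbing state 1 with probability 0.5.
P = [[0.5, 0.5], [0.0, 1.0]]
r = [1.0, 0.0]
V = solve_bellman(P, r, gamma=0.9)
print(V)  # V[0] is about 1 / 0.55 = 1.818..., V[1] is 0.0
```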
A major issue casebased systems retrieving appropriate cases memory solve given problem This implies case indexed appropriately stored memory A casebased system dynamic stores cases reuse needs learn indices new knowledge system designers envision knowledge Irrespective type indexing structural functional hierarchical organization case memory raises two distinct related issues index learning learning indexing vocabulary learning right level generalization In paper show structurebehaviorfunction SBF models help learning structural indices design cases domain physical devices The SBF model design provides functional causal explanation structure design delivers function We describe SBF model design provides vocabulary structural indexing design cases inductive biases index generalization We discuss modelbased learning integrated similaritybased learning uses prior design cases learning level index generalization
We present representational format observed movements The representation temporal structure relating components single complex movement We also present OXBOW unsupervised learning system constructs classes movements Empirical results indicate system builds abstract movement concepts appropriate component structure allowing predict latter portions partially observed movement
Constructive induction divides problem learning inductive hypothesis two intertwined searches one for best representation space two for best hypothesis space In datadriven constructive induction DCI learning system searches better representation space analyzing input examples data The presented datadriven constructive induction method combines AQtype learning algorithm two classes representation space improvement operators constructors destructors The implemented system AQDCI experimentally applied GNP prediction problem using World Bank database The results show decision rules learned AQDCI outperformed rules learned original representation space predictive accuracy rule simplicity
We report novel possibility extracting small subset data base contains information necessary solve given classification task using Support Vector Algorithm train three different types handwritten digit classifiers observed types classifiers construct decision surface strongly overlapping small subsets data base This finding opens possibility compressing data bases significantly disposing data important solution given task In addition show theory allows us predict classifier best generalization ability based solely performance training set characteristics learning machines This finding important cases amount available data limited
Physical variables orientation line visual field location body space coded activity levels populations neurons Reconstruction decoding inverse problem physical variables estimated observed neural activity Reconstruction useful first quantifying much information physical variables present population second providing insight brain might use distributed representations solving related computational problems visual object recognition spatial navigation Two classes reconstruction methods namely probabilistic Bayesian methods basis function methods discussed They include important existing methods special cases population vector coding optimal linear estimation template matching As representative example reconstruction problem different methods applied multielectrode spike train data hippocampal place cells freely moving rats The reconstruction accuracy trajectories rats compared different methods Bayesian methods especially accurate continuity constraint enforced best errors within factor two informationtheoretic limit accurate reconstruction comparable intrinsic experimental errors position tracking In addition reconstruction analysis uncovered interesting aspects place cell activity tendency erratic jumps reconstructed trajectory animal stopped running In general theoretical values minimal achievable reconstruction errors quantify accurately physical variable encoded neuronal population sense mean square error regardless method used reading information One related result theoretical accuracy independent width Gaussian tuning function two dimensions Finally reconstruction methods considered paper implemented unified neural network architecture brain could feasibly use solve related problems
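The Bayesian reconstruction methods the abstract discusses can be sketched for a single 1-D position. The sketch assumes independent Poisson spike counts, a flat prior over a grid of positions, and Gaussian tuning curves. The functions `tuning` and `decode` and all parameter values are hypothetical, not taken from the paper. To keep the check exact, the example feeds in the expected counts rather than sampled spikes. Each term n·log λ − λ peaks at λ = n, so the decoder then recovers the true position.

```python
import math


def tuning(x, center, width=0.1, peak=20.0, baseline=0.5):
    """Hypothetical Gaussian place-field firing rate (spikes/s) at position x."""
    return baseline + peak * math.exp(-((x - center) ** 2) / (2 * width ** 2))


def decode(counts, centers, grid, tau):
    """Return the grid position maximizing the Poisson log posterior (flat prior)."""
    def log_posterior(x):
        total = 0.0
        for n, c in zip(counts, centers):
            rate = tau * tuning(x, c)             # expected count in a window of tau seconds
            total += n * math.log(rate) - rate    # log(n!) does not depend on x, so it is dropped
        return total
    return max(grid, key=log_posterior)


centers = [j / 10 for j in range(11)]   # 11 cells with fields spread over [0, 1]
grid = [i / 100 for i in range(101)]    # candidate positions
tau = 0.5
x_true = 0.37
expected = [tau * tuning(x_true, c) for c in centers]
print(decode(expected, centers, grid, tau))  # 0.37
```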
The headdirection HD cells found limbic system freely moving rats represent instantaneous head direction animal horizontal plane regardless location animal The internal direction represented cells uses selfmotion information inertially based updating familiar visual landmarks calibration Here model dynamics HD cell ensemble presented The stability localized static activity profile network dynamic shift mechanism explained naturally synaptic weight distribution components even odd symmetry respectively Under symmetric weights symmetric reciprocal connections stable activity profile close known directional tuning curves emerge By adding slight asymmetry weights activity profile shift continuously without disturbances shape shift speed accurately controlled strength oddweight component The generic formulation shift mechanism determined uniquely within current theoretical framework The attractor dynamics system ensures modalityindependence internal representation facilitates correction cumulative error putative localview detectors The model offers specific onedimensional example computational mechanism truly worldcentered representation derived observercentered sensory inputs integrating selfmotion information
We present biasvariance decomposition expected misclassification rate commonly used loss function supervised classification learning The biasvariance decomposition quadratic loss functions well known serves important tool analyzing learning algorithms yet decomposition offered commonly used zeroone misclassification loss functions recent work Kong Dietterich Breiman Their decomposition suffers major shortcomings though eg potentially negative variance decomposition avoids We show practice naive frequencybased estimation decomposition terms biased show correct bias We illustrate decomposition various algorithms datasets UCI repository
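One well-known decomposition of zero-one loss is due to Kohavi and Wolpert. At a test point it splits the expected loss into noise, squared bias, and variance. The three terms sum to 1 − Σ_y P(Y=y)·P(Ŷ=y), the probability that the prediction differs from the true label. The sketch below assumes that form and that the two label distributions are already known. It does not include the bias correction for frequency-based estimates that the abstract describes. The name `zero_one_decomposition` is hypothetical.

```python
def zero_one_decomposition(p_true, p_pred):
    """Split expected zero-one loss at one test point into noise, bias^2 and variance.

    p_true: dict label -> P(Y = label | x)
    p_pred: dict label -> probability, over training sets, that the learner predicts label at x
    """
    labels = set(p_true) | set(p_pred)
    pt = {y: p_true.get(y, 0.0) for y in labels}
    pp = {y: p_pred.get(y, 0.0) for y in labels}
    noise = 0.5 * (1.0 - sum(pt[y] ** 2 for y in labels))
    bias2 = 0.5 * sum((pt[y] - pp[y]) ** 2 for y in labels)
    variance = 0.5 * (1.0 - sum(pp[y] ** 2 for y in labels))
    return noise, bias2, variance


# Noise-free target "a"; the learner predicts "a" on 70% of training sets.
noise, bias2, variance = zero_one_decomposition({"a": 1.0}, {"a": 0.7, "b": 0.3})
print(noise, bias2, variance, noise + bias2 + variance)  # about 0.0, 0.09, 0.21, 0.3
```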
The dominant component computational burden solving nontrivial problems evolutionary algorithms task measuring fitness individual generation evolving population The advent rapidly reconfigurable field programmable gate arrays FPGAs idea evolvable hardware opens possibility embodying individual evolving population hardware purpose accelerating timeconsuming fitness evaluation task This paper demonstrates massive parallelism rapidly reconfigurable Xilinx XC FPGA exploited accelerate computationally burdensome fitness evaluation task genetic programming The work done Virtual Computing Corporations lowcost HOTS expansion board PC type computers A step sorter evolved two fewer steps sorting network described OConnor Nelson patent sorting networks number steps minimal sorter devised Floyd Knuth subsequent patent
A theoretically justifiable fast finite successive linear approximation algorithm proposed obtaining parsimonious solution corrupted linear system $Ax = b + p$ corruption $p$ due noise error measurement The proposed linearprogrammingbased algorithm finds solution $x$ parametrically minimizing number nonzero elements $x$ error $\|Ax - b - p\|$ Numerical tests signalprocessingbased example indicate proposed method comparable method parametrically minimizes norm solution $x$ error $\|Ax - b - p\|$ methods superior orders magnitude solutions obtained least squares well combinatorially choosing optimal solution specific number nonzero elements
In natural visual experience different views object face tend appear close temporal proximity A set simulations presented demonstrate viewpoint invariant representations faces developed visual experience capturing temporal relationships among input patterns The simulations explored interaction temporal smoothing activity signals Hebbian learning Foldiak feedforward system recurrent system The recurrent system generalization Hopfield network lowpass temporal filter unit activities Following training sequences graylevel images faces changed pose multiple views given face fell basin attraction system acquired representations faces approximately viewpoint invariant
Neural networks successfully applied wide range supervised unsupervised learning applications Neuralnetwork methods commonly used datamining tasks however often produce incomprehensible models require long training times In article describe neuralnetwork learning algorithms able produce comprehensible models require excessive training times Specifically discuss two classes approaches data mining neural networks The first type approach often called rule extraction involves extracting symbolic models trained neural networks The second approach directly learn simple easytounderstand networks We argue given current state art neuralnetwork methods deserve place tool boxes datamining specialists
A statistical theory overtraining proposed The analysis treats realizable stochastic neural networks trained KullbackLeibler loss asymptotic case It shown asymptotic gain generalization error small perform early stopping even access optimal stopping time Considering crossvalidation stopping answer question In ratio examples divided training testing sets order obtain optimum performance In nonasymptotic region crossvalidated early stopping always decreases generalization error Our large scale simulations done CM nice agreement analytical findings
Mathematical programming approaches three fundamental problems described feature selection clustering robust representation The feature selection problem considered discriminating two sets recognizing irrelevant redundant features suppressing This creates lean model often generalizes better new unseen data Computational results real data confirm improved generalization leaner models Clustering exemplified unsupervised learning patterns clusters may exist given database useful tool knowledge discovery databases KDD A mathematical programming formulation problem proposed theoretically justifiable computationally implementable finite number steps A resulting kMedian Algorithm utilized discover useful survival curves breast cancer patients medical database Robust representation concerned minimizing trained model degradation applied new problems A novel approach proposed purposely tolerates small error training process order avoid overfitting data may contain errors Examples applications concepts given
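The k-Median Algorithm named in the abstract clusters points under the 1-norm. Alternating between assigning points and recomputing centers reduces to taking coordinate-wise medians, because the median minimizes a sum of absolute deviations. The sketch below shows that alternation under this assumption. It does not reproduce the paper's mathematical-programming formulation. The names `l1` and `k_median` and the toy points are hypothetical.

```python
from statistics import median


def l1(a, b):
    """1-norm distance between two points."""
    return sum(abs(p - q) for p, q in zip(a, b))


def k_median(points, centers, max_iter=100):
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the center nearest in the 1-norm.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: l1(p, centers[k]))
            clusters[nearest].append(p)
        # Update step: coordinate-wise median; an empty cluster keeps its old center.
        new_centers = [
            tuple(median(coord) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_median(points, [(0, 1), (11, 10)]))  # [(0, 0), (10, 10)]
```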
Concept learning viewed search space concept descriptions The hypothesis language determines search space In standard inductive learning algorithms structure search space determined generalizationspecialization operators Algorithms perform locally optimal search using hillclimbing andor beamsearch strategy To overcome limitation concept learning viewed stochastic search space concept descriptions The proposed stochastic search method based simulated annealing known successful means solving combinatorial optimization problems The stochastic search method implemented rule learning system ATRIS based compact efficient representation problem appropriate operators structuring search space Furthermore heuristic pruning search space method enables also handling imperfect data The paper introduces stochastic search method describes ATRIS learning algorithm gives results experiments
The increasing availability finelygrained parallel architectures resulted variety evolutionary algorithms EAs population spatially distributed local selection algorithms operate parallel small overlapping neighborhoods The effects design choices regarding particular type local selection algorithm well size shape neighborhood particularly well understood generally tested empirically In paper extend techniques used formally analyze selection methods sequential EAs apply local neighborhood models resulting much clearer understanding effects neighborhood size shape
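A common way to see how neighborhood size changes selection pressure is takeover time: the number of generations a single best individual needs to fill the whole population. The sketch below assumes a 1-D ring in which every cell deterministically copies the best individual within `radius` cells on either side. The function name is hypothetical. It covers neither the 2-D neighborhood shapes nor the stochastic selection schemes the paper analyzes. In this setting the best individual spreads `radius` cells per side per generation, so doubling the radius halves the takeover time here.

```python
def takeover_time(n, radius, max_generations=100_000):
    """Generations until a single best individual fills a ring of n cells.

    Each generation, every cell copies the best individual within `radius`
    cells on either side (deterministic local selection).
    """
    population = [0] * n
    population[0] = 1          # one copy of the best individual (fitness 1)
    generations = 0
    while sum(population) < n:
        population = [
            max(population[(i + k) % n] for k in range(-radius, radius + 1))
            for i in range(n)
        ]
        generations += 1
        if generations > max_generations:
            raise RuntimeError("no takeover")
    return generations


print(takeover_time(20, 1), takeover_time(20, 2))  # 10 5
```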
Qualitative probabilistic reasoning Bayesian network often reveals tradeoffs relationships ambiguous due competing qualitative influences We present two techniques combine qualitative numeric probabilistic reasoning resolve tradeoffs inferring qualitative relationship nodes Bayesian network The first approach incrementally marginalizes nodes contribute ambiguous qualitative relationships The second approach evaluates approximate Bayesian networks bounds probability distributions uses bounds determinate qualitative relationships question This approach also incremental algorithm refines state spaces random variables tighter bounds qualitative relationships resolved Both approaches provide systematic methods tradeoff resolution potentially lower computational cost application purely numeric methods
A simple powerful modification standard Gaussian distribution studied The variables rectified Gaussian constrained nonnegative enabling use nonconvex energy functions Two multimodal examples competitive cooperative distributions illustrate representational power rectified Gaussian Since cooperative distribution represent translations pattern demonstrates potential rectified Gaussian modeling pattern manifolds
This paper appear Neural Computation Abstract We introduce novel fast algorithm Independent Component Analysis used blind source separation feature extraction It shown neural network learning rule transformed fixedpoint iteration provides algorithm simple depend userdefined parameters fast converge accurate solution allowed data The algorithm finds one time nonGaussian independent components regardless probability distributions The computations performed either batch mode semiadaptive manner The convergence algorithm rigorously proven convergence speed shown cubic Some comparisons gradient based algorithms made showing new algorithm usually times faster sometimes giving solution iterations
Barlows seminal work minimal entropy codes unsupervised learning reiterated In particular need transmit probability events put practical neuronal framework detecting suspicious events A variant BCM learning rule presented together mathematical results suggesting optimal minimal entropy coding
Heuristic measures estimating quality attributes mostly assume independence attributes domains strong dependencies attributes performance poor Relief extension ReliefF capable correctly estimating quality attributes classification problems strong dependencies attributes By exploiting local information provided different contexts provide global view We present analysis ReliefF lead us adaptation regression continuous class problems The experiments artificial realworld data sets show Regressional ReliefF correctly estimates quality attributes various conditions used nonmyopic learning regression trees Regressional ReliefF ReliefF provide unified view estimating attribute quality regression classification
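ReliefF builds on the basic Relief estimate. For each instance, an attribute gains weight when it differs on the nearest instance of the other class (the nearest miss). It loses weight when it differs on the nearest instance of the same class (the nearest hit). The sketch below is that basic two-class version. It assumes attributes already scaled to [0, 1], uses one neighbour of each kind, and visits every instance. ReliefF's averaging over k neighbours, its multiclass handling, and the regression adaptation the abstract describes are not shown. The name `relief` and the toy data are hypothetical.

```python
def relief(X, y):
    """Basic two-class Relief weights; attributes are assumed scaled to [0, 1]."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def distance(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    for i, (r, label) in enumerate(zip(X, y)):
        hits = [X[j] for j in range(n) if j != i and y[j] == label]
        misses = [X[j] for j in range(n) if y[j] != label]
        hit = min(hits, key=lambda z: distance(r, z))
        miss = min(misses, key=lambda z: distance(r, z))
        for a in range(d):
            # Reward attributes that differ on the nearest miss,
            # penalise attributes that differ on the nearest hit.
            weights[a] += (abs(r[a] - miss[a]) - abs(r[a] - hit[a])) / n
    return weights


# The class depends only on the first attribute; the second is irrelevant.
X = [(0.0, 0.0), (0.0, 1.0), (0.1, 0.5), (1.0, 0.0), (1.0, 1.0), (0.9, 0.5)]
y = [0, 0, 0, 1, 1, 1]
w = relief(X, y)
print(w)  # first weight about 0.83, second about -0.5
```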
A new research area Inductive Logic Programming presently emerging While inheriting various positive characteristics parent subjects Logic Programming Machine Learning hoped new area overcome many limitations forebears The background present developments within area discussed various goals aspirations increasing body researchers identified Inductive Logic Programming needs based sound principles Logic Statistics On side statistical justification hypotheses discuss possible relationship Algorithmic Complexity theory ProbablyApproximatelyCorrect PAC Learning In terms logic provide unifying framework Muggleton Buntines Inverse Resolution IR Plotkins Relative Least General Generalisation RLGG rederiving RLGG terms IR This leads discussion feasibility extending RLGG framework allow invention new predicates previously discussed within context IR
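Plotkin's RLGG rests on the least general generalisation (anti-unification) of two terms. Identical subterms are kept. Each distinct pair of mismatched subterms is replaced by one shared fresh variable. The sketch below assumes constants are strings, compound terms are tuples `(functor, arg1, ..., argn)`, and generated variables are named `X0`, `X1`, ... with no clash against constants. Generalising relative to background knowledge (the RLGG itself) and inverse resolution are not shown. The name `lgg` is hypothetical.

```python
def lgg(s, t, table=None):
    """Least general generalisation (anti-unification) of two first-order terms.

    Terms are strings (constants) or tuples (functor, arg1, ..., argn).
    Each distinct pair of mismatched subterms is replaced by one fresh
    variable named X0, X1, ... (assumed not to clash with constants).
    """
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and len(s) == len(t) and s[0] == t[0]):
        # Same functor and arity: generalise argument by argument.
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = f"X{len(table)}"
    return table[(s, t)]


print(lgg(("p", "a", "a"), ("p", "b", "b")))                # ('p', 'X0', 'X0')
print(lgg(("f", "a", ("g", "b")), ("f", "c", ("g", "d"))))  # ('f', 'X0', ('g', 'X1'))
```

Reusing one variable for a repeated pair is what keeps the result least general. `p(a, a)` and `p(b, b)` generalise to `p(X0, X0)`, not `p(X0, X1)`.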
Bayesian methods applicable complex modeling tasks In review principles Bayesian inference presented applied neural network models Several approximate implementations discussed advantages conventional frequentist model training selection outlined It argued Bayesian methods preferable traditional approaches although empirical evidence still sparse
This paper presents efficient algorithm learning Bayesian belief networks databases The algorithm takes database input constructs belief network structure output The construction process based computation mutual information attribute pairs Given data set large enough algorithm generate belief network close underlying model time enjoys time complexity O N conditional independence CI tests When data set normal DAGFaithful see Section probability distribution algorithm guarantees structure perfect map Pearl underlying dependency model generated To evaluate algorithm present experimental results three versions wellknown ALARM network database attributes records The results show algorithm accurate efficient The proof correctness analysis complexity also presented
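The construction process described here starts from the mutual information between attribute pairs. The minimal sketch below computes the empirical estimate in bits for two discrete columns. The conditional-independence tests and thresholds the algorithm applies afterwards are not reproduced. The name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    cx, cy = Counter(xs), Counter(ys)
    # I(X;Y) = sum p(x,y) log2( p(x,y) / (p(x) p(y)) ), with probabilities from counts.
    return sum(
        (c / n) * math.log2(c * n / (cx[x] * cy[y]))
        for (x, y), c in joint.items()
    )


print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (one column determines the other)
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent columns)
```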
This paper presents efficient algorithm constructing Bayesian belief networks databases The algorithm takes database attributes ordering ie causal attributes attribute appear earlier order input constructs belief network structure output The construction process based computation mutual information attribute pairs Given data set large enough DAGIsomorphic probability distribution algorithm guarantees perfect map underlying dependency model generated time enjoys time complexity O N conditional independence CI tests To evaluate algorithm present experimental results three versions wellknown ALARM network database attributes records The correctness proof analysis computational complexity also presented We also discuss features work relate previous works
A novel method regression recently proposed V Vapnik et al The technique called Support Vector Machine SVM well founded mathematical point view seems provide new insight function approximation We implemented SVM tested data base chaotic time series used compare performances different approximation techniques including polynomial rational approximation local polynomial techniques Radial Basis Functions Neural Networks The SVM performs better approaches presented We also study particular time series variability performance respect free parameters SVM
The requirement dense interconnect artificial neural network systems led researchers seek highdensity interconnect technologies This paper reports implementation using multichip modules MCMs interconnect medium The specific system described selforganizing parallel dynamic learning model requires dense interconnect technology effective implementation requirement fulfilled exploiting MCM technology The ideas presented paper regarding MCM implementation artificial neural networks versatile adapted apply neural network connectionist models
When specializing recursive predicate order exclude set negative examples without excluding set positive examples may possible specialize remove clauses refutation negative example without excluding positive examples A previously proposed solution problem apply program transformation order obtain nonrecursive target predicates recursive ones However application method prevents recursive specializations found In work present algorithm spectre ii limited specializing nonrecursive predicates The key idea upon algorithm based enough specialize remove clauses refutations negative examples order obtain correct specializations sometimes necessary specialize clauses appear refutations positive examples In contrast predecessor spectre new algorithm limited specializing clauses defining one predicate may specialize clauses defining multiple predicates Furthermore positive negative examples longer required instances predicate It proven algorithm produces correct specialization positive examples logical consequences original program finite number derivations positive negative examples positive negative examples sequence input clauses refutations
program wrt positive negative examples viewed problem pruning SLDtree refutations negative examples refutations positive examples excluded It shown actual pruning performed applying unfolding clause removal The algorithm spectre presented based idea The input algorithm besides logic program positive negative examples computation rule determines shape SLDtree pruned It shown generality resulting specialization dependent computation rule experimental results presented using three different computation rules The experiments indicate computation rule formulated number applications unfolding kept low possible The algorithm uses divideandconquer method also compared covering algorithm The experiments show higher predictive accuracy achieved focus discriminating positive negative examples rather achieving high coverage positive examples
Cognitive mapping qualitative decision modeling technique developed twenty years ago political scientists continues see occasional use social science decisionaiding applications In paper I show cognitive maps viewed context recent formalisms qualitative decision modeling latter provide firm semantic foundation facilitate development powerful inference procedures well extensions expressiveness models sort
Casebased reasoning systems traditionally used perform highlevel reasoning problem domains adequately described using discrete symbolic representations However many realworld problem domains autonomous robotic navigation better characterized using continuous representations Such problem domains also require continuous performance continuous sensorimotor interaction environment continuous adaptation learning performance task We introduce new method continuous casebased reasoning discuss applied dynamic selection modification acquisition robot behaviors autonomous navigation systems We conclude general discussion casebased reasoning issues addressed work
The EastWest Challenge title second international competition machine learning programs organized Fall Donald Michie Stephen Muggleton David Page Ashwin Srinivasan Oxford University The goal competition solve TRAINS problems discover simplest classification rules trainlike structured objects The rule complexity judged Prolog program counted number various components rule expressed Prolog Horn clauses There entries several countries submitted competition The GMU teams entry generated three members AQ family learning programs AQDT INDUCE AQHCI The paper analyses results obtained programs compares obtained learning programs It also presents ideas research inspired competition One ideas challenge machine learning community develop measure knowledge complexity would adequately capture cognitive complexity knowledge A preliminary measure cognitive complexity called Ccomplexity different Prologcomplexity Pcomplexity used competition briefly discussed The authors thank Professors Donald Michie Steve Muggleton David Page Ashwin Srinivasan organizing WestEast Challenge competition machine learning programs provided us stimulating challenge learning programs inspired new ideas improving The authors also thank Nabil Allkharouf Ali Hadjarian help suggestions efforts solve problems posed competition This research conducted Center Machine Learning Inference George Mason University The Centers research supported part Advanced Research Projects Agency Grant No NJ administered Office Naval Research Grant No FJ administered Air Force Office Scientific Research part Office Naval Research Grant No NJ part National Science Foundation Grants No IRI CDA DMI
Previous algorithms recovery Bayesianbelief network structures data either highly dependent conditional independence CI tests required ordering nodes supplied user We present algorithm integrates two approaches CI tests used generate ordering nodes database used recover underlying Bayesian network structure using non CI test based method Results evaluation algorithm number databases eg ALARM LED SOYBEAN presented We also discuss algorithm performance issues open problems
We propose new criterion model selection prediction problems The covariance inflation criterion adjusts training error average covariance predictions responses prediction rule applied permuted versions dataset This criterion applied general prediction problems example regression classification general prediction rules example stepwise regression treebased models neural nets As byproduct obtain measure effective number parameters used adaptive procedure We relate covariance inflation criterion model selection procedures illustrate use regression classification problems We also revisit conditional bootstrap approach model selection
This paper introduces magnetic neural gas MNG algorithm extends unsupervised competitive learning class information improve positioning radial basis functions The basic idea MNG discover heterogeneous clusters ie clusters data different classes migrate additional neurons towards The discovery effected heterogeneity coefficient associated neuron migration guided introducing kind magnetic effect The performance MNG tested number data sets including thyroid data set Results demonstrate promise
This research supported National Science Foundation Fellowship awarded Dario Salvucci Office Naval Research grant N awarded John Anderson The views conclusions contained document authors interpreted representing official policies either expressed implied National Science Foundation Office Naval Research United States government
Efficient algorithms developed estimating model parameters measured data even presence gross errors In addition point estimates parameters however assessments uncertainty needed Linear approximations provide standard errors misleading applied models substantially nonlinear To overcome difficulty profiling methods developed case regressor variables error free In paper extend profiling methods ErrorinVariableMeasurement EVM models We use Laplaces method integrate incidental parameters associated measurement errors apply profiling methods obtain approximate confidence contours parameters This approach computationally efficient requiring function evaluations applied large scale problems It useful certain measurement errors eg input variables relatively small small ignored
A novel architecture set learning rules cortical selforganization proposed The model based idea multiple information channels modulate one anothers plasticity Features learned bottomup information sources thus influenced learned contextual pathways vice versa A maximum likelihood cost function allows scheme implemented biologically feasible hierarchical neural circuit In simulations model first demonstrate utility temporal context modulating plasticity The model learns representation categorizes peoples faces according identity independent viewpoint taking advantage temporal continuity image sequences In second set simulations add plasticity contextual stream explore variations architecture In case model learns twotiered representation starting coarse viewbased clustering proceeding finer clustering specific stimulus features This model provides tenable account people may perform D object recognition hierarchical bottomup fashion
The boosting algorithm AdaBoost developed Freund Schapire exhibited outstanding performance several benchmark problems using C weak algorithm boosted Like ensemble learning approaches AdaBoost constructs composite hypothesis voting many individual hypotheses In practice large amount memory required store hypotheses make ensemble methods hard deploy applications This paper shows selecting subset hypotheses possible obtain nearly levels performance entire set The results also provide insight behavior AdaBoost
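The abstract does not say how the subset of hypotheses is selected. The following is a minimal sketch of one plausible strategy under stated assumptions. Hypotheses are added greedily, each time choosing the one whose inclusion yields the fewest errors of the weighted vote on a separate pruning set. The names `vote` and `greedy_prune` and the ±1 label encoding are hypothetical, and this is not presented as the paper's method.

```python
from typing import List, Sequence


def vote(preds: Sequence[Sequence[int]], alphas: Sequence[float],
         chosen: List[int], i: int) -> int:
    """Weighted majority vote of the chosen hypotheses on example i (labels are +1/-1)."""
    s = sum(alphas[t] * preds[t][i] for t in chosen)
    return 1 if s >= 0 else -1


def greedy_prune(preds: Sequence[Sequence[int]], alphas: Sequence[float],
                 labels: Sequence[int], size: int) -> List[int]:
    """Greedily add the hypothesis whose inclusion gives the fewest pruning-set errors."""
    chosen: List[int] = []
    remaining = list(range(len(preds)))

    def errors(trial: List[int]) -> int:
        return sum(vote(preds, alphas, trial, i) != y for i, y in enumerate(labels))

    while remaining and len(chosen) < size:
        best = min(remaining, key=lambda t: errors(chosen + [t]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Usage: preds[t][i] is hypothesis t's prediction on pruning example i.
preds = [[1, 1, -1, -1], [1, -1, 1, -1], [1, 1, 1, -1]]
alphas = [1.0, 1.0, 1.0]
labels = [1, 1, 1, -1]
subset = greedy_prune(preds, alphas, labels, size=1)  # keeps the single best hypothesis
```

Storing only `subset` and its weights is what reduces the memory footprint.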
Infusion GABA agonist Reiter Stryker infusion NMDA receptor antagonist Bear et al primary visual cortex kittens monocular deprivation shifts ocular dominance toward closed eye cortical region near infusion site This reverse ocular dominance shift previously modeled variants covariance synaptic plasticity rule Bear et al Clothiaux et al Miller et al Reiter Stryker Kasamatsu et al showed infusion NMDA receptor antagonist adult cat primary visual cortex changes ocular dominance distribution reduces binocularity reduces orientation direction selectivity This paper presents novel account effects pharmacological treatments based EXIN synaptic plasticity rules Marshall include instar afferent excitatory outstar lateral inhibitory rule Functionally EXIN plasticity rules enhance efficiency discrimination contextsensitivity neural networks representation perceptual patterns Marshall Marshall Gupta The EXIN model decreases lateral inhibition neurons outside infusion site control regions neurons inside infusion region monocular deprivation In model plasticity afferent pathways neurons affected pharmacological treatments assumed blocked opposed previous models Bear et al Miller et al Reiter Stryker afferent pathways open eye neurons infusion region weakened The proposed model consistent results suggesting longterm plasticity blocked NMDA antagonists postsynaptic hyperpolarization Bear et al Dudek Bear Goda Stevens Kirkwood et al Since role plasticity lateral inhibitory pathways producing cortical plasticity received much attention several predictions made based EXIN lateral inhibitory plasticity rule
We present two algorithms use membership equivalence queries exactly identify concepts given union discretized axisparallel boxes ddimensional discretized Euclidean space coordinate n discrete values The first algorithm receives sd counterexamples uses time membership queries polynomial log n constant Further equivalence queries made formulated union Osd log axisparallel boxes Next introduce new complexity measure better captures complexity union boxes simply number boxes dimensions Our new measure number segments target polyhedron segment maximum portion one sides polyhedron lies entirely inside entirely outside halfspaces defining polyhedron We present improvement first algorithm uses time queries polynomial log n The hypothesis class used decision trees height sd Further show time queries used algorithm polynomial log n constant thus generalizing exact learnability DNF formulas constant number terms In fact single algorithm efficient either constant
Approaches combining genetic algorithms neural networks received great deal attention recent years As result much work reported two major areas neural network design training topology optimization This paper focuses key issues associated problem pruning multilayer perceptron using genetic algorithms simulated annealing The study presented considers number aspects associated network training may alter behavior stochastic topology optimizer Enhancements discussed improve topology searches Simulation results two mentioned stochastic optimization methods applied nonlinear system identification presented compared simple random search
Local belief propagation rules sort proposed Pearl guaranteed converge optimal beliefs singly connected networks Recently number researchers empirically demonstrated good performance algorithms networks loops theoretical understanding performance yet achieved Here lay foundation understanding belief propagation networks loops For networks single loop derive analytical relationship steady state beliefs loopy network true posterior probability Using relationship show category networks MAP estimate obtained belief update belief revision proven optimal although beliefs incorrect We show nodes use local information messages receive order correct steady state beliefs Furthermore prove networks single loop MAP estimate obtained belief revision convergence guaranteed give globally optimal sequence states The result independent length cycle size state space For networks multiple loops introduce concept balanced network show simulation results comparing belief revision update networks We show Turbo code structure balanced present simulations toy Turbo code problem indicating decoding obtained belief revision convergence significantly likely correct This report describes research done Center Biological Computational Learning Department Brain Cognitive Sciences Massachusetts Institute Technology Support Center provided part grant National Science Foundation contract ASC YW also supported NEI R EY E H Adelson
Previous work showed combination Genetic Algorithm using order permutation chromosome combined hand coded Greedy Optimizers readily produce optimal schedule four node test problem Langdon Following GA used find low cost schedules South Wales region UK high voltage power network This paper describes evolution best known schedule base South Wales problem using Genetic Programming starting hand coded heuristics used GA
Search mechanisms artificial intelligence combine two elements representation determines search space search mechanism actually explores space Unfortunately many searches may explore redundant andor invalid solutions Genetic programming refers class evolutionary algorithms based genetic algorithms utilizing parameterized representation form trees These algorithms perform searches based simulation nature They face problems redundantinvalid subspaces These problems recently addressed systematic manner This paper presents methodology devised public domain genetic programming tool lilgp This methodology uses data typing semantic information constrain representation space valid possibly unique solutions explored The user enters problemspecific constraints transformed normal set This set checked feasibility subsequently used limit space explored The constraints determine valid possibly unique space Moreover also used exclude subspaces user considers uninteresting using problemspecific knowledge A simple example followed thoroughly illustrate constraint language transformations normal set Experiments boolean multiplexer illustrate practical applications method limit redundant space exploration utilizing problemspecific knowledge Supported grant NASAJSC NAG
Knowledge acquisition difficult errorprone timeconsuming task The task automatically improving existing knowledge base using learning methods addressed class systems performing theory refinement This paper presents system Forte FirstOrder Revision Theories Examples refines firstorder Hornclause theories integrating variety different revision techniques coherent whole Forte uses techniques within hillclimbing framework guided global heuristic It identifies possible errors theory calls library operators develop possible revisions The best revision implemented process repeats revisions possible Operators drawn variety sources including propositional theory refinement firstorder induction inverse resolution Forte demonstrated several domains including logic programming qualitative modelling
The problem sequence categorization generalize corpus labeled sequences procedures accurately labeling future unlabeled sequences The choice representation sequences major impact task absence background knowledge good representation often known straightforward representations often far optimal We propose feature generation method called FGEN creates Boolean features check presence absence heuristically selected collections subsequences We show empirically representation computed FGEN improves accuracy two commonly used learning systems C Ripper new features added existing representations sequence data We show superiority FGEN across range tasks selected three domains DNA sequences Unix command sequences English text
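The sketch below shows a Boolean feature of the kind the abstract describes. It checks whether a collection of subsequences is present in a sequence. The abstract does not say how a collection is combined, so the rule here is an assumption: the feature fires when at least `threshold` members occur as contiguous runs. The heuristic selection of collections that FGEN performs is not shown. All function names are hypothetical.

```python
from typing import List, Sequence


def contains_run(seq: Sequence, pattern: Sequence) -> bool:
    """True if `pattern` occurs as a contiguous run inside `seq` (strings or lists)."""
    m = len(pattern)
    return any(list(seq[i:i + m]) == list(pattern) for i in range(len(seq) - m + 1))


def boolean_feature(seq: Sequence, collection: Sequence[Sequence], threshold: int = 1) -> bool:
    """Hypothetical combination rule: at least `threshold` patterns of the collection are present."""
    hits = sum(contains_run(seq, p) for p in collection)
    return hits >= threshold


def feature_vector(seq: Sequence, collections: Sequence[Sequence[Sequence]],
                   threshold: int = 1) -> List[bool]:
    """One Boolean feature per collection; these would be appended to an existing representation."""
    return [boolean_feature(seq, c, threshold) for c in collections]


# Usage on a DNA-like string; the same code accepts lists of Unix commands or words.
features = feature_vector("ACGTTA", [["CG", "AA"], ["GGG"], ["TT", "TA"]])
```

The resulting vector could be added alongside an existing representation before passing it to a learner such as C4.5 or Ripper.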
Genetic algorithms one example use random element within algorithm combinatorial optimization We consider application genetic algorithm particular problem Assembly Line Balancing Problem A general description genetic algorithms given specialized use testbed problems discussed We carry extensive computational testing find appropriate values various parameters associated genetic algorithm These experiments underscore importance correct choice scaling parameter mutation rate ensure good performance genetic algorithm We also describe parallel implementation genetic algorithm give comparisons parallel serial implementations Both versions algorithm shown effective producing good solutions problems type appropriately chosen parameters
This paper presents application CaseBased Reasoning methods KOSIMO data base international conflicts A CaseBased Reasoning tool VIECBR developed used classification various outcome variables like political military territorial outcome solution modalities conflict intensity In addition case retrieval algorithms presented interactive usermodifiable tool intelligently searching conflict data base precedent cases
In order learn behaviour casebased reasoners learning systems formalise simple casebased learner PAC learning algorithm using casebased representation hCB We first consider naive casebased learning algorithm CB H learns collecting available cases casebase calculates similarity counting number features two problem descriptions agree We present results concerning consistency learning algorithm give partial results regarding sample complexity We able characterise CB H weak general learning algorithm We consider sample complexity casebased learning reduced specific classes target concept application inductive bias prior knowledge class target concepts Following recent work demonstrating casebased learning improved choosing similarity measure appropriate concept learnt define second casebased learning algorithm CB learns using best possible similarity measure might inferred chosen target concept While CB executable learning strategy since chosen similarity measure defined terms priori knowledge actual target concept allows us assess limit maximum possible contribution approach casebased learning Also addition illustrating role inductive bias definition CB simplifies general problem establishing functions might represented form hCB Reasoning casebased representation special case therefore little straightforward general case CB H allowing substantial results regarding representable functions sample complexity presented CB In assessing results forced conclude casebased learning best approach learning chosen concept space space monomial functions We discuss however study demonstrated context casebased learning operation concepts well known machine learning inductive bias tradeoff computational complexity sample complexity
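The naive case-based learner described above can be sketched in a few lines. It stores every case it sees and measures similarity as the number of features on which two problem descriptions agree. A new description gets the label of the most similar stored case. Tie-breaking by earliest case is an assumption, and the class and method names are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def similarity(a: Sequence, b: Sequence) -> int:
    """Number of features on which two problem descriptions agree."""
    return sum(x == y for x, y in zip(a, b))


class NaiveCaseBasedLearner:
    """Collects all available cases; classifies by the most similar stored case."""

    def __init__(self) -> None:
        self.casebase: List[Tuple[tuple, Hashable]] = []

    def learn(self, description: Sequence, label: Hashable) -> None:
        self.casebase.append((tuple(description), label))

    def classify(self, description: Sequence) -> Hashable:
        if not self.casebase:
            raise ValueError("empty casebase")
        # max() keeps the earliest case among equally similar ones (assumed tie-break).
        _, label = max(self.casebase, key=lambda case: similarity(case[0], description))
        return label


# Usage with Boolean feature vectors.
learner = NaiveCaseBasedLearner()
learner.learn((1, 0, 0), "a")
learner.learn((0, 1, 1), "b")
```

The fixed, feature-counting similarity is the source of inductive bias that the abstract contrasts with the best possible similarity measure.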
In past years evolutionary computation landscape rapidly changing result increased levels interaction various research groups injection new ideas challenge old tenets The effect simultaneously exciting invigorating annoying bewildering oldtimers well newcomers field Emerging activity beginnings structure common themes agreement important open issues We attempt summarize emergent properties paper
We quantify experimentally analytically performance memorybased reasoning MBR algorithms To start gaining insight capabilities MBR algorithms compare MBR algorithm using value difference metric popular Bayesian classifier These two approaches similar make certain independence assumptions data However whereas MBR uses specific cases perform classification Bayesian methods summarize data probabilistically We demonstrate particular MBR system called Pebls works comparatively well wide range domains using real artificial data With respect artificial data consider distributions concept classes separated functional discriminants well timeseries data generated Markov models varying complexity Finally show formally Pebls learn limit natural concept classes Bayesian classifier learn attain perfect accuracy whenever
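Pebls is built on the value difference metric. Under that metric, two values of a symbolic attribute count as close when they give similar class-conditional frequencies. The sketch below computes that distance for one attribute as the sum over classes of |P(c | v1) − P(c | v2)|^r. It assumes r = 1 by default, and the function name is hypothetical. Weighting and the distance between whole cases are omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def value_difference(values: Sequence[Hashable], labels: Sequence[Hashable],
                     v1: Hashable, v2: Hashable, r: float = 1.0) -> float:
    """Distance between two values of one attribute from class-conditional frequencies."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    n1 = sum(counts[v1].values())
    n2 = sum(counts[v2].values())
    if n1 == 0 or n2 == 0:
        raise ValueError("value not observed in the training data")
    # Missing class keys in a Counter read as 0.
    return sum(abs(counts[v1][c] / n1 - counts[v2][c] / n2) ** r for c in set(labels))


# Usage: value "a" is 2/3 class x, value "b" is always class y.
values = ["a", "a", "a", "b", "b"]
labels = ["x", "x", "y", "y", "y"]
d_ab = value_difference(values, labels, "a", "b")  # |2/3 - 0| + |1/3 - 1|
```

As with the Bayesian classifier, the metric is estimated one attribute at a time. That is the independence assumption the two approaches share.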
The Knearestneighbor decision rule assigns object unknown class plurality class among K labeled training objects closest Closeness usually defined terms metric distance Euclidean space input measurement variables axes The metric chosen define distance strongly affect performance An optimal choice depends problem hand characterized respective class distributions input measurement space within given problem location unknown object space In paper new types Knearestneighbor procedures described estimate local relevance input variable linear combinations individual point classified This information used separately customize metric used define distance object finding nearest neighbors These procedures hybrid regular Knearestneighbor methods treestructured recursive partitioning techniques popular statistics machine learning
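To show why the choice of metric matters, the sketch below implements the K-nearest-neighbor plurality rule with a per-variable weighted Euclidean distance. In the procedures described, the weights would be estimated from local relevance at each query point. Here they are supplied by hand, which is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_classify(train_x: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                 query: Sequence[float], k: int, weights: Sequence[float]) -> Hashable:
    """Plurality class among the k training points closest under a weighted Euclidean metric."""
    def dist(a: Sequence[float]) -> float:
        return math.sqrt(sum(w * (ai - qi) ** 2 for w, ai, qi in zip(weights, a, query)))

    ranked = sorted(range(len(train_x)), key=lambda i: dist(train_x[i]))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Usage: the class depends only on the first variable; the second is irrelevant.
train_x = [(0.0, 5.0), (0.1, 0.0), (1.0, 0.2), (0.9, 5.1)]
train_y = ["a", "a", "b", "b"]
query = (0.05, 5.0)
plain = knn_classify(train_x, train_y, query, k=3, weights=(1.0, 1.0))
relevant_only = knn_classify(train_x, train_y, query, k=3, weights=(1.0, 0.0))
```

With equal weights the irrelevant second variable pulls in class-b neighbors. Down-weighting it changes the decision, which is the effect that local relevance estimation aims to exploit.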
Seismic data interpretation problems typically solved using computationally intensive local search methods often result inferior solutions Here traditional hybrid genetic algorithm compared different staged hybrid genetic algorithms geophysical imaging static corrections problem The traditional hybrid genetic algorithm used applied local search every offspring produced genetic search The staged hybrid genetic algorithms designed temporally separate local genetic search components distinct phases minimize interference two search methods The results show staged hybrid genetic algorithms produce higher quality solutions using significantly less computational time problem
We describe immune system model based universe binary strings The model directed understanding pattern recognition processes learning take place individual species levels immune system The genetic algorithm GA central component model In paper study behavior GA two pattern recognition problems relevant natural immune systems Finally compare model explicit fitness sharing techniques genetic algorithms show model implements form implicit fitness sharing
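For comparison with the implicit sharing the immune model produces, the sketch below implements explicit fitness sharing on binary strings. Each raw fitness is divided by a niche count, computed as the sum of sh(d) = 1 − (d/σ)^α over all population members within Hamming distance σ, the individual itself included. The parameter values and function names are illustrative assumptions.

```python
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(pop: Sequence[str], fitness: Sequence[float],
                   sigma: float, alpha: float = 1.0) -> List[float]:
    """Explicit fitness sharing: raw fitness divided by the niche count."""
    out = []
    for i, a in enumerate(pop):
        niche = 0.0
        for b in pop:
            d = hamming(a, b)
            if d < sigma:
                niche += 1.0 - (d / sigma) ** alpha  # includes a itself (d = 0 adds 1)
        out.append(fitness[i] / niche)
    return out


# Usage: two copies of one string share their niche; the distant string does not.
shared = shared_fitness(["0000", "0000", "1111"], [1.0, 1.0, 1.0], sigma=2)
```

Crowded niches are penalized, so the GA keeps several distinct patterns in the population rather than converging on one.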
We present method accurate representation highdimensional unknown functions random samples drawn input space The method builds representations function recursively splitting input space smaller subspaces subspaces linear approximation computed The representations function levels ie depths tree retained learning process good generalisation available well accurate representations subareas Therefore fast accurate learning combined method
Several authors made link hidden Markov models time series energybased models Luttrell Williams Saul Jordan Saul Jordan discuss linear Boltzmann chain model statestate transition energies $A_{ii'}$ going state $i$ state $i'$ symbol emission energies $B_{ij}$ probability entire state sequence $\{i_l, j_l\}_{l=1}^{L}$ Whilst HMM written linear Boltzmann chain setting $\exp(A_{ii'}) = a_{ii'}$ $\exp(B_{ij}) = b_{ij}$ linear Boltzmann chains represented HMMs Saul Jordan However difference two models minimal To precise final hidden state $i_L$ linear Boltzmann chain constrained particular end state distribution sequences identical hidden Markov model
We present coevolutionary approach learning sequential decision rules appears number advantages noncoevolutionary approaches The coevolutionary approach encourages formation stable niches representing simpler subbehaviors The evolutionary direction subbehavior controlled independently providing alternative evolving complex behavior using intermediate training steps Results presented showing significant learning rate speedup noncoevolutionary approach simulated robot domain In addition results suggest coevolutionary approach may lead emergent problem decompositions
Appropriate bias widely viewed key efficient learning generalization I present new algorithm Incremental DeltaBarDelta IDBD algorithm learning appropriate biases based previous learning experience The IDBD algorithm developed case simple linear learning system LMS delta rule separate learningrate parameter input The IDBD algorithm adjusts learningrate parameters important form bias system Because bias approach adapted based previous learning experience appropriate testbeds drifting nonstationary learning tasks For particular tasks type I show IDBD algorithm performs better ordinary LMS fact finds optimal learning rates The IDBD algorithm extends improves prior work Jacobs fully incremental single free parameter This paper also extends previous work presenting derivation IDBD algorithm gradient descent space learningrate parameters Finally I offer novel interpretation IDBD algorithm incremental form holdoneout cross validation
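The IDBD update can be sketched for the linear LMS learner as follows. Each input i has its own learning rate α_i = exp(β_i). The meta step size θ is the single free parameter, and h_i is a decaying trace of recent weight changes. The per-example updates are β_i ← β_i + θ δ x_i h_i, w_i ← w_i + α_i δ x_i, and h_i ← h_i [1 − α_i x_i²]⁺ + α_i δ x_i, where δ is the prediction error. The default values of θ and the initial rate are illustrative assumptions, as is the function name.

```python
import math
import random
from typing import List, Sequence, Tuple


def idbd_train(samples: Sequence[Tuple[Sequence[float], float]], n_inputs: int,
               theta: float = 0.01, init_alpha: float = 0.05) -> Tuple[List[float], List[float]]:
    """Incremental Delta-Bar-Delta on a linear (LMS) unit; returns weights and learning rates."""
    w = [0.0] * n_inputs
    beta = [math.log(init_alpha)] * n_inputs  # log learning rates
    h = [0.0] * n_inputs                      # traces of recent weight changes
    for x, y in samples:
        delta = y - sum(wi * xi for wi, xi in zip(w, x))  # error before any update
        for i in range(n_inputs):
            beta[i] += theta * delta * x[i] * h[i]        # meta-level gradient step
            alpha = math.exp(beta[i])
            w[i] += alpha * delta * x[i]                  # ordinary LMS step with rate alpha_i
            h[i] = h[i] * max(0.0, 1.0 - alpha * x[i] * x[i]) + alpha * delta * x[i]
    return w, [math.exp(b) for b in beta]


# Usage: learn a fixed linear target from noise-free random inputs.
rng = random.Random(0)
target = [2.0, -1.0, 0.5]
samples = []
for _ in range(3000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    samples.append((x, sum(t * xi for t, xi in zip(target, x))))
weights, rates = idbd_train(samples, 3)
```

The per-input rates are what IDBD tunes. On drifting tasks, inputs whose target weights keep changing would tend to receive larger α_i.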
Neural network pruning methods level individual network parameters eg connection weights improve generalization An open problem pruning methods known today OBD OBS autoprune epsiprune selection number parameters removed pruning step pruning strength This paper presents pruning method lprune automatically adapts pruning strength evolution weights loss generalization training The method requires algorithm parameter adjustment user The results extensive experimentation indicate lprune often superior autoprune superior OBD diagnosis tasks unless severe pruning early training process required Results statistical significance tests comparing autoprune new method lprune well backpropagation early stopping given different problems
ICSIM connectionist net simulator developed ICSI written Sather It objectoriented meet requirements flexibility reuse homogeneous structured connectionist nets allow user encapsulate efficient customized implementations perhaps running dedicated hardware Nets composed combining offtheshelf library classes necessary specializing behaviour General user interface classes allow uniform customized graphic presentation nets modeled
In experiencebased casebased reasoning new problems solved retrieving adapting solutions similar problems encountered past An important issue experiencebased reasoning identify different types knowledge reasoning useful different classes caseadaptation tasks In paper examine class nonroutine caseadaptation tasks involve patterned insertions new elements old solutions We describe modelbased method solving task context design physical devices The method uses knowledge generic teleological mechanisms GTMs cascading Old designs adapted meet new functional specifications accessing instantiating appropriate GTM The Kritik system evaluates computational feasibility sufficiency method design adaptation
The utility problem learning systems occurs knowledge learned attempt improve systems performance degrades performance instead We present methodology analysis utility problems uses computational models problem solving systems isolate root causes utility problem detect threshold conditions problem arise design strategies eliminate We present models casebased reasoning controlrule learning systems compare performance respect swamping utility problem Our analysis suggests casebased reasoning systems resistant utility problem controlrule learning systems
We present model similaritybased retrieval attempts capture three psychological phenomena people extremely good judging similarity analogy given items compare Superficial remindings much frequent structural remindings People sometimes experience use purely structural analogical remindings Our model called MACFAC many called chosen consists two stages The first stage MAC uses computationally cheap nonstructural matcher filter candidates pool memory items That redundantly encode structured representations content vectors whose dot product yields estimate well corresponding structural representations match The second stage FAC uses SME compute true structural match probe output first stage MACFAC fully implemented show capable modeling patterns access found psychological data
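The cheap MAC stage can be illustrated as follows. Each structured description is summarized by a content vector that counts occurrences of each predicate name. The dot product of two vectors estimates how well the structures would match. Here a structured description is assumed to be a list of nested tuples (predicate, args...). Items scoring within a hypothetical `keep_fraction` of the best score are passed on to the FAC stage. The SME-based FAC stage is not shown, and all names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence


def content_vector(facts: Sequence[tuple]) -> Counter:
    """Count each predicate/functor name in nested-tuple facts; plain strings are entities."""
    counts: Counter = Counter()

    def walk(term: object) -> None:
        if isinstance(term, tuple) and term:
            counts[term[0]] += 1
            for arg in term[1:]:
                walk(arg)

    for fact in facts:
        walk(fact)
    return counts


def dot(u: Counter, v: Counter) -> int:
    return sum(u[k] * v[k] for k in u if k in v)


def mac_stage(probe: Sequence[tuple], memory: Dict[str, Sequence[tuple]],
              keep_fraction: float = 0.9) -> List[str]:
    """Nonstructural filter: keep memory items whose dot product is close to the best one."""
    pv = content_vector(probe)
    scores = {name: dot(pv, content_vector(desc)) for name, desc in memory.items()}
    best = max(scores.values())
    return [name for name, s in scores.items() if s >= keep_fraction * best]


# Usage: the probe shares predicates with m1 but not with m2.
probe = [("cause", ("flow", "water", "pipe"), ("rise", "pressure"))]
memory = {
    "m1": [("cause", ("flow", "heat", "bar"), ("rise", "temperature"))],
    "m2": [("color", "sky", "blue")],
}
candidates = mac_stage(probe, memory)
```

Because the score ignores which entities fill which roles, a cheap filter like this favors superficial overlap. That fits the abstract's observation that superficial remindings are more frequent.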
We present algorithm online learning linear functions optimal within constant factor respect bounds sum squared errors worst case sequence trials The bounds logarithmic number variables Furthermore algorithm shown optimally robust respect noise data within constant factor Key words Machine learning computational learning theory online learning linear functions worstcase loss bounds adaptive filter theory Subject classifications T
A fundamental issue casebased reasoning similarity assessment determining similarities differences new retrieved cases Many methods developed comparing input case descriptions cases already memory However success methods depends input case description sufficiently complete reflect important features new situation assured In casebased explanation anomalous events story understanding anomaly arises current situation incompletely understood consequently similarity assessment based matches known current features old cases likely fail gaps current cases description Our solution problem gaps new cases description approach call constructive similarity assessment Constructive similarity assessment treats similarity assessment simple comparison fixed new old cases process deciding types features investigated new situation features borne knowledge added description current case Constructive similarity assessment merely compare new cases old using prior cases guide dynamically carves augmented descriptions new cases memory
Much recent research modeling memory processes focused identifying useful indices retrieval strategies support particular memory tasks Another important question concerning memory processes however retrieval criteria learned This paper examines issues involved modeling learning memory search strategies It discusses general requirements appropriate strategy learning presents model memory search strategy learning applied problem retrieving relevant information adapting cases casebased reasoning It discusses implementation model based lessons learned implementation points towards issues directions refining model
The problem approximating probability distribution occurs frequently many areas applied mathematics including statistics communication theory machine learning theoretical analysis complex systems neural networks Saul Jordan recently proposed powerful method efficiently approximating probability distributions known structured variational approximations In structured variational approximations exact algorithms probability computation tractable substructures combined variational methods handle interactions substructures make system whole intractable In note I present mathematical result simplify derivation structured variational approximations exponential family distributions
This paper presents ASOCS adaptive selforganizing concurrent system model massively parallel processing incrementally defined rule systems areas adaptive logic robotics logical inference dynamic control An ASOCS adaptive network composed many simple computing elements operating asynchronously parallel This paper focuses adaptive algorithm AA details architecture learning algorithm It advantages previous ASOCS models simplicity implementability cost An ASOCS operate either data processing mode learning mode During data processing mode ASOCS acts parallel hardware circuit In learning mode rules expressed boolean conjunctions incrementally presented ASOCS All ASOCS learning algorithms incorporate new rule distributed fashion short bounded time
This paper describes novel search algorithm called dynamic hill climbing borrows ideas genetic algorithms hill climbing techniques Unlike genetic hill climbing algorithms dynamic hill climbing ability dynamically change coordinate frame course optimization Furthermore algorithm moves coarsegrained search finegrained search function space changing mutation rate uses diversitybased distance metric ensure searches new regions space Dynamic hill climbing empirically compared traditional genetic algorithm using De Jongs wellknown five function test suite shown vastly surpass performance genetic algorithm often finding better solutions using many function evaluations
Autonomous vehicles likely require sophisticated software controllers maintain vehicle performance presence vehicle faults The test evaluation complex software controllers expected challenging task The goal effort apply machine learning techniques field artificial intelligence general problem evaluating intelligent controller autonomous vehicle The approach involves subjecting controller adaptively chosen set fault scenarios within vehicle simulator searching combinations faults produce noteworthy performance vehicle controller The search employs genetic algorithm We illustrate approach evaluating performance subsumptionbased controller autonomous vehicle The preliminary evidence suggests approach effective alternative manual testing sophisticated software controllers
Speedup learning seeks improve efficiency searchbased problem solvers In paper propose new theoretical model speedup learning captures systems improve problem solving performance solving usergiven set problems We also use model motivate notion batch problem solving argue congenial learning sequential problem solving Our theoretical results applicable serially decomposable domains We empirically validate results domain Eight Puzzle
Nonparametric density estimation problem approximating values probability density function given samples associated distribution Nonparametric estimation finds applications discriminant analysis cluster analysis flow calculations based Smoothed Particle Hydrodynamics Usual estimators make use kernel functions require order $n^2$ arithmetic operations evaluate density n sample points We describe sequence special weight functions requires almost linear number operations n computation
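The quadratic baseline referred to above is easy to see in code. A kernel estimator evaluated at all n sample points sums n kernel terms for each of the n points. The sketch assumes a Gaussian kernel with a fixed bandwidth. It shows the usual O(n²) estimator, not the special weight functions proposed in the paper, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def kde_at_samples(samples: Sequence[float], bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate at every sample point: n * n kernel evaluations."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xj) / bandwidth) ** 2) for xj in samples)
        for x in samples
    ]


# Usage: a single sample with unit bandwidth gives the standard normal peak 1/sqrt(2*pi).
single = kde_at_samples([0.0], 1.0)
pair = kde_at_samples([-1.0, 1.0], 0.5)
```

The nested loop is exactly the cost that faster schemes try to avoid as n grows.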
In paper consider learning firstorder Horn programs entailment In particular show subclass firstorder acyclic Horn programs constant arity exactly learnable equivalence entailment membership queries provided allows polynomialtime subsumption procedure satisfies closure conditions One consequence firstorder acyclic determinate Horn programs constant arity exactly learnable equivalence entailment membership queries
A strategy using Genetic Algorithms GAs solve NPcomplete problems presented The key aspect approach taken exploit observation although NPcomplete problems equally difficult general computational sense much better GA representations others leading much successful use GAs NPcomplete problems others Since NPcomplete problem mapped one polynomial time strategy described consists identifying canonical NPcomplete problem GAs work well solving NPcomplete problems indirectly mapping onto canonical problem Initial empirical results presented support claim Boolean Satisfiability Problem SAT GAeffective canonical problem NPcomplete problems poor GA representations solved efficiently mapping first onto SAT problems
Fully cooperative multiagent systems agents share joint utility model special interest AI A key problem ensuring actions individual agents coordinated especially settings agents autonomous decision makers We investigate approaches learning coordinated strategies stochastic domains agents actions directly observable others Much recent work game theory adopted Bayesian learning perspective general problem equilibrium selection tends assume actions observed We discuss special problems arise actions observable including effects rates convergence effect action failure probabilities asymmetries We also use likelihood estimates means generalizing fictitious play learning models setting Finally propose use maximum likelihood means removing strategies consideration aim convergence conventional equilibrium point learning deliberation cease
Humans appear often solve problems new domain transferring expertise familiar domain However making crossdomain analogies hard often requires abstractions common source target domains Recent work casebased design suggests generic mechanisms one type abstractions used designers However one important yet unexplored issue generic mechanisms come We hypothesize acquired incrementally problemsolving experiences familiar domains generalization patterns regularity Three important issues generalization experiences generalize experience far generalize methods use In paper show mental models familiar domain provide content together problemsolving context learning occurs also provide constraints learning generic mechanisms design experiences In particular show modelbased learning method integrated similaritybased learning addresses issues generalization experiences
The optimization single bit string means iterated mutation selection best Genetic Algorithm discussed respect three simple fitness functions The counting ones problem standard binary encoded integer Gray coded integer optimization problem A mutation rate schedule optimal respect success probability mutation presented objective functions turns standard binary code hamper search process even case unimodal objective functions While normally mutation rate $1/l$ $l$ denotes bit string length recommendable results indicate variation mutation rate useful cases fitness function multimodal pseudoboolean function multimodality may caused objective function well encoding mechanism
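The setting above can be sketched as a single bit string improved by repeated mutation and selection of the better of parent and child. The sketch uses the counting ones function and the constant mutation rate 1/l, together with the standard reflected binary Gray code conversions. The mutation-rate schedules studied in the paper are not reproduced. Function names, the evaluation budget and the seed are illustrative assumptions.

```python
import random
from typing import List, Tuple


def count_ones(bits: List[int]) -> int:
    return sum(bits)


def one_plus_one_ea(l: int, max_evals: int, seed: int = 0) -> Tuple[List[int], int]:
    """Mutate each bit with probability 1/l; keep the child if it is at least as fit."""
    rng = random.Random(seed)
    p = 1.0 / l
    parent = [rng.randint(0, 1) for _ in range(l)]
    f = count_ones(parent)
    evals = 1
    while f < l and evals < max_evals:
        child = [b ^ 1 if rng.random() < p else b for b in parent]
        fc = count_ones(child)
        evals += 1
        if fc >= f:
            parent, f = child, fc
    return parent, evals


def binary_to_gray(n: int) -> int:
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Inverse of binary_to_gray: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# Usage: optimize counting ones on 30 bits.
best_bits, used_evals = one_plus_one_ea(30, max_evals=20000)
```

Under Gray coding, adjacent integers differ in exactly one bit. Under standard binary coding they can differ in many bits, which is why the encoding can hamper a mutation-based search even on a unimodal integer objective.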
Genetic Algorithms used learn navigation collision avoidance behaviors robots The learning performed simulation resulting behaviors used control The approach learning behaviors robots described reflects particular methodology learning via simulation model The motivation making mistakes real systems may costly dangerous In addition time constraints might limit number experiences learning real world many cases simulation model made run faster real time Since learning may require experimenting behaviors might occasionally produce unacceptable results applied real world might require much time real environment assume hypothetical behaviors evaluated simulation model offline system As illustrated Figure current best behavior placed real online system learning continues offline system The learning algorithm designed learn useful behaviors simulations limited fidelity The expectation behaviors learned simulations useful realworld environments Previous studies illustrated knowledge learned simulation robust might applicable real world simulation general ie noise varied conditions etc real world environment Where possible important identify differences simulation world note effect upon learning process The research reported continues examine hypothesis The next section briefly explains learning algorithm gives pointers extensive documentation found After actual robot described Then describe simulation robot The task actual robot
Conventional Intelligent Tutoring Systems ITS acknowledge uncertainty students knowledge Yet outcome teaching intervention exact state students knowledge uncertain In recent years researchers made startling progress management uncertainty knowledgebased systems Building developments describe ITS architecture explicitly models uncertainty This facilitate accurate student modeling provide ITSs learn
Satisfiability SAT refers task finding truth assignment makes arbitrary boolean expression true This paper compares neural network algorithm NNSAT GSAT greedy algorithm solving satisfiability problems GSAT solve problem instances difficult traditional satisfiability algorithms Results suggest NNSAT scales better number variables increase solving least many hard SAT problems
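For reference, here is a minimal sketch of the GSAT baseline. It starts from a random truth assignment and repeatedly flips the variable that yields the most satisfied clauses, choosing randomly among ties and accepting sideways or downhill moves. It restarts after a fixed number of flips. Clauses are assumed to be lists of nonzero integers (DIMACS-style literals), and the `max_tries`/`max_flips` defaults are assumptions. NNSAT is not shown.

```python
import random
from typing import Dict, List, Optional, Sequence


def num_satisfied(clauses: Sequence[Sequence[int]], assign: Dict[int, bool]) -> int:
    """A positive literal v holds when assign[v] is True; a negative one when it is False."""
    return sum(any((lit > 0) == assign[abs(lit)] for lit in c) for c in clauses)


def gsat(clauses: Sequence[Sequence[int]], n_vars: int, max_tries: int = 50,
         max_flips: int = 100, seed: int = 0) -> Optional[Dict[int, bool]]:
    rng = random.Random(seed)
    m = len(clauses)
    for _ in range(max_tries):
        assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            if num_satisfied(clauses, assign) == m:
                return assign
            best_score, best_vars = -1, []
            for v in range(1, n_vars + 1):
                assign[v] = not assign[v]          # try the flip
                s = num_satisfied(clauses, assign)
                assign[v] = not assign[v]          # undo it
                if s > best_score:
                    best_score, best_vars = s, [v]
                elif s == best_score:
                    best_vars.append(v)
            v = rng.choice(best_vars)              # greedy move, random tie-break
            assign[v] = not assign[v]
        if num_satisfied(clauses, assign) == m:
            return assign
    return None  # no model found within the budget (formula may still be satisfiable)


# Usage: (x1 or x2) and (not x1 or x3) and (not x2 or not x3).
clauses = [[1, 2], [-1, 3], [-2, -3]]
model = gsat(clauses, 3)
```

Because GSAT is incomplete, a `None` result does not prove unsatisfiability.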
In last years several researchers within Artificial Life Mobile Robotics community used Artificial Neural Networks Explicitly viewing Neural Networks Artificial Life perspective number consequences make research call Artificial Life Neural Networks ALNNs rather different traditional connectionist research The aim paper make differences ALNNs classical neural networks explicit
The recognition 3D objects sequences 2D views modeled family selforganizing neural architectures called VIEWNET use View Information Encoded With NETworks VIEWNET incorporates preprocessor generates compressed 2D invariant representation image supervised incremental learning system Fuzzy ARTMAP classifies preprocessed representations 2D view categories whose outputs combined 3D invariant object categories working memory makes 3D object prediction accumulating evidence time 3D object category nodes multiple 2D views experienced VIEWNET benchmarked MIT Lincoln Laboratory database 2D views aircraft including small frontal views without additive noise A recognition rate achieved one 2D view correct three 2D views The properties 2D view 3D object category nodes compared cells monkey inferotemporal cortex
Recent interest come deriving various neural network architectures modelling timedependent signals A number algorithms published multilayer perceptrons synapses described finite impulse response FIR infinite impulse response IIR filters latter case also known Locally Recurrent Globally Feedforward Networks The derivations algorithms used different approaches calculating gradients note present short unifying account different algorithms compare FIR case derivation performance New algorithms subsequently presented Simulation results performed benchmark algorithms In note results compared MackeyGlass chaotic time series number methods including standard multilayer perceptron local approximation method
This paper presents methodology estimate optimal number learning samples number hidden units needed obtain desired accuracy function approximation feedforward network The representation error generalization error components total approximation error analyzed approximation accuracy feedforward network investigated function number hidden units number learning samples Based asymptotical behavior approximation error asymptotical model error function AMEF introduced parameters determined experimentally An alternative model error function include theoretical results general bounds approximation also analyzed In combination knowledge computational complexity learning rule optimal learning set size number hidden units found resulting minimum computation time given desired precision approximation This approach applied optimize learning camerarobot mapping visually guided robot arm complex logarithm function approximation
We propose methodology Bayesian model determination decomposable graphical Gaussian models To achieve aim consider hyper inverse Wishart prior distribution concentration matrix given graph To ensure compatibility across models prior distributions obtained marginalisation prior conditional complete graph We explore alternative structures hyperparameters latter consequences model Model determination carried implementing reversible jump MCMC sampler In particular dimensionchanging move propose involves adding dropping edge graph We characterise set moves preserve decomposability graph giving fast algorithm maintaining junction tree representation graph sweep As state variable propose use incomplete variancecovariance matrix containing elements corresponding element inverse nonzero This allows computations performed locally clique level clear advantage analysis large complex datasets Finally statistical computational performance procedure illustrated means artificial real multidimensional datasets
An essential component opportunistic behavior opportunity recognition recognition conditions facilitate pursuit suspended goal Opportunity recognition special case situation assessment process sizing novel situation The ability recognize opportunities reinstating suspended problem contexts one way goals manifest design crucial creative design In order deal real world opportunity recognition attribute limited inferential power relevant suspended goals We propose goals suspended working memory monitor internal hidden representations currently recognized objects A suspended goal satisfied current internal representation suspended goal match We propose computational model working memory compare relevant theories opportunistic planning This working memory model implemented part IMPROVISER system
Technical Report UMIACSTR CSTR Institute Advanced Computer Studies University Maryland College Park MD Abstract One important aspects machine learning paradigm scales according problem size complexity Using task known optimal training error prespecified maximum number training updates investigate convergence backpropagation algorithm respect complexity required function approximation b size network relation size required optimal solution c degree noise training data In general solution found worse function approximated complex b oversized networks result lower training generalization error certain cases c use committee ensemble techniques beneficial level noise training data increased For experiments performed obtain optimal solution case We support observation larger networks produce better training generalization error using face recognition example network many parameters training points generalizes better smaller networks
For many reasons neural networks become popular AI machine learning models Two important aspects machine learning models well model generalizes unseen data well model scales problem complexity Using controlled task known optimal training error investigate convergence backpropagation BP algorithm We find optimal solution typically found Furthermore observe networks larger might expected result lower training generalization error This result supported another real world example We investigate training behavior analyzing weights trained networks excess degrees freedom seen little harm aid convergence contrasting interpolation characteristics multilayer perceptron neural networks MLPs polynomial models overfitting behavior different MLP often biased towards smoother solutions Finally analyze relevant theory outlining reasons significant practical differences These results bring question common beliefs neural network training regarding convergence optimal network size suggest alternate guidelines practical use lower fear excess degrees freedom help direct future work eg methods creation parsimonious solutions importance MLPBP bias possibly worse performance improved training algorithms
This paper presents novel induction algorithm Rulearner induces classification rules using Galois lattice explicit map search space rules The Rulearner system shown compare favorably commonly used symbolic learning methods use heuristics rather explicit map guide search rule space Furthermore learning system shown robust presence noisy data The Rulearner system also capable learning decision lists unordered rule sets allowing comparisons different learning paradigms within algorithmic framework
Keywords CaseBased Reasoning case retrieval case representation This paper deals retrieval useful cases casebased reasoning It focuses questions useful could mean search useful cases organized We present new search algorithm Fish Shrink able search quickly case base even aspects define usefulness spontaneously combined query time We compare Fish Shrink algorithms show make implicit closed world assumption We finally refer realization presented idea context prototype FABELProject The scenery follows Previously collected cases stored large scaled case base An expert describes problem gives aspects requested case similar The similarity measure thus given spontaneously shall used explore case base within short time shall present required number cases make sure none cases similar The question prepare previously collected cases define retrieval algorithm able deal spontaneously userdefined similarity measures
The parallel genetic algorithm PGA uses two major modifications compared genetic algorithm Firstly selection mating distributed Individuals live 2D world Selection mate done individual independently neighborhood Secondly individual may improve fitness lifetime eg local hillclimbing The PGA totally asynchronous running maximal efficiency MIMD parallel computers The search strategy PGA based small number active intelligent individuals whereas GA uses large population passive individuals We investigate PGA deceptive problems traveling salesman problem We outline PGA successful Abstractly PGA parallel search information exchange individuals If represent optimization problem fitness landscape certain configuration space see PGA tries jump two local minima third still better local minima using crossover operator This jump probabilistically successful fitness landscape certain correlation We show correlation traveling salesman problem configuration space analysis The PGA explores implicitly correlation
The dominant theme casebased research recent ML conferences classifying cases represented feature vectors However useful tasks targeted representations often preferable We review recent literature casebased learning focusing alternative performance tasks expressive case representations We also highlight topics need additional research
Current approaches computational lexicology language technology knowledgebased competenceoriented try abstract away specific formalisms domains applications This results severe complexity acquisition reusability bottlenecks As alternative propose particular performanceoriented approach Natural Language Processing based automatic memorybased learning linguistic lexical tasks The consequences approach computational lexicology discussed application approach number lexical acquisition disambiguation tasks phonology morphology syntax described
Let H function explicitly defined approximable sequence (H_n)_n functional estimators In context propose new sequential algorithm optimise asymptotically H using stepwise estimators H_n We prove mild conditions almost sure convergence law algorithm
In paper study new informationtheoretically justified approach missing data estimation multivariate categorical data The approach discussed modelbased imputation procedure relative model class ie functional form probability distribution complete data matrix case set multinomial models independence assumptions Based given model class assumption informationtheoretic criterion derived select different complete data matrices Intuitively general criterion called stochastic complexity represents shortest code length needed coding complete data matrix relative model class chosen Using informationtheoretic criteria missing data problem reduced search problem ie finding data completion minimal stochastic complexity In experimental part paper present empirical results approach using two real data sets compare results achieved commonly used techniques case deletion imputing sample averages
We present paper new evolutionary procedure solving general optimization problems combines efficiently mechanisms genetic algorithms tabu search In order explore solution space properly interaction phases interspersed periods optimization algorithm An adaptation search principle National Hockey League NHL problem discussed The hybrid method developed paper well suited Open Shop Scheduling problems OSSP The results obtained appear quite satisfactory
Behavioural observations often described sequence symbols drawn finite alphabet However inductive inference strings automated technique produce models data nontrivial task This paper considers modelling behavioural data using probabilistic finite state automata PFSAs There number informationtheoretic techniques evaluating possible hypotheses The measure used paper Minimum Message Length MML Wallace Although attempts made construct PFSA models incremental addition substrings using heuristic rules MML give lowest information cost resultant models shown globally optimal Fogels Evolutionary Programming produce globally optimal PFSA models evolving data structures arbitrary complexity without requirement encode PFSA binary strings Genetic Algorithms However evaluation PFSAs evolution process MML PFSA alone possible since symbols consumed partially correct solution It suggested addition cant consume symbol symbol alphabet obviates difficulty The addition null symbol alphabet also permits evolution explanatory models need explain data useful property avoid overfitting noisy data Results given test set optimal PFSA model known set eye glance data derived instrument panel simulator
Technical Report CUEDFINFENGTR We use reversible jump Markov chain Monte Carlo MCMC methods Green address problem model order uncertainty autoregressive AR time series within Bayesian framework Efficient model jumping achieved proposing model space moves full conditional density AR parameters obtained analytically This compared alternative method moves cheaper compute proposals made new parameters move Results presented synthetic audio time series
Learning viewed problem planning series modifications memory We adopt view learning propose applicability casebased planning methodology task planning learn We argue relatively simple finegrained primitive inferential operators needed support flexible planning We show possible obtain benefits casebased reasoning within planning learn framework
V SCBR simple instancebased learning algorithm adjusts weighted similarity measure well collecting cases This paper presents PAC analysis V SCBR motivated PAC learning framework demonstrates two main ideas relevant study instancebased learners Firstly hypothesis spaces learner different target concepts compared predict difficulty target concepts learner Secondly helpful consider constituent parts instancebased learner explore separately many examples needed infer good similarity measure many examples needed case base Applying approaches show V SCBR learns quickly variables representation irrelevant target concept slowly relevant variables The paper relates overall behaviour behaviour constituent parts V SCBR
Partial determinations interesting form dependency attributes relation They generalize functional dependencies allowing exceptions We modify known MDL formula evaluating partial determinations allow use admissible heuristic exhaustive search Furthermore describe efficient preprocessingbased approach handling numerical attributes An empirical investigation tries evaluate viability presented ideas
The induction optimal finite state machine explanation symbol strings known least NPcomplete However satisfactory approximately optimal explanations may found use Evolutionary Programming It shown information theoretic measure finite state machine explanations used fitness function required evaluation candidate explanations search nearoptimal explanation It obvious measure class explanation favoured others search By empirical studies possible gain insight dimensions measure optimising In general probabilistic finite state machines explanations assessed minimum message length estimator minimum number transitions favoured explanations The information measure also favour explanations uneven distributions frequencies transitions node suggesting repeated sequences symbol strings preferred explanation Approximate bounds acceptance explanations length string required induction successful also derived considerations simplest possible random explanations information measure
How evolutionary process interact decentralized distributed system order produce globally coordinated behavior Using genetic algorithm GA evolve cellular automata CAs show evolution spontaneous synchronization one type emergent coordination takes advantage underlying mediums potential form embedded particles The particles typically phase defects synchronous regions designed evolutionary process resolve frustrations global phase We describe detail one typical solution discovered GA delineating discovered synchronization algorithm terms embedded particles interactions We also use particlelevel description analyze evolutionary sequence solution discovered Our results implications understanding emergent collective behavior natural systems automatic programming decentralized spatially extended multiprocessor systems
We prove general bootstrap theorem possibly infinitedimensional Zestimators builds recent infinitedimensional Ztheorem due Van der Vaart Our result extends finitedimensional results type bootstrap due Arcones Gine Lele Newton Raftery We sketch three examples models infinitedimensional parameter spaces applications general theorem
The prediction survival time recurrence time important learning problem medical domains The Recurrence Surface Approximation RSA method natural effective method predicting recurrence times using censored input data This paper introduces Survival Curve RSA SCRSA extension RSA approach produces accurate predicted rates recurrence maintaining accuracy individual predicted recurrence times The method applied problem breast cancer recurrence using two different datasets
We analyze query committee algorithm method filtering informative queries random stream inputs We show twomember committee algorithm achieves information gain positive lower bound prediction error decreases exponentially number queries We show particular exponential decrease holds query learning perceptrons Keywords selective sampling query learning Bayesian Learning experimental design Yoav Freund Room B ATT Bell Laboratories Mountain Ave Murray Hill NJ
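The following is a toy illustration of query by committee. It uses a much simpler concept class than the perceptrons analyzed above: thresholds on [0, 1]. For this class the version space is an interval, so drawing the two committee members uniformly from it is exact Gibbs sampling under a uniform prior. A label is requested only when the two members disagree. The function name, stream length and seed are hypothetical choices.

```python
import random

def qbc_threshold(target, n_stream=2000, seed=0):
    """Two-member Query by Committee for concepts h_t(x) = [x >= t], t in [0, 1]."""
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0   # version space: thresholds consistent with all labels so far
    queries = 0
    for _ in range(n_stream):
        x = rng.random()                                    # unlabeled input from the stream
        t1, t2 = rng.uniform(lo, hi), rng.uniform(lo, hi)   # two random consistent hypotheses
        if (x >= t1) != (x >= t2):                          # committee disagrees: ask for the label
            queries += 1
            if x >= target:
                hi = min(hi, x)   # positive example: threshold must be <= x
            else:
                lo = max(lo, x)   # negative example: threshold must be > x
    return lo, hi, queries
```

Most stream points are filtered out without a query. As the interval shrinks, disagreement becomes rare, so only a small fraction of the 2000 inputs are ever labeled.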
A new method performing nonlinear form Principal Component Analysis proposed By use integral operator kernel functions one efficiently compute principal components highdimensional feature spaces related input space nonlinear map instance space possible pixel products images We give derivation method present first experimental results polynomial feature extraction pattern recognition
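Below is a minimal NumPy sketch of kernel principal component analysis with the polynomial kernel k(x, y) = (x · y)^d. That kernel's feature space consists of degree-d products of the input coordinates (for images, pixel products). The sketch makes three assumptions: the dataset is small enough to fit in memory, the Gram matrix is centered in feature space, and each eigenvector is rescaled so the corresponding feature-space component has unit length. `kernel_pca` and its parameters are illustrative names.

```python
import numpy as np

def kernel_pca(X, n_components=2, degree=2):
    """Kernel PCA with the polynomial kernel k(x, y) = (x . y) ** degree."""
    K = (X @ X.T) ** degree                      # Gram matrix of dot products in feature space
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    # Center the mapped data in feature space without computing the map explicitly
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)        # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lambdas, alphas = eigvals[order], eigvecs[:, order]
    # Rescale so each feature-space component V = sum_i alpha_i Phi(x_i) has unit norm
    alphas = alphas / np.sqrt(lambdas)
    # Nonlinear principal components of the training points
    return Kc @ alphas, lambdas
```

With this normalization, the squared length of each projection column equals its eigenvalue. Projecting a new point would also require the centered kernel values between that point and the training set, a step omitted here.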
Modeling techniques developed recently AI uncertain reasoning communities permit significantly flexible specifications probabilistic knowledge Specifically graphical decisionmodeling formalisms belief networks influence diagrams variants provide compact representation probabilistic relationships support inference algorithms automatically exploit dependence structure models These advances brought resurgence interest computational decision systems based normative theories belief preference However graphical decisionmodeling languages still quite limited purposes knowledge representation describe relationships among particular event instances capture general knowledge probabilistic relationships across classes events The inability capture general knowledge serious impediment AI tasks relevant factors decision problem enumerated advance A graphical decision model encodes particular set probabilistic dependencies predefined set decision alternatives specific mathematical form utility function Given properly specified model exist relatively efficient algorithms calculating posterior probabilities optimal decision policies A range similar cases may handled parametric variations original model However structure dependencies set available alternatives form utility function changes situation situation fixed network representation longer adequate An ideal computational decision system would possess general broad knowledge domain would ability reason particular circumstances given decision problem within domain One obvious approach call knowledgebased model construction KBMC generate decision model dynamically runtime based problem description information received thus far Model construction consists selection instantiation assembly causal associational relationships broad knowledge base general relationships among domain concepts For example suppose wish develop system recommend appropriate actions maintaining computer network The natural graphical decision model would include chance
Determining conditions given learning algorithm appropriate open problem machine learning Methods selecting learning algorithm given domain met limited success This paper proposes new approach predicting given examples class locating example space choosing best learners region example space make predictions The regions example space defined prediction patterns learners used The learners chosen prediction selected according past performance region This dynamic approach learning algorithm selection compared methods selecting multiple learning algorithms The approach extended weight rather select algorithms according past performance given region Both approaches evaluated set Determining conditions given learning algorithm appropriate open problem machine learning Methods selecting learning algorithm given domain eg Aha Breiman portion domain Brodley Brodley met limited success This paper proposes new approach dynamically selects learning algorithm example locating example space choosing best learners prediction part example space The regions example space formed observed prediction patterns learners used The learners chosen prediction selected according past performance region defined crossvalidation history This paper introduces DS method dynamic selection learning algorithms We call dynamic learning algorithms used classify novel example depends example Preliminary experimentation motivated DW extension DS dynamically weights learners predictions according regional accuracy Further experimentation compares DS DW collection metalearning strategies crossvalidation Breiman various forms stacking Wolpert In phase experimentation metalearners six constituent learners heterogeneous search representation methods eg rule learner CN Clark decision tree learner C Quinlan oblique decision tree learner OC Murthy instancebased learner PEBLS Cost knearest neighbor learner ten domains compared several metalearning strategies
Two important issues machine learning explored role memory plays acquiring new concepts extent learner take active part acquiring concepts This chapter describes program called Marvin uses concepts learned previously learn new concepts The program forms hypotheses concept learned tests hypotheses asking trainer questions Learning begins trainer shows Marvin example concept learned The program determines objects example belong concepts stored memory A description new concept formed using information obtained memory generalize description training example The generalized description tested program constructs new examples shows trainer asking belong target concept
Adaptation ecological systems environments commonly viewed explicit fitness function defined priori experimenter measured posteriori estimations based population size andor reproductive rates These methods capture role environmental complexity shaping selective pressures control adaptive process Ecological simulations enabled computational tools Latent Energy Environments LEE model allow us characterize closely effects environmental complexity evolution adaptive behaviors LEE described paper Its motivation arises need vary complexity controlled predictable ways without assuming relationship changes adaptive behaviors engender This goal achieved careful characterization environments different forms energy welldefined A genetic algorithm using endogenous fitness local selection used model evolutionary process Individuals population modeled neural networks simple sensorymotor systems variations behaviors related interactions varying environments We outline results three experiments analyze different sources environmental complexity effects collective behaviors evolving populations
In paper investigate efficiency subsumption basic provability relation ILP As D C NPcomplete even restrict linked Horn clauses fix C contain small constant number literals investigate several restrictions D We first adapt notion determinate clauses used ILP show subsumption decidable polynomial time D determinate respect C Secondly adapt notion klocal Horn clauses show subsumption efficiently computable reasonably small k We show results combined give efficient reasoning procedure determinate klocal Horn clauses ILPproblem recently suggested polynomial predictable Cohen simple counting argument We finally outline reduction algorithm essential part every lgg ILPlearning algorithm improved ideas
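To make the cost of the subsumption test concrete, here is a naive backtracking check that clause C theta-subsumes clause D. That is, it searches for some substitution theta mapping every literal of C onto a literal of D. The sketch rests on simplifying assumptions:

- literals are flat tuples with no function symbols;
- variables are strings starting with an uppercase letter;
- D is ground.

Its worst case is exponential, which is what restrictions such as determinacy and k-locality aim to avoid. The function names are hypothetical.

```python
def is_var(term):
    """Variables are strings starting with an uppercase letter (Prolog convention)."""
    return isinstance(term, str) and term[:1].isupper()

def match(lit, target, theta):
    """Try to extend substitution theta so that lit under theta equals target."""
    if lit[0] != target[0] or len(lit) != len(target):
        return None  # different predicate symbol or arity
    theta = dict(theta)
    for s, t in zip(lit[1:], target[1:]):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None  # variable already bound to a different term
        elif s != t:
            return None      # constant mismatch
    return theta

def subsumes(C, D, theta=None):
    """True if some substitution theta maps every literal of C into D (backtracking search)."""
    theta = {} if theta is None else theta
    if not C:
        return True
    first, rest = C[0], C[1:]
    for target in D:
        new_theta = match(first, target, theta)
        if new_theta is not None and subsumes(rest, D, new_theta):
            return True
    return False
```

For example, `[("p", "X", "Y"), ("p", "Y", "Z")]` subsumes `[("p", "a", "b"), ("p", "b", "c")]` via X=a, Y=b, Z=c. It does not subsume `[("p", "a", "b"), ("p", "c", "d")]`, because no shared binding for Y links the two literals.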
BBN Technical Report Abstract Genetic programming powerful method automatically generating computer programs via process natural selection Koza However limitation known closure ie variables constants arguments functions values returned functions must data type To correct deficiency introduce variation genetic programming called strongly typed genetic programming STGP In STGP variables constants arguments returned values data type provision data type value specified beforehand This allows initialization process genetic operators generate syntactically correct parse trees Key concepts STGP generic functions true strongly typed functions rather templates classes functions generic data types analogous To illustrate STGP present four examples involving vectormatrix manipulation list manipulation multidimensional leastsquares regression problem multidimensional Kalman filter list manipulation function NTH list manipulation function MAPCAR
Category algorithms architectures recurrent networks No part paper submitted elsewhere Preference poster Abstract Existing proofs demonstrating computational limitations Recurrent Cascade Correlation RCC Network Fahlman explicitly limit results units sigmoidal hardthreshold transfer functions Giles et al Kremer The proof given shows given finite discrete deterministic transfer function used units RCC network finitestate automata FSA network model matter many units used The proof applies equally well continuous transfer functions finite number fixedpoints sigmoid function
subsumption decidable incomplete approximation logic implication important inductive logic programming theorem proving We show context based elimination possible matches certain superset determinate clauses tested subsumption polynomial time We discuss relation subsumption clique problem showing particular using additional prior knowledge substitution space small fraction search space identified possibly containing globally consistent solutions leads effective pruning rule We present empirical results demonstrating combination approaches provides extreme reduction computational effort
We introduce new algorithm designed learn sparse perceptrons input representations include highorder features Our algorithm based hypothesisboosting method able PAClearn relatively natural class target concepts Moreover algorithm appears work well practice set three problem domains algorithm produces classifiers utilize small numbers features yet exhibit good generalization performance Perhaps importantly algorithm generates concept descriptions easy humans understand
Current inductive machine learning algorithms typically use greedy search limited lookahead This prevents detect significant conditional dependencies attributes describe training objects Instead myopic impurity functions lookahead propose use RELIEFF extension RELIEF developed Kira Rendell heuristic guidance inductive learning algorithms We reimplemented Assistant system top induction decision trees using RELIEFF estimator attributes selection step The algorithm tested several artificial several real world problems results compared well known machine learning algorithms Excellent results artificial data sets two real world problems show advantage presented approach inductive learning
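For reference, the sketch below shows the basic two-class Relief weight update of Kira and Rendell, on which RELIEFF builds. It assumes numeric features scaled to [0, 1] and a single nearest hit and nearest miss. RELIEFF's extensions (k nearest hits and misses, multiple classes, missing values) are not implemented, and all names are illustrative.

```python
import random

def relief(X, y, n_iter=100, seed=0):
    """Basic two-class Relief: feature weights from nearest hit / nearest miss differences."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        return sum(abs(ai - bi) for ai, bi in zip(a, b))  # Manhattan distance

    for _ in range(n_iter):
        i = rng.randrange(n)
        others = [j for j in range(n) if j != i]
        hit = min((j for j in others if y[j] == y[i]), key=lambda j: dist(X[i], X[j]))
        miss = min((j for j in others if y[j] != y[i]), key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # Reward features that separate the nearest miss, penalize ones that separate the nearest hit
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n_iter
    return w

# XOR of the first two features; the third feature is irrelevant noise
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, _ in X]
weights = relief(X, y)
```

On this XOR data, each of the first two features alone carries no information about the class. Even so, both end up with positive weight, while the irrelevant third feature gets a weight of about -1. How the credit is split between the two XOR features depends on how `min` breaks ties here.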
This paper presents new approach hierarchical reinforcement learning based MAXQ decomposition value function The MAXQ decomposition procedural semanticsas subroutine hierarchyand declarative semanticsas representation value function hierarchical policy MAXQ unifies extends previous work hierarchical reinforcement learning Singh Kaelbling Dayan Hinton Conditions MAXQ decomposition represent optimal value function derived The paper defines hierarchical Q learning algorithm proves convergence shows experimentally learn much faster ordinary flat Q learning Finally paper discusses interesting issues arise hierarchical reinforcement learning including hierarchical credit assignment problem nonhierarchical execution MAXQ hierarchy
Causality relates changes structure object effects changes changes properties behavior object This paper analyzes concept causality Genetic Programming GP suggests used adapting control parameters speeding GP search We first analyze effects crossover show weak causality GP representation operators Hierarchical GP approaches based discovery evolution functions amplify phenomenon However selection gradually retains strongly causal changes Causality correlated search space exploitation discussed context explorationexploitation tradeoff The results described argue bottomup GP evolutionary thesis Finally new developments based idea GP architecture evolution Koza discussed causality perspective
Methods voting classification algorithms Bagging AdaBoost shown successful improving accuracy certain classifiers artificial realworld datasets We review algorithms describe large empirical study comparing several variants conjunction decision tree inducer three variants NaiveBayes inducer The purpose study improve understanding algorithms use perturbation reweighting combination techniques affect classification error We provide bias variance decomposition error show different methods variants influence two terms This allowed us determine Bagging reduced variance unstable methods boosting methods AdaBoost Arcx reduced bias variance unstable methods increased variance NaiveBayes stable We observed Arcx behaves differently AdaBoost reweighting used instead resampling indicating fundamental difference Voting variants introduced paper include pruning versus pruning use probabilistic estimates weight perturbations Wagging backfitting data We found Bagging improves probabilistic estimates conjunction nopruning used well data backfit We measure tree sizes show interesting positive correlation increase average tree size AdaBoost trials success reducing error We compare meansquared error voting methods nonvoting methods show voting methods lead large significant reductions meansquared errors Practical problems arise implementing boosting algorithms explored including numerical instabilities underflows We use scatterplots graphically show AdaBoost reweights instances emphasizing hard areas also outliers noise
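Below is a minimal sketch of Bagging, one of the voting methods compared above. Each model is trained on a bootstrap sample (n draws with replacement), and predictions are combined by unweighted majority vote. To stay short, the sketch makes several substitutions:

- it uses one-dimensional decision stumps rather than the study's decision-tree and Naive-Bayes inducers;
- it does not cover AdaBoost, Arc-x4 or Wagging;
- all names are hypothetical.

```python
import random
from collections import Counter

def train_stump(X, y):
    """Fit a 1-D threshold classifier minimizing training errors (X: floats, y: 0/1)."""
    best = None
    for t in sorted(set(X)):
        for sign in (0, 1):
            # Predict `sign` when x >= t, otherwise 1 - sign
            errors = sum((sign if x >= t else 1 - sign) != label for x, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: sign if x >= t else 1 - sign

def bagging(X, y, n_models=25, seed=0):
    """Bagging: train each model on a bootstrap sample, combine by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    # An odd number of binary voters avoids ties
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]
```

Bootstrap samples that happen to miss points near the class boundary yield stumps with shifted thresholds. The vote averages over these perturbed models, which is the variance-reduction effect the study measures.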
The longterm goal field creation understanding intelligence Productive research AI practical theoretical benefits notion intelligence precise enough allow cumulative development robust systems general results The concept rational agency long considered leading candidate fulfill role This paper outlines gradual evolution formal conception rationality brings closer informal conception intelligence simultaneously reduces gap theory practice Some directions future research indicated
A solution problem representing compositional structure using distributed representations described The method uses circular convolution associate items represented vectors Arbitrary variable bindings short sequences various lengths frames reduced representations compressed fixed width vector These representations items right used constructing compositional structures The noisy reconstructions given convolution memories cleaned using separate associative memory good reconstructive properties
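The convolution-based binding described above can be illustrated in a few lines of NumPy, in the style of Plate's holographic reduced representations:

1. Bind each role to its filler with circular convolution, computed via the FFT.
2. Superimpose the bindings by addition.
3. Decode with the involution as an approximate inverse.
4. Clean up the noisy result against an item memory by dot product.

The dimension n = 1024, the N(0, 1/n) element distribution and the item names are illustrative assumptions.

```python
import numpy as np

def cconv(a, b):
    """Circular convolution (a * b)_j = sum_k a_k b_{(j - k) mod n}, computed via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def involution(a):
    """Approximate inverse: a~_j = a_{-j mod n}; convolving with a~ equals correlating with a."""
    return np.concatenate(([a[0]], a[:0:-1]))

def cleanup(noisy, memory):
    """Clean-up memory: name of the stored item with the largest dot product."""
    return max(memory, key=lambda name: float(np.dot(noisy, memory[name])))

rng = np.random.default_rng(0)
n = 1024
# Elements i.i.d. N(0, 1/n), so each vector has roughly unit length
items = {name: rng.normal(0.0, 1.0 / np.sqrt(n), n)
         for name in ["agent", "object", "mary", "john", "fido"]}
# Frame with two role-filler bindings compressed into one fixed-width vector
trace = cconv(items["agent"], items["mary"]) + cconv(items["object"], items["john"])
# Decode the agent role; the result is a noisy version of "mary"
decoded = cconv(involution(items["agent"]), trace)
```

Here `cleanup(decoded, items)` is expected to return `"mary"`. The decoded vector itself is only a noisy reconstruction, which is why the separate clean-up memory is needed.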
based Angluins L* algorithm The algorithm maintains model consistent past examples When new counterexample arrives tries extend model minimal fashion We conducted set experiments random automata represent different strategies generated algorithm tried learn based prefixclosed samples behavior The algorithm managed learn compact models agree samples The size sample small effect size model The experimental results suggest random prefixclosed samples algorithm behaves well However following Angluins result difficulty learning almost uniform complete samples Angluin obvious algorithm solve complexity issue inferring DFA general prefixclosed sample We currently looking classes prefixclosed samples USL behaves well Carmel Markovitch D Carmel S Markovitch The M algorithm Incorporating opponent models adversary search Technical Report CIS report Technion March Carmel Markovitch D Carmel S Markovitch Unsupervised learning finite automata A practical approach Technical Report CIS report Technion March Shoham Tennenholtz Y Shoham M Tennenholtz CoLearning evolution social activity Technical Report STANCSTR Stanford University Department Computer Science
AA incremental learning algorithm Adaptive SelfOrganizing Concurrent Systems ASOCS ASOCS selforganizing dynamically growing networks computing nodes AA learns discrimination implements knowledge distributed fashion nodes This paper reviews AA perspective convergence generalization A formal proof AA converges arbitrary Boolean instance set given A discussion generalization aspects AA including problem handling inconsistency follows Results simulations realworld data presented They show AA gives promising generalization
The term bias widely used and different meanings in fields machine learning statistics This paper clarifies uses term shows measure visualize statistical bias variance learning algorithms Statistical bias variance applied diagnose problems machine learning bias paper shows four examples Finally paper discusses methods reducing bias variance Methods based voting reduce variance paper compares Breimans bagging method tree randomization method voting decision trees Both methods uniformly improve performance data sets Irvine repository Tree randomization yields perfect performance Letter Recognition task A weighted nearest neighbor algorithm based infinite bootstrap also introduced In general decision tree algorithms moderatetohigh variance important implication work variance rather appropriate inappropriate machine learning bias is important cause poor performance decision tree algorithms
We analyze use builtin policies macroactions form domain knowledge improve speed scaling reinforcement learning algorithms Such macroactions often used robotics macrooperators also wellknown aid statespace search AI systems The macroactions consider closedloop policies termination conditions The macroactions chosen level primitive actions Macroactions commit learning agent act particular purposeful way sustained period time Overall macroactions may either accelerate retard learning depending appropriateness macroactions particular task We analyze effect simple example breaking acceleration effect two parts effect macroaction changing exploratory behavior independent learning effect macroaction learning independent effect behavior In example effects significant latter appears larger Finally provide complex gridworld illustration appropriately chosen macroactions accelerate overall learning
We present new approach reinforcement learning policies considered learning process constrained hierarchies partially specified machines This allows use prior knowledge reduce search space provides framework knowledge transferred across problems component solutions recombined solve larger complicated problems Our approach seen providing link reinforcement learning behaviorbased teleoreactive approaches control We present provably convergent algorithms problemsolving learning hierarchical machines demonstrate effectiveness problem several thousand states
When casebased planner retrieving previous case preparation solving new similar problem often aware implicit features new problem situation determine particular case may successfully applied This means cases may retrieved error case may fail improve planners performance Retrieval may incrementally improved detecting explaining failures occur In paper provide definition case failure planner dersnlp derivation replay snlp solves new problems replaying previous plan derivations We provide EBL explanationbased learning techniques detecting constructing reasons failure We also describe organize case library incorporate failure information produced Finally present empirical study demonstrates effectiveness approach improving performance dersnlp
This paper presents new methods training large neural networks phoneme probability estimation An architecture combining timedelay windows recurrent connections used capture important dynamic information speech signal Because number connections fully connected recurrent network grows superlinear number hidden units schemes sparse connection connection pruning explored It found sparsely connected networks outperform fully connected counterparts equal number connections The implementation combined architecture training scheme described detail The networks evaluated hybrid HMMANN system phoneme recognition TIMIT database word recognition WAXHOLM database The achieved phone errorrate standard phoneme set core testset TIMIT database range lowest reported All training simulation software used made freely available author detailed information software training process given Appendix
The error rate decisiontree classification learners often much reduced bagging learning multiple models bootstrap samples database combining uniform voting In paper empirically test two alternative explanations based Bayesian learning theory bagging works approximation optimal procedure Bayesian model averaging appropriate implicit prior bagging works effectively shifts prior appropriate region model space All experimental evidence contradicts first hypothesis confirms second Bagging Breiman simple effective way reduce error rate many classification learning algorithms For example empirical study described reduces error decisiontree learner databases average In bagging procedure given training set size bootstrap replicate constructed taking samples replacement training set Thus new training set size produced original examples may appear On average original examples appear bootstrap sample The learning algorithm applied training set This procedure repeated times resulting models aggregated uniform voting Bagging one several multiple model approaches recently received much attention see example Chan Stolfo Wolpert Other procedures type include boosting Freund Schapire stacking Wolpert
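A small sketch of the bagging procedure described above: each bootstrap replicate draws as many examples as the training set, with replacement; one model is trained per replicate; predictions are aggregated by uniform voting. The base learner here is a hypothetical one-dimensional threshold rule standing in for the decision-tree learner. The first printed value illustrates the known fact that the share of distinct original examples in a replicate tends to 1 - 1/e (about 63.2%) for large training sets.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Draw len(data) examples uniformly with replacement
    return [rng.choice(data) for _ in data]

def train_stump(data):
    # Hypothetical base learner: a 1-D threshold rule with the fewest training errors
    best = None
    for t in sorted({x for x, _ in data}):
        for label_above in (0, 1):
            errors = sum((label_above if x >= t else 1 - label_above) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, t, label_above)
    _, t, label_above = best
    return lambda x: label_above if x >= t else 1 - label_above

def bag(data, n_models, rng):
    # One model per bootstrap replicate, aggregated by an unweighted majority vote
    models = [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

rng = random.Random(1)
# Share of distinct original examples appearing in one bootstrap replicate
indices = list(range(10000))
print(len(set(bootstrap_sample(indices, rng))) / len(indices))

data = [(x, int(x >= 5)) for x in range(10)]  # illustrative toy data
vote = bag(data, 25, rng)
print([vote(x) for x in range(10)])
```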
We propose algorithm called query committee committee students trained data set The next query chosen according principle maximal disagreement The algorithm studied two toy models highlow game perceptron learning another perceptron As number queries goes infinity committee algorithm yields asymptotically finite information gain This leads generalization error decreases exponentially number examples This marked contrast learning randomly chosen inputs information gain approaches zero generalization error decreases relatively slow inverse power law We suggest asymptotically finite information gain may important characteristic good query algorithms
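A toy simulation of query by committee in the high-low game, written as a sketch under simplifying assumptions (a committee of two students, each sampled uniformly from the current version space of consistent thresholds). Each query is placed where the two students disagree, so every answer removes a roughly constant fraction of the version space; queries drawn at random from the input space only occasionally land inside it. The target value and query budget are illustrative.

```python
import random

def qbc_high_low(target, n_queries, rng):
    # Version space: thresholds w in [lo, hi] consistent with every answer so far
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        # Committee of two students drawn at random from the version space
        a, b = rng.uniform(lo, hi), rng.uniform(lo, hi)
        # Query a point on which the students disagree (between their thresholds)
        x = rng.uniform(min(a, b), max(a, b))
        if x > target:   # answer "high": the target lies below x
            hi = x
        else:            # answer "low": the target lies at or above x
            lo = x
    return lo, hi

def random_queries(target, n_queries, rng):
    # Baseline: inputs chosen at random, informative only when inside the version space
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = rng.uniform(0.0, 1.0)
        if lo < x < hi:
            if x > target:
                hi = x
            else:
                lo = x
    return lo, hi

rng = random.Random(0)
print(qbc_high_low(0.3, 100, rng))    # interval shrinks exponentially
print(random_queries(0.3, 100, rng))  # interval shrinks only polynomially
```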
Tech Report Department Statistics Open University Walton Hall MK AA UK Tech Report Department Computer Science Monash University Clayton Vic Australia Abstract This paper examines minimum encoding approaches inference Minimum Message Length MML Minimum Description Length MDL This paper written objective providing introduction area statisticians We describe coding techniques data examine techniques applied perform inference model selection
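As a minimal illustration of minimum-encoding model selection, the sketch below compares two models of a binary sequence by total description length: a fair coin with no free parameter, and a Bernoulli model whose fitted parameter must itself be transmitted. The (1/2) log2 n bits charged for the parameter is the usual asymptotic approximation and an assumption here; the paper's own coding schemes are not reproduced.

```python
import math

def code_length_fair(bits):
    # Model with no free parameter: every symbol costs exactly 1 bit
    return float(len(bits))

def code_length_bernoulli(bits):
    # Two-part code: about (1/2) log2 n bits to state the estimated parameter
    # (an assumed approximation), plus the data's code length under the ML estimate
    n, k = len(bits), sum(bits)
    p = k / n
    data_cost = 0.0
    if k > 0:
        data_cost -= k * math.log2(p)
    if k < n:
        data_cost -= (n - k) * math.log2(1 - p)
    return 0.5 * math.log2(n) + data_cost

def select(bits):
    # Prefer the model giving the shorter total message
    fair, fitted = code_length_fair(bits), code_length_bernoulli(bits)
    return "fair coin" if fair <= fitted else "fitted Bernoulli"

print(select([1, 0] * 50))           # the parameter's cost is not repaid
print(select([1] * 90 + [0] * 10))   # the fitted model compresses the data well
```

For the balanced sequence the fitted model needs about 103.3 bits against 100 for the fair coin; for the skewed one it needs about 50.2 bits, so the extra parameter pays for itself.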
Field suggested neurons line edge selectivities found primary visual cortex cats monkeys form sparse distributed representation natural scenes Barlow reasoned responses emerge unsupervised learning algorithm attempts find factorial code independent visual features We show nonlinear infomax applied ensemble natural scenes produces sets visual filters localised oriented Some filters Gaborlike resemble produced sparsenessmaximisation network Olshausen Field In addition outputs filters independent possible since infomax network able perform Independent Components Analysis ICA We compare resulting ICA filters associated basis functions decorrelating filters produced Principal Components Analysis PCA zerophase whitening filters ZCA The ICA filters sparsely distributed kurtotic outputs natural scenes They also resemble receptive fields simple cells visual cortex suggests neurons form informationtheoretic coordinate system images
Loan applications banks often long requiring applicant provide large amounts data Is necessary Can save applicant frustration bank expense using subset relevant variables To answer question I attempted model current loan approval process particular bank I used several model selection techniques logistic regression including stepwise regression Occams Window Markov Chain Monte Carlo Model Composition Raftery Madigan Hoeting Bayesian Random Searching The resulting models largely agree upon subset onethird original variables This paper completed partial fulfillment PhD data analysis requirement
Learning planning representing knowledge multiple levels temporal abstraction key challenges AI In paper develop approach problems based mathematical framework reinforcement learning Markov decision processes MDPs We extend usual notion action include options whole courses behavior may temporally extended stochastic contingent events Examples options include picking object going lunch traveling distant city well primitive actions muscle twitches joint torques Options may given priori learned experience They may used interchangeably actions variety planning learning methods The theory semiMarkov decision processes SMDPs applied model consequences options basis planning learning methods using In paper develop connections building prior work Bradtke Duff Parr prep others Our main novel results concern interface MDP SMDP levels analysis We show set options altered changing termination conditions improve SMDP methods additional cost We also introduce intraoption temporaldifference methods able learn fragments options execution Finally propose notion subgoal used improve options Overall argue options models provide hitherto missing aspects powerful clear expressive framework representing organizing knowledge
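A sketch of how SMDP theory lets an option be treated like an action in learning: the standard SMDP Q-learning update, applied once an option that ran for k primitive steps terminates. This is the generic SMDP update, not the paper's intra-option methods. The state and option names are hypothetical. With a single one-step option it reduces to ordinary Q-learning.

```python
def smdp_q_update(Q, s, o, rewards, s_next, options_next, alpha, gamma):
    """One SMDP Q-learning step after option o, started in s, ran len(rewards) steps."""
    k = len(rewards)
    # Discounted reward accumulated while the option was executing
    discounted_return = sum(gamma ** i * r for i, r in enumerate(rewards))
    # Value of the best option available where o terminated (unseen pairs default to 0)
    best_next = max(Q.get((s_next, o2), 0.0) for o2 in options_next)
    # The bootstrap term is discounted by gamma**k because k primitive steps elapsed
    target = discounted_return + gamma ** k * best_next
    old = Q.get((s, o), 0.0)
    Q[(s, o)] = old + alpha * (target - old)
    return Q[(s, o)]

# Hypothetical states and options
Q = {("hallway", "go-to-door"): 10.0}
print(smdp_q_update(Q, "room", "go-to-hallway", [0.0, 0.0, 1.0], "hallway",
                    ["go-to-door", "stay"], alpha=0.5, gamma=0.9))
```

Here the option took three steps, so the target is 0.81 * 1 + 0.729 * 10 = 8.1 and the updated value is 4.05.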
articles neural network learning algorithms published examined amount experimental evaluation contain employ even single realistic real learning problem Only articles present results one problem using real world data Furthermore one third articles present quantitative comparison previously known algorithm These results suggest strive better assessment practices neural network learning algorithm research For longterm benefit field publication standards raised respect easily accessible collections benchmark problems built
Technical Report Number CS Computer Science Engineering UCSD Abstract The developmental mechanisms transforming genotypic phenotypic forms typically omitted formulations genetic algorithms GAs two representational spaces identical We argue careful analysis developmental mechanisms useful understanding success several standard GA techniques clarify relationships recently proposed enhancements We provide framework distinguishes two developmental mechanisms learning maturation also showing several common effects GA search This framework used analyze maturation local search change dynamics GA We observe contexts maturation local search incorporated fitness evaluation illustrate reasons considering separately Further identify contexts maturation local search distinguished fitness evaluation
Finding optimal least good monitoring strategies important consideration designing agent We applied genetic programming task mixed results Since agent control language kept purposefully general set monitoring strategies constitutes small part overall space possible behaviors Because often difficult genetic algorithm evolve even though performance superior These results raise questions easy genetic programming scale areas applied become complex
Marketing decision making tasks require acquisition efficient decision rules noisy questionnaire data Unlike popular learningfromexample methods tasks must interpret characteristics data without clear features data predetermined evaluation criteria The problem domain experts get simple easytounderstand accurate knowledge noisy data This paper describes novel method acquire efficient decision rules questionnaire data using simulated breeding inductive learning techniques The basic ideas method simulated breeding used get effective features questionnaire data inductive learning used acquire simple decision rules data The simulated breeding one Genetic Algorithm based techniques subjectively interactively evaluate qualities offspring generated genetic operations The proposed method qualitatively quantitatively validated case study consumer product questionnaire data acquired rules simpler results direct application inductive learning domain expert admits easy understand level accuracy compared methods
This paper experimentally compares three approaches program induction inductive logic programming ILP genetic programming GP genetic logic programming GLP variant GP inducing Prolog programs Each methods used induce four simple recursive listmanipulation functions The results indicate ILP likely induce correct program small sets random examples GP generally less accurate GLP performs worst rarely able induce correct program Interpretations results terms differences search methods inductive biases presented Keywords Genetic Programming Inductive Logic Programming Empirical Comparison This paper also submitted th Int Workshop Inductive Logic Programming
Two major problems casebased reasoning efficient justified retrieval source cases adaptation retrieved solutions conditions target For analogical theorem proving induction describe solutionrelevant abstraction restrict retrieval source cases mapping source problem target problem determine reformulations adapt source solution
Most commonly casebased reasoning applied domains attribute value representations cases sufficient represent features relevant support classification diagnosis design tasks Distance functions like Hammingdistance transformation similarity functions applied retrieve past cases used generate solution actual problem Often domain knowledge available adapt past solutions new problems evaluate solutions However domains like architectural design law structural case representations corresponding structural similarity functions needed Often acquisition adaptation knowledge seems impossible rather requires effort manageable fielded applications Despite humans use cases main source generate adapted solutions How achieve computationally This paper presents general approach structural similarity assessment adaptation The approach allows explore structural case representations limited domain knowledge support design tasks It exemplarily instantiated three modules design assistant FABELIdea generates adapted design solutions basis prior CAD layouts
The main difficulty implementing natural gradient learning rule compute inverse Fisher information matrix input dimension large We found new scheme represent Fisher information matrix Based scheme designed algorithm compute inverse Fisher information matrix When input dimension n much larger number hidden neurons complexity algorithm order On complexity conventional algorithms purpose order On The simulation confirmed efficiency robustness natural gradient learning rule
The ability casebased reasoning CBR systems apply cases novel situations depends case adaptation knowledge However endowing CBR systems adequate adaptation knowledge proven difficult task This paper describes hybrid method performing case adaptation using combination rulebased casebased reasoning It shows approach provides framework acquiring flexible adaptation knowledge experiences autonomous adaptation suggests potential basis acquisition adaptation knowledge interactive user guidance It also presents initial experimental results examining benefits approach comparing relative contributions case learning adaptation learning reasoning performance
This paper discusses traditional reinforcement learning methods algorithms applied models result poor performance dynamic situated multiagent domains characterized multiple goals noisy perception action inconsistent reinforcement We propose methodology designing representation reinforcement functions take advantage implicit domain knowledge order accelerate learning domains demonstrate experimentally two different mobile robot domains
Learning problem solving intimately related problem solving determines knowledge requirements reasoner learning must fulfill learning enables improved problemsolving performance Different models problem solving however recognize different knowledge needs result set different learning tasks Some recent models analyze problem solving terms generic tasks methods subtasks These models require learning problemsolving concepts new tasks new task decompositions We view reflection core process learning problemsolving concepts In paper identify learning issues raised taskstructure framework problem solving We view problem solver abstract device represent works terms structurebehaviorfunction model specifies knowledge reasoning problem solver results accomplishment tasks We describe model enables reflection modelbased reflection enables reasoner adapt task structure produce solutions better quality The Autognostic system illustrates reflection process
Realistic complex planning situations require mixedinitiative planning framework human automated planners interact mutually construct desired plan Ideally joint cooperation potential achieving better plans either human machine create alone Human planners often take casebased approach planning relying past experience planning retrieving adapting past planning cases Planning analogical reasoning generative casebased planning combined ProdigyAnalogy provides suitable framework study mixedinitiative integration However human user engaged planning loop creates variety new research questions The challenges found creating mixedinitiative planning system fall three categories planning paradigms differ human machine planning visualization plan planning process complex necessary task human users range across spectrum experience respect planning domain underlying planning technology This paper presents approach three problems designing interface incorporate human process planning analogical reasoning ProdigyAnalogy The interface allows user follow generative casebased planning supports visualization plan planning rationale addresses variance experience user allowing user control presentation information
Evolutionary Programming Evolution Strategies rather similar representatives class probabilistic optimization algorithms gleaned model organic evolution discussed compared respect similarities differences basic components well performance experimental runs Theoretical results global convergence step size control strictly convex quadratic function extension convergence rate theory Evolution Strategies presented discussed respect implications Evolutionary Programming
A control law constructed linear time varying system solving two player zero sum differential game moving horizon game used construct H controller finite horizon Conditions given controller results stable system satisfies infinite horizon H norm bound A risk sensitive formulation used provide state estimator observation feedback case
In paper investigate genetic algorithms two parents involved recombination operation In particular introduce gene scanning reproduction mechanism generalizes classical crossovers npoint crossover uniform crossover applicable arbitrary number two parents We performed extensive tests optimizing numerical functions TSP graph coloring observe effect different numbers parents The experiments show parent recombination outperformed using parents classical DeJong functions For problems results conclusive cases parents optimal others parents better
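A minimal sketch of one multi-parent recombination operator in the spirit of gene scanning: for each gene position a parent is chosen uniformly at random and its gene is copied, so with exactly two parents the operator is classical uniform crossover. This uniform variant is an illustrative assumption; the paper's other scanning variants are not reproduced. The parents below are made up so that each gene's origin is visible.

```python
import random

def uniform_scanning(parents, rng):
    # Each position of the child is copied from a parent chosen uniformly at random.
    # With exactly two parents this is classical uniform crossover.
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("all parents must have the same length")
    return [rng.choice(parents)[i] for i in range(length)]

rng = random.Random(3)
# Illustrative parents: gene 10*j + i belongs to parent j at position i
parents = [[10 * j + i for i in range(6)] for j in range(4)]
child = uniform_scanning(parents, rng)
print(child)
```

The number of parents is simply the length of `parents`, which makes it easy to repeat the kind of experiment described above with different parent counts.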
A novel method proposed combining multiple probabilistic classifiers different feature sets In order achieve improved classification performance generalized finite mixture model proposed linear combination scheme implemented based radial basis function networks In linear combination scheme soft competition different feature sets adopted automatic feature rank mechanism different feature sets always simultaneously used optimal way determine linear combination weights For training linear combination scheme learning algorithm developed based ExpectationMaximization EM algorithm The proposed method applied typical real world problem viz speaker identification different feature sets often need consideration simultaneously robustness Simulation results show proposed method yields good performance speaker identification
Different learning models employ different styles generalization novel inputs This paper proposes need multiple styles generalization support broad application base The Priority ASOCS model Priority Adaptive SelfOrganizing Concurrent System overviewed presented potential platform support multiple generalization styles PASOCS adaptive network composed many simple computing elements operating asynchronously parallel The PASOCS operate either data processing mode learning mode During data processing mode system acts parallel hardware circuit During learning mode PASOCS incorporates rules attached priorities represent application learned Learning accomplished distributed fashion time logarithmic number rules The new model significant learning time space complexity improvements previous models Generalization learning system best always guess The proper style generalization application dependent Thus one style generalization may sufficient allow learning system support broad spectrum applications Current connectionist models use one specific style generalization implicit learning algorithm We suggest type generalization used selforganizing parameter learning system discovered learning takes place This requires model allows flexible generalization styles b mechanisms guide system best style generalization problem learned This paper overviews learning model seeks efficiently support requirement The model called Priority ASOCS PASOCS member class models called ASOCS Adaptive SelfOrganizing Concurrent Systems Section paper gives example different generalization techniques approach problem Section presents overview PASOCS Section illustrates flexible generalization supported Section concludes paper
We introduce new approach model selection performs better standard complexitypenalization holdout error estimation techniques many cases The basic idea exploit intrinsic metric structure hypothesis space determined natural distribution unlabeled training patterns use metric reference detect whether empirical error estimates derived small labeled training sample trusted region around empirically optimal hypothesis Using simple metric intuitions develop new geometric strategies detecting overfitting performing robust yet responsive model selection spaces candidate functions These new metricbased strategies dramatically outperform previous approaches experimental studies classical polynomial curve fitting Moreover technique simple efficient applied function learning tasks The requirement access auxiliary collection unlabeled training data
In paper use genetic algorithm evolve set classification rules realvalued attributes We show realvalued attribute ranges encoded realvalued genes present new uniform method representing dont cares rules We view supervised classification optimization problem evolve rule sets maximize number correct classifications input instances We use variant Pitt approach geneticbased machine learning system novel conflict resolution mechanism competing rules within rule set Experimental results demonstrate effectiveness proposed approach benchmark wine classifier system
Genetic algorithms proven powerful tool within area machine learning However classes problems seem scarcely applicable eg solution given problem consists several parts influence In case classic genetic operators crossover mutation work well thus preventing good performance This paper describes approach overcome problem using highlevel genetic operators integrating task specific domain independent knowledge guide use operators The advantages approach shown learning rule base adapt parameters image processing operator path within SOLUTION system
In sparse data environments greater classification accuracy achieved learning several concept descriptions data combining classifications Stochastic search general tool used generate many good concept descriptions rule sets class data Bayesian probability theory offers optimal strategy combining classifications individual concept descriptions use approximation theory This strategy useful additional data difficult obtain every increase classification accuracy important The primary result paper multiple concept descriptions particularly helpful flat hypothesis spaces many equally good ways grow rule similar gain Another result experimental evidence learning multiple rule sets yields accurate classifications learning multiple rules domains To demonstrate behaviors learn multiple concept descriptions adapting HYDRA noisetolerant relational learning algorithm
In paper present novel multiagent learning paradigm called teampartitioned opaquetransition reinforcement learning TPOTRL TPOTRL introduces concept using actiondependent features generalize state space In work use learned actiondependent feature space TPOTRL effective technique allow team agents learn cooperate towards achievement specific goal It adaptation traditional RL methods applicable complex nonMarkovian multiagent domains large state spaces limited training opportunities Multiagent scenarios opaquetransition team members always full communication one another adversaries may affect environment Hence learner rely knowledge future state transitions acting world TPOTRL enables teams agents learn effective policies training examples even face large state space large amounts hidden state The main responsible features dividing learning task among team members using coarse actiondependent feature space allowing agents gather reinforcement directly observation environment TPOTRL fully implemented tested robotic soccer domain complex multiagent framework This paper presents algorithmic details TPOTRL well empirical results demonstrating effectiveness developed multiagent learning approach learned features
This paper discusses method training multilayer perceptron networks called DMP Dynamic Multilayer Perceptron The method based upon divide conquer approach builds networks form binary trees dynamically allocating nodes layers needed The focus paper effects using multiple node types within DMP framework Simulation results show DMP performs favorably comparison learning algorithms using multiple node types beneficial network performance
Specification refinement part formal program derivation method software directly constructed provably correct specification Because program derivation intensive manual exercise used critical software systems automated approach would allow viable many types software systems The goal research determine genetic programming GP used automate specification refinement process The initial steps toward goal show wellknown proof logic program derivation encoded GPbased system infer sentences logic proof particular sentence The results promising indicate GP useful aiding program derivation
This paper appears chapter Kenneth E Kinnear Jr Peter J Angeline editors Advances Genetic Programming MIT Press Abstract Genetic Programming GP automatic method generating computer programs stored data structures manipulated evolve better programs An extension restricting search space Strongly Typed Genetic Programming STGP basic premise removal closure typing arguments return values functions also typing terminal set A restriction STGP two levels typing We extend STGP allowing type hierarchy allows two levels typing
We integrated distributed search genetic programming based systems collective memory form collective adaptation search method Such system significantly improves search problem complexity increased However still considerable scope improvement In collective adaptation search agents gather knowledge environment deposit central information repository Process agents able manipulate focused knowledge exploiting exploration search agents We examine utility increasing capabilities centralized process agents
Our casebased reasoning CBR integration constraint satisfaction problem CSP formalism undergone several transformations journey initial research idea productintent design Both unexpected research results well interesting insights realworld applicability integrated methodology emerged integration explored alternative viewpoints In paper alternative viewpoints results enabled viewpoints described
In order rank performance machine learning algorithms many researchers conduct experiments benchmark data sets Since learning algorithms domainspecific parameters popular custom adapt parameters obtain minimal error rate test set The rate used rank algorithm causes optimistic bias We quantify bias showing particular algorithm parameters probably ranked higher equally good algorithm fewer parameters We demonstrate result showing number parameters trials required order pretend outperform C FOIL respectively various benchmark problems We describe unbiased ranking experiments conducted
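A short simulation of the optimistic bias described above, under illustrative assumptions: every parameter setting has the same true error, each setting is evaluated on its own noisy test sample, and only the lowest observed error is reported. Tuning over many settings then reports an error below the true rate even though no setting is actually better.

```python
import random

def observed_error(true_error, n_test, rng):
    # Test-set error rate of a classifier whose true error rate is true_error
    return sum(rng.random() < true_error for _ in range(n_test)) / n_test

def reported_error(true_error, n_settings, n_test, rng):
    # Error reported after trying n_settings equally good parameter settings on the test set
    return min(observed_error(true_error, n_test, rng) for _ in range(n_settings))

rng = random.Random(42)
trials, n_test = 100, 300  # illustrative sizes
untuned = sum(reported_error(0.2, 1, n_test, rng) for _ in range(trials)) / trials
tuned = sum(reported_error(0.2, 20, n_test, rng) for _ in range(trials)) / trials
print(untuned, tuned)  # tuned is expected to fall clearly below the true 0.2
```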
We report series experiments decision trees consistent training data constructed These experiments run gain understanding properties set consistent decision trees factors affect accuracy individual trees In particular investigated relationship size decision tree consistent training data accuracy tree test data The experiments performed massively parallel Maspar computer The results experiments several artificial two real world problems indicate many problems investigated smaller consistent decision trees average less accurate average accuracy slightly larger trees
An ensemble consists set independently trained classifiers neural networks decision trees whose predictions combined classifying novel instances Previous research shown ensemble whole often accurate single classifiers ensemble Bagging Breiman Boosting Freund Schapire two relatively new popular methods producing ensembles In paper evaluate methods using neural networks decision trees classification algorithms Our results clearly show two important facts The first even though Bagging almost always produces better classifier individual component classifiers relatively impervious overfitting generalize better baseline neuralnetwork ensemble method The second Boosting powerful technique usually produce better ensembles Bagging however susceptible noise quickly overfit data set
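For readers unfamiliar with boosting, here is a compact sketch of AdaBoost (binary labels in {-1, +1}). The paper used neural networks and decision trees as component classifiers; one-dimensional decision stumps are a simplifying assumption so the example stays self-contained. Misclassified examples gain weight each round, so later stumps concentrate on them, which is also why noisy labels can attract weight and lead to overfitting.

```python
import math

def stump_predict(x, t, s):
    # Predict s when x >= t, otherwise -s
    return s if x >= t else -s

def best_stump(xs, ys, w):
    # Thresholds: every observed value plus one above the maximum (constant classifiers)
    candidates = sorted(set(xs)) + [max(xs) + 1]
    best = None
    for t in candidates:
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict(xi, t, s) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, s = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against division by zero and log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Raise weights of misclassified examples, lower the rest, then renormalise
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, s)) for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    # Weighted vote of the stumps
    score = sum(alpha * stump_predict(x, t, s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1 if 3 <= x <= 6 else -1 for x in xs]  # an interval: no single stump fits it
for r in (1, 3):
    ens = adaboost(xs, ys, r)
    print(r, sum(predict(ens, x) == y for x, y in zip(xs, ys)) / len(xs))
```

On this toy interval a single stump is right on 7 of 10 points, while three boosting rounds fit all 10.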
Pruning decision tree considered researchers important part tree building noisy domains While many approaches pruning alternative approach averaging decision trees received much attention We perform empirical comparison pruning approach averaging decision trees For comparison use computationally efficient method averaging namely averaging extended fanned set tree Since wide range approaches pruning compare tree averaging traditional pruning approach along optimal pruning approach
A widely held idea regarding information processing brain cellassembly hypothesis suggested Hebb According hypothesis basic unit information processing brain assembly cells act briefly closed system response specific stimulus This work presents novel method characterizing supposed activity using Hidden Markov Model This model able reveal underlying cortical network activity behavioral processes In study process hand simultaneous activity several cells recorded frontal cortex behaving monkeys Using model able identify behavioral mode animal directly identify corresponding collective network activity Furthermore segmentation data discrete states also provides direct evidence state dependency shorttime correlation functions pair cells Thus crosscorrelation depends network state activity local connectivity alone
We consider problem model selection accounting model uncertainty highdimensional contingency tables motivated expert system applications The approach used currently stepwise strategy guided tests based approximate asymptotic P values leading selection single model inference conditional selected model The sampling properties strategy complex failure take account model uncertainty leads underestimation uncertainty quantities interest In principle panacea provided standard Bayesian formalism averages posterior distributions quantity interest models weighted posterior model probabilities Furthermore approach optimal sense maximising predictive ability However used practice computing posterior model probabilities hard number models large often greater We argue standard Bayesian formalism unsatisfactory propose alternative Bayesian approach contend takes full account true model uncertainty averaging much smaller set models An efficient search algorithm developed finding models We consider two classes graphical models arise expert systems recursive causal models decomposable David Madigan Assistant Professor Statistics Adrian E Raftery Professor Statistics Sociology Department Statistics GN University Washington Seattle WA Madigans research partially supported Graduate School Research Fund University Washington NSF Rafterys research supported ONR Contract NJ The authors grateful Gregory Cooper Leo Goodman Shelby Haberman David Hinkley Graham Upton Jon Wellner Nanny Wermuth Jeremy York Walter Zucchini two anonymous referees helpful comments discussions Michael R Butler providing data scrotal swellings example
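A minimal sketch of averaging over a reduced set of models: posterior model probabilities are computed from log marginal likelihoods (equal prior model probabilities assumed), models whose posterior falls below 1/c of the best model's are discarded, and the quantity of interest is averaged over the survivors with renormalised weights. A cut-off of this kind is often called Occam's window; only this first criterion is shown, the paper's full rule and search algorithm are not reproduced, and c = 20 and the model values are illustrative.

```python
import math

def occams_window_average(log_evidence, predictions, c=20.0):
    # log_evidence: model name -> log marginal likelihood (equal model priors assumed)
    # predictions: model name -> that model's prediction of the quantity of interest
    best = max(log_evidence.values())
    # Keep models whose posterior probability is at least 1/c of the best model's
    kept = [m for m, lv in log_evidence.items() if lv >= best - math.log(c)]
    # Posterior weights renormalised over the retained models only
    weights = {m: math.exp(log_evidence[m] - best) for m in kept}
    total = sum(weights.values())
    average = sum(weights[m] / total * predictions[m] for m in kept)
    return average, sorted(kept)

# Illustrative values for three hypothetical models
log_evidence = {"A": 0.0, "B": -1.0, "C": -5.0}
predictions = {"A": 0.8, "B": 0.4, "C": 0.0}
print(occams_window_average(log_evidence, predictions))
```

Model C is about e^5 (roughly 148) times less probable than A, so it falls outside the window and only A and B contribute to the average.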
York's research was supported by an NSF graduate fellowship. The authors are grateful to Julian Besag, David Bradshaw, Jeff Bradshaw, James Carlsen, David Draper, Ivar Heuch, Robert Kass, Augustine Kong, Steffen Lauritzen, Adrian Raftery, and James Zidek for helpful comments and discussions.
We present an automated emotion recognition system that is capable of identifying six basic emotions (happy, surprise, sad, angry, fear, disgust) in novel face images. An ensemble of simple feed-forward neural networks is used to rate each image, and the outputs of these networks are combined to generate a score for each emotion. The networks were trained on a database of face images that human subjects consistently rated as portraying a single emotion. Such a system generalizes to novel face images of individuals the networks were not trained on, drawn from the same database. The neural network model exhibits categorical perception between some emotion pairs. A linear sequence of morph images is created between two expressions of an individual's face, and this sequence is analyzed by the model. For some emotion pairs, but not for others, sharp transitions in the output response vector occur at a single step in the sequence. We plan to use the model's response to limit and direct testing when determining whether human subjects exhibit categorical perception in morph image sequences.
We discuss the advantages of using overdetermined mixtures to improve upon blind source separation algorithms designed to extract sound sources from acoustic mixtures. A study of the nature of room impulse responses helps us choose an adaptive filter architecture. We use ideal inverses of acquired room impulse responses to compare the effectiveness of different-sized separating filter configurations of various filter lengths. Using a multichannel blind least-mean-square algorithm (MBLMS), we show that adding sensors can improve the separation of signals mixed with real-world filters.
Rissanen's Minimum Description Length (MDL) principle is adapted to handle continuous attributes in the Inductive Logic Programming setting. The resulting coding is applied to devise an MDL pruning mechanism. MDL pruning is tested on a synthetic domain with artificially added noise at different levels and on two real-life problems: modelling the surface roughness of a grinding workpiece and modelling the mutagenicity of nitroaromatic compounds. The results indicate that MDL pruning is a successful, parameter-free noise-fighting tool in real-life domains, since it acts as a safeguard against building overly complex models while retaining the accuracy of the model.
We address the difficult problem of separating multiple speakers with multiple microphones in a real room. We combine the work of Torkkola and of Amari, Cichocki and Yang to give Natural Gradient information maximisation rules for recurrent (IIR) networks that blindly adjust delays and separate and deconvolve mixed signals. While these rules work well on simulated data, they fail in real rooms, which usually involve non-minimum-phase transfer functions that are not invertible using stable IIR filters. An approach that sidesteps this problem is to perform infomax on a feedforward architecture in the frequency domain (Lambert). We demonstrate real-room separation of two natural signals using this approach.
For blind source separation, when the Fisher information matrix is used as the Riemannian metric tensor of the parameter space, the steepest descent algorithm that maximizes the likelihood function in this Riemannian parameter space becomes a serial updating rule with the equivariant property. This algorithm can be further simplified by using the asymptotic form of the Fisher information matrix around the equilibrium.
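The paragraph above does not spell out the update, so the following is a minimal sketch assuming the widely used natural-gradient form W <- W + eta (I - phi(y) y^T) W with y = W x, applied one sample at a time. The choice phi = tanh, the learning rate, and the demo mixing matrix are illustrative assumptions, not taken from the paper. The point of the right-multiplication by W is that the update depends on the data only through y, which is what makes it equivariant.

```python
import numpy as np


def natural_gradient_step(W, x, eta=0.01, phi=np.tanh):
    """One serial (per-sample) natural-gradient update of an unmixing matrix W.

    Assumed update: W <- W + eta * (I - phi(y) y^T) W, with y = W x.
    phi = tanh is an illustrative choice suited to super-Gaussian sources.
    """
    y = W @ x                                   # current source estimates
    relative_grad = np.eye(W.shape[0]) - np.outer(phi(y), y)
    return W + eta * relative_grad @ W          # right-multiplying by W gives equivariance


# Illustrative demo: unmix two Laplacian (super-Gaussian) sources.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 4000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])          # hypothetical mixing matrix
X = A @ S
W = np.eye(2)
for _ in range(3):                              # a few passes over the data
    for t in range(X.shape[1]):
        W = natural_gradient_step(W, X[:, t], eta=0.002)
# If separation succeeded, W @ A should be close to a scaled permutation matrix.
print(np.round(W @ A, 2))
```

Equivariance can be checked directly: replacing W by W C^-1 and x by C x leaves y unchanged, so the updated matrix is simply the original update multiplied on the right by C^-1.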
We discovered a new scheme to represent the Fisher information matrix of a stochastic multilayer perceptron. Based on this scheme, we designed an algorithm to compute the inverse of the Fisher information matrix. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of a lower polynomial order in n than that of conventional algorithms for the same purpose. The inverse of the Fisher information matrix is used in the natural gradient descent algorithm to train single-layer and multilayer perceptrons. It has been confirmed by simulation that the natural gradient
In this paper, we define the task of place learning and describe one approach to this problem. Our framework represents distinct places as evidence grids, a probabilistic description of occupancy. Place recognition relies on nearest-neighbor classification, augmented by a registration process that corrects for translational differences between the two grids. The learning mechanism is lazy, in that it involves simply storing the inferred evidence grids. Experimental studies with physical and simulated robots suggest that this approach improves place recognition with experience, can handle significant sensor noise, benefits from improved quality in the stored cases, and scales well to environments with many distinct places. Additional studies suggest that using historical information about the robot's path through the environment can actually reduce recognition accuracy. Previous researchers have studied evidence grids and place learning, but they have not combined these two powerful concepts, nor have they used systematic experimentation to evaluate their methods' abilities.
Let us call a non-deterministic incremental algorithm one that is able to construct any solution to a combinatorial problem by incrementally selecting an ordered sequence of choices that defines this solution, each choice being made non-deterministically. In that case, the state space can be represented as a tree, and a solution is a path from the root of the tree to a leaf. This paper describes how the simulated evolution of a population of such non-deterministic incremental algorithms offers a new approach to exploring a state space, compared with techniques such as Genetic Algorithms (GA), Evolutionary Strategies (ES), or Hill Climbing. In particular, the efficiency of this method, implemented as the Evolving Non-Determinism (END) model, is presented on the sorting network problem, a reference problem that has long challenged computer science. We then show that the END model remedies some drawbacks of these optimization techniques and even outperforms them on this problem. Indeed, sorting networks as good as the best known have been built from scratch, and a result that had stood for years for one input size has even been improved by one comparator.
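Any search over sorting networks, END included, needs a way to score candidate networks. The sketch below is not the END model; it shows one standard evaluation, assuming comparators are (i, j) pairs with i < j. By the 0-1 principle, a comparator network on n wires sorts every input if and only if it sorts all 2^n binary inputs, so the fraction of binary inputs sorted is a natural (hypothetical) fitness.

```python
from itertools import product
from typing import List, Sequence, Tuple

Comparator = Tuple[int, int]  # (i, j) with i < j: puts the smaller value on wire i


def apply_network(network: Sequence[Comparator], values: Sequence[int]) -> List[int]:
    """Run the values through each comparator in order."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def fraction_sorted(network: Sequence[Comparator], n: int) -> float:
    """Share of the 2**n binary inputs that the network sorts (a possible fitness)."""
    inputs = list(product((0, 1), repeat=n))
    ok = sum(apply_network(network, bits) == sorted(bits) for bits in inputs)
    return ok / len(inputs)


def is_sorting_network(network: Sequence[Comparator], n: int) -> bool:
    """0-1 principle: the network sorts every input iff it sorts all binary ones."""
    return fraction_sorted(network, n) == 1.0
```

For example, the five-comparator network [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)] sorts four inputs, while dropping its last comparator leaves wires 1 and 2 unordered.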
VISOR is a neural network system for object recognition and scene analysis that learns visual schemas from examples. Processing in VISOR is based on cooperation, competition, and parallel bottom-up and top-down activation of schema representations. Similar principles appear to underlie much of human visual processing, and VISOR can therefore be used to model various perceptual phenomena. This paper focuses on analyzing three phenomena through simulation with VISOR: priming and mental imagery, perceptual reversal, and circular reaction. The results illustrate the similarity and the subtle differences between the mechanisms mediating priming and mental imagery, show how the two opposing accounts of perceptual reversal (neural satiation and cognitive factors) may both contribute to the phenomenon, and demonstrate how intentional actions can be gradually learned from reflex actions. Successful simulation of these effects suggests that similar mechanisms may govern human visual perception and the learning of visual schemas.
A novel approach to object recognition and scene analysis, based on a neural network representation of visual schemas, is described. Given an input scene, the VISOR system focuses attention successively on each component, and the schema representations cooperate and compete to match the inputs. The schema hierarchy is learned from examples through unsupervised adaptation and reinforcement learning. VISOR learns that some objects are more important than others in identifying the scene, and that the importance of spatial relations varies depending on the scene. As the inputs differ increasingly from the schemas, VISOR's recognition process remains remarkably robust and automatically generates a measure of confidence in the analysis.
Truly autonomous vehicles will require both projective planning and reactive components in order to perform robustly. Projective components are needed for long-term planning and replanning, where explicit reasoning about future states is required. Reactive components allow the system always to have some action available in real time and can themselves exhibit robust behavior, but they lack the ability to reason explicitly about future states over a long time period. This work addresses the problem of creating reactive components for autonomous vehicles. Creating reactive behaviors (stimulus-response rules) is generally difficult, requiring the acquisition of much knowledge from domain experts, a problem referred to as the knowledge acquisition bottleneck. SAMUEL is a system that learns reactive behaviors for autonomous agents. SAMUEL learns these behaviors under simulation, automating the process of creating stimulus-response rules and therefore reducing the bottleneck. The learning algorithm was designed to learn useful behaviors from simulations of limited fidelity. Current work is investigating how well behaviors learned in simulation environments work in real-world environments. In this paper, we describe SAMUEL and the behaviors that have been learned for simulated autonomous aircraft, autonomous underwater vehicles, and robots. These behaviors include dog fighting, missile evasion, tracking, navigation, and obstacle avoidance.
Feedforward nets with sigmoidal activation functions are often designed by minimizing a cost criterion. It has been pointed out that this technique may be outperformed by the classical perceptron learning rule, at least on some problems. In this paper, we show that no such pathologies can arise if the error criterion is of a threshold LMS type, i.e., is zero for values "beyond" the desired target values. More precisely, we show that if the data are linearly separable and one considers nets with no hidden neurons, then the error function cannot have any local minima that are not global. Simulations of networks with hidden units are consistent with these results, in that data that can be classified by minimizing a threshold LMS criterion may often fail to be classified when a simple LMS cost is used instead. In addition, the proof gives the following stronger result under the stated hypotheses: the continuous gradient adjustment procedure is such that, from any initial weight configuration, a separating set of weights is obtained in finite time. This is a precise analogue of the Perceptron Learning Theorem. The results are then compared with the more classical pattern recognition problem of threshold LMS with linear activations, where no spurious local minima exist even for nonseparable data; here it is shown that, even when using a threshold criterion, bad local minima may occur if the data are not separable and sigmoids are used.
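For comparison with the result above, here is a sketch of the classical perceptron learning rule that the paper's theorem parallels. It is the textbook rule, not the paper's gradient procedure: on linearly separable data it finds a separating weight vector after finitely many updates, while on non-separable data (XOR) it never stops making mistakes. Labels in {-1, +1} and the bias-as-last-weight convention are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label in {-1, +1})


def perceptron(samples: List[Sample], max_epochs: int = 1000) -> Tuple[List[float], bool]:
    """Classical perceptron rule; the last weight is the bias.

    Returns (weights, converged), where converged means an epoch with no mistakes.
    """
    dim = len(samples[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            xa = list(x) + [1.0]                       # augment with a constant bias input
            activation = sum(wi * xi for wi, xi in zip(w, xa))
            if y * activation <= 0:                    # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:
            return w, True
    return w, False


AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
```

Running `perceptron(AND)` should report convergence, whereas `perceptron(XOR)` exhausts `max_epochs` because no separating line exists.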
This paper combines existing models for longitudinal and spatial data in a hierarchical Bayesian framework, with particular emphasis on the role of time- and space-varying covariate effects. Data analysis is implemented via Markov chain Monte Carlo methods. The methodology is illustrated by a tentative re-analysis of Ohio lung cancer data. Two approaches to adjusting for unmeasured spatial covariates, particularly tobacco consumption, are described. The first includes random effects in the model to account for unobserved heterogeneity; the second adds a simple urbanization measure as a surrogate for smoking behaviour. The Ohio dataset has been of particular interest because of the suggestion that a nuclear facility in the southwest of the state may have caused increased levels of lung cancer there. However, we contend that the data are inadequate for a proper investigation of this issue. Email: leostatunimuenchende
Although many algorithms for learning from examples have been developed and many comparisons have been reported, there is no generally accepted benchmark for classifier learning. The existence of a standard benchmark would greatly assist such comparisons. Sixteen dimensions are proposed to describe classification tasks. Based on these, thirteen real-world and synthetic datasets are chosen from the UCI Repository of machine learning databases by a set-covering method to form such a benchmark.
Holland's Schema Theorem is widely taken to be the foundation for explanations of the power of genetic algorithms (GAs). Yet some dissent has been expressed as to its implications. Here, dissenting arguments are reviewed and elaborated upon, explaining why the Schema Theorem has no implications for how well a GA is performing. Interpretations of the Schema Theorem have implicitly assumed that a correlation exists between parent and offspring fitnesses, and this assumption is made explicit in results based on Price's Covariance and Selection Theorem. Schemata do not play a part in the performance theorems derived for representations and operators in general. However, schemata re-emerge when recombination operators are used. Using Geiringer's recombination distribution representation of recombination operators, a "missing" schema theorem is derived that makes explicit the intuition for when a GA should perform well. Finally, the method of "adaptive landscape" analysis is examined, and counterexamples are offered to the commonly used correlation statistic. Instead, an alternative statistic, the transmission function in the fitness domain, is proposed as the optimal statistic for estimating GA performance from limited samples.
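To make the reference to Price's theorem concrete, the sketch below checks its selection-only form numerically: if offspring inherit their parent's trait value z exactly (an assumption that drops the transmission term of the full Price equation), the change in the mean trait under fitness-proportional selection equals Cov(w, z) divided by the mean fitness. The function names are hypothetical and population (not sample) covariance is used.

```python
from typing import Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def price_selection_change(z: Sequence[float], w: Sequence[float]) -> float:
    """Selection term of Price's equation: Cov(w, z) / mean(w), population covariance."""
    zbar, wbar = mean(z), mean(w)
    cov = mean([(wi - wbar) * (zi - zbar) for wi, zi in zip(w, z)])
    return cov / wbar


def direct_selection_change(z: Sequence[float], w: Sequence[float]) -> float:
    """Fitness-weighted mean trait among offspring minus the parental mean trait."""
    weighted = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    return weighted - mean(z)
```

Both functions give 0.25 for z = [0, 1] and w = [1, 3]. The identity only says how the mean trait moves under selection; whether that movement tracks GA performance depends on the parent-offspring relationship, which is the assumption the paragraph above says is usually left implicit.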
This report was supported in part by the Navy Medical Research and Development Command and the Office of Naval Research, Department of the Navy, under work unit ONRReimb. The views expressed in this article are those of the authors and do not reflect the official policy or position of the Department of the Navy, the Department of Defense, or the US Government. Approved for public release; distribution unlimited.
An approach to analytic learning is described that searches for accurate entailments of a Horn Clause domain theory. A hill-climbing search, guided by an information-based evaluation function, is performed by applying a set of operators that derive frontiers from domain theories. The analytic learning system is one component of a multistrategy relational learning system. We compare the accuracy of concepts learned with this analytic strategy to that of concepts learned with an analytic strategy that operationalizes the domain theory.
Any system that learns to filter documents will suffer poor performance during an initial training phase. One way of addressing this problem is to exploit filters learned by other users in a collaborative fashion. We investigate "direct transfer" of learned filters in this setting, a limiting case for any collaborative learning system. We evaluate the stability of several different learning methods under direct transfer and conclude that symbolic learning methods that use negatively correlated features of the data perform poorly in transfer, even when they perform well in more conventional evaluation settings. This effect is robust: it holds for several learning methods, when a diverse set of users is used in training the classifier, and even when the learned classifiers can be adapted to the new user's distribution. Our experiments give rise to several concrete proposals for improving generalization performance in a collaborative setting, including a beneficial variation on a feature selection method that has been widely used in text categorization.
We present a coevolutionary architecture for solving decomposable problems and apply it to the evolution of artificial neural networks. Although this work is preliminary in nature, it has a number of advantages over non-coevolutionary approaches. The coevolutionary approach uses a divide-and-conquer technique in which species representing simpler subtasks are evolved in separate instances of a genetic algorithm executing in parallel. Collaborations among the species are formed to represent complete solutions, and species are created dynamically as needed. Results are presented in which the coevolutionary architecture produces higher-quality solutions in fewer evolutionary trials than an alternative non-coevolutionary approach on the problem of evolving cascade networks for parity computation.
We introduce an algorithm, lllama, that combines simple pattern recognizers with a general method for estimating the entropy of a sequence. Each pattern recognizer exploits partial matches between subsequences to build a model of the sequence. Since the primary features of interest in biological sequence domains are subsequences with small variations in exact composition, lllama is particularly suited to such domains. We describe two methods, lllama-length and lllama-alone, that use this entropy estimate to perform maximum a posteriori classification. We apply these methods to several problems in three-dimensional structure classification and in short DNA sequences. The results include
Email: loewenstpaulrutgersedu; bermanadeninerutgersedu; hirshcsrutgersedu
RISE (Domingos, in press) is a rule induction algorithm that proceeds by gradually generalizing rules, starting with one rule per example. This has several advantages over the more common strategy of gradually specializing initially null rules, and it has been shown to lead to significant accuracy gains over algorithms such as CRULES and CN in a large number of application domains. However, RISE's running time, like that of other rule induction algorithms, is quadratic in the number of examples, making it unsuitable for processing very large databases. This paper studies the use of partitioning to speed up RISE and compares it with the well-known method of windowing. The use of partitioning in a specific-to-general induction setting creates synergies that would not be possible with a general-to-specific system. Partitioning often reduces running time and improves accuracy at the same time. In noisy conditions, the performance of windowing deteriorates rapidly, while that of partitioning remains stable.
To investigate how modularity emerges in nature, we present an Artificial Life model that allows us to reproduce on the computer both the organisms (i.e., robots that have a genotype, a nervous system, and sensory and motor organs) and the environment in which the organisms live, behave, and reproduce. In our simulations, neural networks are evolutionarily trained to control a mobile robot designed to keep an arena clear by picking up trash objects and releasing them outside the arena. During the evolutionary process, modular neural networks that control the robot's behavior emerge as a result of genetic duplications. Preliminary simulation results show that the duplication-based modular architecture outperforms the nonmodular architecture, which is the starting architecture in our simulations. Moreover, an interaction between mutation rate and duplication rate emerges from our results. Our future goal is to use this model to explore the relationship between the evolutionary emergence of modularity and the phenomenon of gene duplication.
We describe a new theory of differential learning by which a broad family of pattern classifiers (including many well-known neural network paradigms) can learn stochastic concepts efficiently. We describe the relationship between a classifier's ability to generalize well to unseen test examples and the efficiency of the strategy by which it learns. We list a series of proofs that differential learning is efficient in its information and computational resource requirements, whereas traditional probabilistic learning strategies are not. The proofs are illustrated by a simple example that lends itself to closed-form analysis. We conclude with an optical character recognition task for which three different types of differentially generated classifiers generalize significantly better than their probabilistically generated counterparts.
With most machine learning methods, if the given knowledge representation space is inadequate, the learning process will fail. This is also true of methods that use neural networks as the form of the representation space. To overcome this limitation, an automatic construction method for a neural network is proposed. This paper describes the BP-HCI method for hypothesis-driven constructive induction in a neural network trained by the backpropagation algorithm. The method searches for a better representation space by analyzing the hypotheses generated at each step of an iterative learning process. The method was applied to ten problems, including, in particular, the exclusive-or, MONK, parityBIT, and inverse parityBIT problems. All problems were successfully solved with the same initial set of parameters, and the representation space was extended only as much as each problem required.
This paper investigates alternative estimators of the accuracy of concepts learned from examples. In particular, the cross-validation and bootstrap estimators are studied, using synthetic training data and the foil learning algorithm. Our experimental results contradict previous papers in statistics that advocate the bootstrap method as superior to cross-validation. Nevertheless, our results also suggest that conclusions based on cross-validation in previous machine learning papers are unreliable. Specifically, our observations are that (i) the true error of the concept learned by foil from independently drawn sets of examples of the same concept varies widely, (ii) the estimate of true error provided by cross-validation has high variability but is approximately unbiased, and (iii) the bootstrap estimator has lower variability than cross-validation but is systematically biased.
The problem of driving an autonomous vehicle in normal traffic engages many areas of AI research and has substantial economic significance. We describe work in progress on a new approach to this problem that uses a decision-theoretic architecture based on dynamic probabilistic networks. The architecture provides a sound solution to the problems of sensor noise, sensor failure, and uncertainty about the behavior of other vehicles and about the effects of one's own actions. We report on advances in the theory of inference and decision making in dynamic, partially observable domains. Our approach has been implemented in a simulation system, and the autonomous vehicle successfully negotiates a variety of difficult situations.
Two recently implemented machine learning algorithms, RIPPER and sleeping-experts for phrases, are evaluated on a number of large text categorization problems. Both algorithms construct classifiers that allow the "context" of a word w to affect how (or even whether) the presence or absence of w will contribute to a classification. However, RIPPER and sleeping-experts differ radically in many other respects: they have different notions of what constitutes a context, different ways of combining contexts to construct a classifier, different methods of searching for a combination of contexts, and different criteria for which contexts should be included in such a combination. In spite of these differences, both RIPPER and sleeping-experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods. We view this result as a confirmation of the usefulness of classifiers that represent contextual information.
We address the problem of finding the parameter settings that will result in optimal performance of a given learning algorithm using a particular dataset as training data. We describe a "wrapper" method that treats the determination of the best parameters as a discrete function optimization problem. The method uses best-first search and cross-validation to wrap around the basic induction algorithm: the search explores the space of parameter values, running the basic algorithm many times on training and holdout sets produced by cross-validation to estimate the expected error of each parameter setting. Thus, the final selected parameter settings are tuned for the specific induction algorithm and dataset being studied. We report experiments with this method on datasets selected from the UCI and StatLog collections, using C as the basic induction algorithm. At the chosen confidence level, our method improves the performance of C on nine domains, degrades it on one, and is statistically indistinguishable from C on the rest. On the sample of datasets used for comparison, our method yields an average relative decrease in error rate. We expect to see similar performance improvements when using our method with other machine learning algorithms.
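A minimal sketch of the wrapper idea above, with two simplifications that are assumptions of this example: the search is exhaustive over a short candidate list rather than best-first, and a tiny k-nearest-neighbor classifier (parameter k) stands in for the basic induction algorithm. The essential structure is kept: each parameter setting is scored only by running the unchanged learner on cross-validation training and holdout splits.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], int]
Learner = Callable[[List[Example], Tuple[float, ...], int], int]


def knn_predict(train: List[Example], x: Tuple[float, ...], k: int) -> int:
    """Hypothetical stand-in learner: majority vote among the k nearest examples."""
    by_dist = sorted(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    votes = {}
    for _, label in by_dist[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def cv_error(data: Sequence[Example], folds: int, param: int, learner: Learner) -> float:
    """k-fold cross-validation error; example i goes to holdout fold i % folds."""
    errors = 0
    for f in range(folds):
        holdout = data[f::folds]
        train = [ex for i, ex in enumerate(data) if i % folds != f]
        errors += sum(learner(train, x, param) != y for x, y in holdout)
    return errors / len(data)


def wrapper_tune(data: Sequence[Example], candidates: Sequence[int],
                 learner: Learner, folds: int = 5) -> Tuple[float, int]:
    """Return (estimated error, parameter) for the best candidate; ties go to the smaller value."""
    return min((cv_error(data, folds, p, learner), p) for p in candidates)
```

Swapping the candidate list for a best-first search over a larger parameter space changes only `wrapper_tune`; the learner itself is treated as a black box, which is what makes the method a wrapper.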
The Feature Vector Editor offers a user-extensible environment for exploratory data analysis. Several empirical studies have applied this environment to the SHERFACS International Conflict Management dataset. Current analysis techniques include boolean analysis, temporal analysis, and automatic rule learning. Implemented portably in ANSI Common Lisp and the Common Lisp Interface Manager (CLIM), the system features an advanced interface that makes it intuitive for people to manipulate data and discover significant relationships. The system encapsulates data within objects and defines generic protocols that mediate all interactions between data, users, and analysis algorithms. These generic data protocols make possible the rapid integration of new datasets and new analytical algorithms with heterogeneous data formats. More sophisticated research reformulates SHERFACS conflict codings as machine-parsable narratives suitable for processing into semantic representations by the RELATUS Natural Language System. Experiments with SHERFACS cases demonstrated the feasibility of building knowledge bases from lengthy synthetic texts.
We introduce two boosting algorithms that aim to increase the generalization accuracy of a given classifier by incorporating it as a component in a stacked generalizer. Both algorithms construct a complementary classifier that generates coarse hypotheses of the training data. We show that the two algorithms boost generalization accuracy on a representative collection of data sets. The two algorithms are distinguished in that one of them modifies the class targets of selected training instances in order to train the complementary classifier. We show that the two algorithms achieve approximately equal generalization accuracy, but that they create complementary classifiers that display different degrees of accuracy and diversity. Our study provides evidence that it may be useful to investigate families of boosting algorithms that incorporate varying levels of accuracy and diversity, so as to achieve an appropriate mix for a given task and domain.
Object localization has applications in many areas of engineering and science. The goal is to spatially locate an arbitrarily shaped object. In many applications, it is desirable to minimize the number of measurements collected for this purpose while ensuring sufficient localization accuracy. In surgery, for example, collecting a large number of localization measurements may either extend the time required to perform a surgical procedure or increase the radiation dosage to which a patient is exposed. When measurement noise is present, localization accuracy is a function of the spatial distribution of discrete measurements over the object. In Simon et al., metrics were presented to evaluate the information available from a set of discrete object measurements. In this study, new approaches to the discrete point data selection problem are described. These include hill-climbing, genetic algorithms (GAs), and Population-Based Incremental Learning (PBIL). Extensions of the standard GA and PBIL methods that employ multiple parallel populations are also explored. The results of extensive empirical testing are provided; they suggest that a combination of PBIL and hill-climbing results in the best overall performance. A computer-assisted surgical system that incorporates the methods presented in this paper is currently being evaluated in cadaver trials. Evolution-Based Methods for Selecting Point Data: Shumeet Baluja was supported by a National Science Foundation Graduate Student Fellowship and a Graduate Student Fellowship from the National Aeronautics and Space Administration, administered by the Lyndon B. Johnson Space Center, Houston, TX. David Simon was partially supported by a National Science Foundation National Challenge grant (award IRI).
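The combination reported as best above, PBIL followed by hill-climbing, can be sketched on bit strings as follows. This assumes the basic PBIL rule (sample a population from a probability vector, then move the vector toward the best sample) and a first-improvement single-bit-flip hill climber. The paper's localization metrics are not reproduced here, so the usage uses a hypothetical stand-in fitness (the number of ones).

```python
import random
from typing import Callable, List, Tuple

Fitness = Callable[[List[int]], float]


def pbil(fitness: Fitness, n_bits: int, pop_size: int = 50, lr: float = 0.1,
         generations: int = 200, seed: int = 0) -> Tuple[List[float], List[int], float]:
    """Basic PBIL: returns (probability vector, best bit string, its fitness)."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                       # start with no bias toward 0 or 1
    best, best_fit = [], float("-inf")
    for _ in range(generations):
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        gen_best = max(pop, key=fitness)
        f = fitness(gen_best)
        if f > best_fit:
            best, best_fit = gen_best, f
        # Shift each probability toward the corresponding bit of this generation's best.
        p = [(1 - lr) * pi + lr * bi for pi, bi in zip(p, gen_best)]
    return p, best, best_fit


def hill_climb(fitness: Fitness, bits: List[int]) -> Tuple[List[int], float]:
    """First-improvement single-bit-flip hill climbing, e.g. to refine a PBIL result."""
    current = list(bits)
    cur_fit = fitness(current)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            current[i] ^= 1                  # try flipping bit i
            f = fitness(current)
            if f > cur_fit:
                cur_fit, improved = f, True
            else:
                current[i] ^= 1              # undo the flip
    return current, cur_fit
```

For instance, `p, best, fit = pbil(sum, 20)` followed by `hill_climb(sum, best)` reaches the all-ones string for this toy fitness; a real point-selection problem would supply its own fitness over which candidate points are selected.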
This paper describes Fossil, an ILP system that uses a search heuristic based on statistical correlation. Several interesting properties of this heuristic are discussed, and it is shown how the heuristic can be naturally extended with a simple but powerful stopping criterion that is independent of the number of training examples. Instead, Fossil's stopping criterion depends on a search heuristic that estimates the utility of literals on a uniform scale. After a comparison with Foil and mFoil in the KRK domain and on the mesh data, we outline some ideas on how Fossil can be adopted for top-down pruning and present some preliminary results.
Psychological evidence shows that probability theory is not a proper descriptive model of intuitive human judgment. Instead, some heuristics have been proposed as such a descriptive model. This paper argues that probability theory has limitations even as a normative model. A new normative model of judgment under uncertainty is designed under the assumption that the system's knowledge and resources are insufficient with respect to the questions that the system needs to answer. The heuristics proposed for human reasoning can also be observed in this new model, and they can be justified according to this assumption.
This paper describes an approach to using GP for image analysis, based on the idea that image enhancement, feature detection, and image segmentation can be re-framed as image filtering problems. GP can be used to discover efficient optimal filters that solve such problems. However, to make the search feasible and effective, the terminal sets, function sets, and fitness functions have to meet some requirements. In this paper, these requirements are described, and terminals, functions, and fitness functions that satisfy them are proposed. Some preliminary experiments are also reported in which GP with these characteristics is applied to the segmentation of the brain in magnetic resonance images (an extremely difficult problem for which no simple solution is known) and compared with artificial neural nets.
Reading is an area of human cognition that has been studied for decades by psychologists, education researchers, and artificial intelligence researchers. Yet there still does not exist a theory that accurately describes the complete process. We believe that past attempts fell short because of an incomplete understanding of the overall task of reading, namely the complete set of mental tasks a reasoner must perform to read and the mechanisms that carry out these tasks. We present a functional theory of the reading process and argue that it covers this task. The theory combines experimental results from psychology, artificial intelligence, education, and linguistics, along with insights gained from our own research. This greater understanding of the mental tasks necessary for reading will enable new natural language understanding systems to be more flexible and more capable than earlier ones. Furthermore, we argue that creativity is a necessary component of the reading process and must be considered in any theory or system attempting to describe it. We present a functional theory of creative reading and a novel knowledge organization scheme that supports the creativity mechanisms. The reading theory is currently being implemented in the ISAAC (Integrated Story Analysis And Creativity) system, a computer system that reads science fiction stories. This paper is part of the Georgia Institute of Technology College of Computing Technical Report series.
We examine how the performance of a memoryless vector quantizer changes as a function of its training set size. Specifically, we study how well the training set distortion predicts the test distortion when the training set is a randomly drawn subset of blocks from the test or training image(s). Using the Vapnik-Chervonenkis dimension, we derive formal bounds on the difference between test and training distortion of vector quantizer codebooks. We then describe extensive empirical simulations that test these bounds for a variety of bit rates and vector dimensions, and give practical suggestions for determining the training set size necessary to achieve good generalization from a codebook. We conclude that, by using training sets comprising only a small fraction of the available data, one can produce results that are close to those obtainable when all available data are used.
This paper deals with (global) finite-gain input/output stabilization of linear systems with saturated controls. For neutrally stable systems, it is shown that the linear feedback law suggested by the passivity approach indeed provides stability with respect to every L^p norm. Explicit bounds on the closed-loop gains are obtained, and they are related to the norms of the respective systems without saturation. These results do not extend to the class of systems whose state matrix has eigenvalues on the imaginary axis with nonsimple (size > 1) Jordan blocks, contradicting what may be expected from the fact that such systems are globally asymptotically stabilizable in the state-space sense; this is shown in particular for the double integrator.
This paper deals with the problem of global stabilization of linear discrete-time systems by means of bounded feedback laws. The main result proved is an analog of one proved by the authors for the continuous-time case, and shows that such stabilization is possible if and only if the system is stabilizable with arbitrary controls and the transition matrix has spectral radius less than or equal to one. The proof provides, in principle, an algorithm for the construction of such feedback laws, which can be implemented either as cascades or as parallel connections ("single hidden layer neural networks") of simple saturation functions.
The NP-complete problem of determining whether two disjoint point sets in the n-dimensional real space R^n can be separated by two planes is cast as a bilinear program, that is, the minimization of the scalar product of two linear functions on a polyhedral set. The bilinear program, which has a vertex solution, is processed by an iterative linear programming algorithm that terminates in a finite number of steps at a point satisfying a necessary optimality condition for a global minimum. Encouraging computational experience on a number of test problems is reported.
The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in the objective function can be approximated by a sigmoid or by a concave exponential on the nonnegative real line, or it can be treated exactly by considering the equivalent linear program with equilibrium constraints (LPEC). Computational tests of these three approaches on publicly available real-world databases have been carried out and compared with an adaptation of the optimal brain damage (OBD) method for reducing neural network complexity. One feature selection algorithm, via concave minimization (FSV), reduced the cross-validation error on a cancer prognosis database while also reducing the number of problem features. Feature selection is an important problem in machine learning. In its basic form, the problem consists of eliminating as many of the features in a given problem as possible while still carrying out a preassigned task with acceptable accuracy. Having a minimal number of features often leads to better generalization and to simpler models that can be more easily interpreted. In the present work, our task is to discriminate between two given sets in an n-dimensional feature space by using as few of the given features as possible. We shall formulate this problem as a mathematical program with a parametric objective function that attempts to achieve this task by generating a separating plane in a feature space of as small a dimension as possible, while minimizing the average distance of misclassified points to the plane. One of the computational experiments carried out with our feature selection procedure showed its effectiveness not only in minimizing the number of features selected, but also in quickly recognizing and removing spurious random features that were introduced. Thus, on the Wisconsin Prognosis Breast Cancer (WPBC) database, after random features were added to the original feature space, one of our algorithms (FSV) immediately removed the random features as well as some of the original features, resulting in a separating plane in a reduced feature space. Using tenfold cross-validation, the separation error in the reduced space was lower than the corresponding error in the original problem space. (Details are given in a later section.) We note that mathematical programming approaches to the feature selection problem have recently been proposed. Even though our approach is based on an LPEC formulation, both the LPEC and its method of solution are different from the ones used previously. An earlier polyhedral concave minimization approach was principally concerned with theoretical considerations of one specific algorithm, and no cross-validatory results were given. Other effective computational applications of mathematical programming to neural networks have also been reported.
This work describes an approach to inferring Deterministic Context-free (DCF) Grammars in a Connectionist paradigm using a Recurrent Neural Network Pushdown Automaton (NNPDA). The NNPDA consists of a recurrent neural network connected to an external stack memory through a common error function. We show that the NNPDA is able to learn the dynamics of an underlying pushdown automaton from examples of grammatical and non-grammatical strings. Not only does the network learn the state transitions of the automaton, it also learns the actions required to control the stack. In order to use continuous optimization methods, we develop an analog stack that reverts to a discrete stack by quantizing all activations after the network has learned the transition rules and stack actions. We further show an enhancement of the network's learning capabilities by providing hints. In addition, an initial comparative study of simulations with first-, second-, and third-order recurrent networks has shown that the increased degrees of freedom in a higher-order network improve generalization but not necessarily learning speed.
This paper presents the HGA, a genetic algorithm written in VHDL and intended for hardware implementation. Because of pipelining, parallelization, and the absence of function call overhead, a hardware GA yields a significant speedup over a software GA, which is especially useful when the GA is used in real-time applications such as disk scheduling and image registration. Since a general-purpose GA requires that the fitness function be easily changed, the hardware implementation must exploit the reprogrammability of certain types of field-programmable gate arrays (FPGAs), which are programmed via a bit pattern stored in static RAM and are thus easily reconfigured. After presenting some background on VHDL, this paper takes the reader through the HGA's code. We then describe some applications of the HGA that are feasible given the state of the art in FPGA technology and summarize some possible extensions of the design. Finally, we review some other work on hardware-based GAs.
One of the basic probabilistic tools used for time series modeling is the hidden Markov model (HMM). In an HMM, information about the past of the time series is conveyed through a single discrete variable, the hidden state. We present a generalization of HMMs in which this state is factored into multiple state variables and is therefore represented in a distributed manner. Both inference and learning in this model depend critically on computing the posterior probabilities of the hidden state variables given the observations. We present an exact algorithm for inference in this model and relate it to the Forward-Backward algorithm for HMMs and to algorithms for more general belief networks. Due to the combinatorial nature of the hidden state representation, this exact algorithm is intractable. As in other intractable systems, approximate inference can be carried out using Gibbs sampling or mean field theory. We also present a structured approximation in which the state variables are decoupled, based on which we derive a tractable learning algorithm. Empirical comparisons suggest that these approximations are efficient and accurate alternatives to the exact methods. Finally, we use the structured approximation to model Bach's chorales and show that it outperforms HMMs in capturing the complex temporal patterns in this dataset.
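The forward pass that the exact algorithm above is related to can be sketched for a plain discrete HMM, together with a helper showing why exact inference in a factored model is costly: two chains with K states each behave like one HMM over K^2 joint states, whose transition matrix is the Kronecker product of the chain matrices (assuming the chains evolve independently). The list-of-lists representation and the integer-coded observations are assumptions of this sketch; in a factorial HMM the emission model would also have to be defined on the joint state.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def forward_likelihood(pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int]) -> float:
    """P(obs) for a discrete HMM via the forward recursion.

    pi[i] = P(state_0 = i); A[i][j] = P(next = j | current = i); B[i][o] = P(o | state i).
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def kron(A1: Matrix, A2: Matrix) -> Matrix:
    """Joint transition matrix of two independent chains; joint state (i1, i2) -> i1 * len(A2) + i2."""
    return [[a * b for a in row1 for b in row2] for row1 in A1 for row2 in A2]
```

With M chains of K states, the joint matrix has K^M rows, so each forward step costs on the order of K^(2M) operations; this blow-up is what motivates the Gibbs sampling, mean field, and structured approximations described above.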
We develop refined mean field approximation inference learning probabilistic neural networks Our mean field theory unlike assume units behave independent degrees freedom instead exploits principled way existence large substructures computationally tractable To illustrate advantages framework show incorporate weak higher order interactions firstorder hidden Markov model treating corrections first order structure within mean field theory
This chapter takes different standpoint address problem learning We reason terms probability make extensive use chain rule known Bayes rule A fast definition basics probability provided appendix A quick reference Most chapter review methods Bayesian learning applied modelling purposes Some original analyses comments also provided section There latent rivalry Bayesian Orthodox statistics It means intention enter kind controversy We perfectly willing accept orthodox well unorthodox methods long scientifically sound provide good results applied learning tasks The disclaimer applies two frameworks presented They object heated controversy past years neural networks community We take side present frameworks strong points weaknesses In context work Bayesian frameworks especially interesting provide continuous update rules used regularised cost minimisation yield automatic selection regularisation level Unlike methods presented chapter necessary try several regularisation levels perform many optimisations The Bayesian framework one training achieved onepass optimisation procedure
Abstract In document I first review theory behind bitsback coding aka free energy coding Frey Hinton describe interface Clanguage software used bitsback coding This method new approach problem optimal compression source code produces multiple codewords given symbol It may seem sensible codeword use case shortest one However proposed bitsback approach random codeword selection yields effective codeword length less shortest codeword length If random choices Boltzmann distributed effective length optimal given source code The software I describe guide easy use source code pages long I illustrate bitsback coding software simple quantized Gaussian mixture problem
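The claim that random codeword selection can beat the shortest codeword can be checked numerically. This is a minimal sketch that assumes the standard bits-back accounting: the effective length is the expected codeword length minus the entropy of the random choice, because those bits are recovered. The function name is hypothetical, and this is not the interface of the C software described. With Boltzmann choice probabilities q_i proportional to 2^(-l_i), the result reduces to -log2 of the sum of 2^(-l_i).

```python
import math


def bits_back_effective_length(lengths):
    """Effective code length in bits when one of several codewords is picked at random.

    lengths: codeword lengths (bits) available for a single symbol.
    The choice distribution q is Boltzmann: q_i proportional to 2 ** (-l_i).
    Effective length = E_q[l] - H(q), since H(q) bits are "got back".
    """
    weights = [2.0 ** (-l) for l in lengths]
    z = sum(weights)
    q = [w / z for w in weights]
    expected_length = sum(qi * li for qi, li in zip(q, lengths))
    entropy = -sum(qi * math.log2(qi) for qi in q if qi > 0)
    return expected_length - entropy


# Two 2-bit codewords: the effective length is 1 bit, shorter than either codeword.
print(bits_back_effective_length([2, 2]))
```

For lengths [2, 3, 3] the choice probabilities are 1/2, 1/4, 1/4. The expected length is 2.5 bits and the entropy is 1.5 bits, so the effective length is again 1 bit.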
A modified Recurrent Neural Network RNN used learn SelfRouting Interconnection Network SRIN set routing examples The RNN modified several distinct initial states This equivalent single RNN learning multiple different synchronous sequential machines We define sequential machine structure augmented show SRIN essentially Augmented Synchronous Sequential Machine ASSM As example learn small sixswitch SRIN After training extract networks internal representation ASSM corresponding SRIN This paper adapted Goudreau Chapter A shortened version paper published Goudreau Giles
This paper outlines methodology analyzing representational support knowledgebased decisionmodeling broad domain A relevant set inference patterns knowledge types identified By comparing analysis results existing representations insights gained design approach integrating categorical uncertain knowledge context sensitive manner
We propose computational framework understanding modeling human consciousness This framework integrates many existing theoretical perspectives yet sufficiently concrete allow simulation experiments We attempt explain qualia subjective experience instead ask differences exist within cognitive information processing system person conscious mentallyrepresented information versus information unconscious The central idea explore contents consciousness correspond temporally persistent states network computational modules Three simulations described illustrating behavior persistent states models corresponds roughly behavior conscious states people experience performing similar tasks Our simulations show periodic settling persistent ie conscious states improves performance cleaning inaccuracies noise forcing decisions helping keep system track toward solution
We discuss two types algorithms selecting relevant examples developed context computation learning theory The examples selected stream examples generated independently random The first two algorithms socalled boosting algorithms Schapire Schapire Freund Freund QuerybyCommittee algorithm Seung Seung et al We describe algorithms proven properties point commonalities suggest possible future implications
This paper traces development main ideas led present state knowledge Inductive Logic Programming The story begins research psychology subject human concept learning Results research influenced early efforts Artificial Intelligence combined formal methods inductive inference evolve present discipline Inductive Logic Programming Inductive Logic Programming often considered young discipline However roots research dating back nearly years This paper traces development ideas beginning psychology effect concept learning research Artificial Intelligence Independent requirement psychological basis formal methods inductive inference developed These separate streams eventually gave rise Inductive Logic Programming This account entirely unbiased More attention given work researchers influenced interest machine learning Being retrospective paper I attempt describe recent developments ILP This account includes research prior year term Inductive Logic Programming first used Muggleton This reason subtitle A Prehistoric Tale The major headings paper taken names periods evolution life Earth
Recurrent neural networks readily process recognize generate temporal sequences By encoding grammatical strings temporal sequences recurrent neural networks trained behave like deterministic sequential finitestate automata Algorithms developed extracting grammatical rules trained networks Using simple method inserting prior knowledge rules recurrent neural networks show recurrent neural networks able perform rule revision Rule revision performed comparing inserted rules rules finitestate automata extracted trained networks The results training recurrent neural network recognize known nontrivial randomly generated regular grammar show networks preserve correct rules able correct training inserted rules initially incorrect By incorrect mean rules ones randomly generated grammar
In section survey recombination operators apply two parents create offspring Some multiparent recombination operators defined fixed number parents eg arity three operators number parents random number might greater two yet operators arity parameter set arbitrary integer number We pay special attention latter type operators summarize results effect operator arity EA performance
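Operators whose arity is a free parameter are easy to picture with one well-known example, uniform scanning crossover. This is a minimal sketch, assuming that operator is representative of the arity-parameterised class discussed. The function name and the integer-list encoding are illustrative assumptions.

```python
import random


def scanning_crossover(parents, rng=random):
    """Uniform scanning crossover with arbitrary arity.

    Each child gene is copied from a parent chosen uniformly at random,
    independently per position, so arity = len(parents) can be any integer >= 2.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("all parents must have the same length")
    return [rng.choice(parents)[i] for i in range(length)]


rng = random.Random(0)
parents = [[0] * 8, [1] * 8, [2] * 8]  # arity three
child = scanning_crossover(parents, rng)
```

Raising the arity only means passing more parents. Each position still draws from one parent, so the operator stays well defined for any number of parents.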
Backpropagation learning algorithms typically collapse networks structure single vector weight parameters optimized We suggest performance may improved utilizing structural information instead discarding introduce framework tempering weight accordingly In tempering model activation error signals treated approximately independent random variables The characteristic scale weight changes matched residuals allowing structural properties nodes fanin fanout affect local learning rate backpropagated error The model also permits calculation upper bound global learning rate batch updates turn leads different update rules bias vs nonbias weights
This paper addresses class learning problems require construction descriptions combine MofN rules traditional Disjunctive Normal form DNF rules The presented method learns descriptions call conditional MofN rules using hypothesisdriven constructive induction approach In approach representation space modified according patterns discovered iteratively generated hypotheses The need MofN rules detected observing exclusiveor equivalence patterns hypotheses These patterns indicate symmetry relations among pairs attributes Symmetrical attributes combined maximal symmetry classes For symmetry class method constructs counting attribute adds new dimension representation space The search hypothesis iteratively modified representation spaces done standard AQ inductive rule learning algorithm It shown proposed method capable solving problems would difficult tackle traditional symbolic learning methods
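The connection between M-of-N rules and counting attributes can be made concrete. The sketch below assumes binary attributes stored in a dict. It shows an "at least M of N conditions" predicate and a counting attribute over a symmetry class, which turns the same concept into one threshold on a single new dimension. Both function names are hypothetical. Symmetry detection and the AQ learner are not shown.

```python
def m_of_n_rule(m, conditions):
    """Build a predicate that holds when at least m of the n conditions hold."""
    def rule(example):
        return sum(1 for cond in conditions if cond(example)) >= m
    return rule


def counting_attribute(example, attributes, value=1):
    """Hypothetical constructed attribute: how many listed attributes equal value."""
    return sum(1 for a in attributes if example[a] == value)


# "At least 2 of x1, x2, x3 are true"; a=a binds each attribute name per lambda.
conds = [lambda e, a=a: e[a] == 1 for a in ("x1", "x2", "x3")]
two_of_three = m_of_n_rule(2, conds)
example = {"x1": 1, "x2": 0, "x3": 1}
```

Once the counting attribute is added to the representation space, the 2-of-3 concept is the single range condition `count >= 2`. In DNF over the original attributes it needs three conjunctions.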
Inferences conversational casebased reasoning CCBR approach embedded CBR Content Navigator line products susceptible bias case scoring algorithm In particular shorter cases tend given higher scores assuming factors held constant This report summarizes investigation mediating bias We introduce approach eliminating bias evaluate affects retrieval performance six case libraries We also suggest explanations results note limitations study
We investigate effectiveness stochastic hillclimbing baseline evaluating performance genetic algorithms GAs combinatorial function optimizers In particular address four problems GAs applied literature maximum cut problem Kozas multiplexer problem MDAP Multiprocessor Document Allocation Problem jobshop problem We demonstrate simple stochastic hillclimbing methods able achieve results comparable superior obtained GAs designed address four problems We illustrate case jobshop problem insights obtained formulation stochastic hillclimbing algorithm lead improvements encoding used GA Department Computer Science University California Berkeley Supported NASA Graduate Fellowship This paper written author visiting researcher Ecole Normale Superieure rue dUlm Groupe de BioInformatique France Email juelscsberkeleyedu Department Mathematics University California Berkeley Supported NDSEG Graduate Fellowship Email wattenbemathberkeleyedu
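A generic stochastic hill climber for the maximum cut problem illustrates what such a baseline looks like. This is a minimal sketch of random-mutation hill climbing with one-bit flips, not the specific variants evaluated in the paper. The function names, iteration budget and seed are assumptions.

```python
import random


def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def stochastic_hillclimb_maxcut(n, edges, iterations=1000, seed=0):
    """Random-mutation hill climbing for maximum cut.

    Start from a random partition, flip one random vertex per step, and keep
    the flip unless it makes the cut smaller (ties are accepted, allowing drift).
    """
    rng = random.Random(seed)
    side = [rng.randint(0, 1) for _ in range(n)]
    best = cut_size(edges, side)
    for _ in range(iterations):
        v = rng.randrange(n)
        side[v] ^= 1
        value = cut_size(edges, side)
        if value >= best:
            best = value
        else:
            side[v] ^= 1  # undo a worsening move
    return side, best


# A 4-cycle: the maximum cut puts alternating vertices on opposite sides (4 edges cut).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
side, best = stochastic_hillclimb_maxcut(4, edges)
```

Recomputing the full cut after each flip keeps the sketch short. A practical version would update the cut incrementally from the flipped vertex's neighbours.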
This paper describes several means sharing related concepts improve learning domain The sharing comes form substructures possibly entire structures previous concepts may aid learning concepts These substructures highlight useful information domain Using two domains evaluate effectiveness concept sharing respect accuracy concept size search complexity noise resistance
Genetic algorithms stochastic search optimization techniques used wide range applications This paper addresses application genetic algorithms graph partitioning problem Standard genetic algorithms large populations suffer lack efficiency quite high execution time A massively parallel genetic algorithm proposed implementation SuperNode Transputers results various benchmarks given A comparative analysis approach hillclimbing algorithms simulated annealing also presented The experimental measures show algorithm gives better results concerning quality solution time needed reach
Support Vector Learning Machines SVM finding application pattern recognition regression estimation operator inversion illposed problems Against general backdrop methods improving generalization performance improving speed test phase SVMs increasing interest In paper combine two techniques pattern recognition problem The method improving generalization performance virtual support vector method incorporating known invariances problem This method achieves drop error rate NIST test digit images The method improving speed reduced set method approximating support vector decision surface We apply method achieve factor fifty speedup test phase virtual support vector machine The combined approach yields machine times faster original machine better generalization performance achieving error The virtual support vector method applicable SVM problem known invariances The reduced set method applicable support vector machine
A significant limitation neural networks representations learn usually incomprehensible humans We present novel algorithm Trepan extracting comprehensible symbolic representations trained neural networks Our algorithm uses queries induce decision tree approximates concept represented given network Our experiments demonstrate Trepan able produce decision trees maintain high level fidelity respective networks comprehensible accurate Unlike previous work area algorithm general applicability scales well large networks problems highdimensional input spaces
By analyzing relationships among chance weight evidence degree belief shown assertion chances special cases belief functions assertion Dempsters rule used combine belief functions based distinct bodies evidence together lead inconsistency DempsterShafer theory To solve problem fundamental postulates theory must rejected A new approach uncertainty management introduced shares many intuitive ideas DS theory avoiding problem
In spite popularity ExplanationBased Learning EBL theoretical basis wellunderstood Using generalization Probably Approximately Correct PAC learning problem solving domains paper formalizes two forms ExplanationBased Learning macrooperators proves sufficient conditions success These two forms EBL called Macro Caching Serial Parsing respectively exhibit two distinct sources power bias sparseness solution space decomposability problemspace The analysis shows exponential speedup achieved either biases suitable domain Somewhat surprisingly also shows computing preconditions macrooperators necessary obtain speedups The theoretical results confirmed experiments domain Eight Puzzle Our work suggests best way address utility problem EBL implement bias exploits problemspace structure set domains one interested learning
Developed recently support vector learning machines achieve high generalization ability minimizing bound expected test error however far existed way adding knowledge invariances classification problem hand We present method incorporating prior knowledge transformation invariances applying transformations support vectors training examples critical determining classification boundary
This paper reports recent results using genetic algorithms learn decision rules complex robot behaviors The method involves evaluating hypothetical rule sets simulator applying simulated evolution evolve effective rules The main contributions paper task learned complex behavior involving multiple mobile robots learned rules verified experiments operational mobile robots The case study involves shepherding task one mobile robot attempts guide another robot specified area
In learning systems examples represented fixedlength feature vectors components either real numbers nominal values We propose extension featurevector representation allows value feature set strings instance represent small white black dog nominal features size species setvalued feature color one might use feature vector sizesmall speciescanisfamiliaris colorwhiteblack Since make assumptions number possible set elements extension traditional featurevector representation closely connected Blums infinite attribute representation We argue many decision tree rule learning algorithms easily extended setvalued features We also show example many realworld learning problems efficiently naturally represented setvalued features particular text categorization problems problems arise propositionalizing firstorder representations lend setvalued features
The mobile robot domain challenges policyiteration reinforcement learning algorithms difficult problems structural credit assignment uncertainty Structural credit assignment particularly acute domains realtime trial length limiting factor number learning steps physical hardware perform Noisy sensors effectors complex dynamic environments complicate learning problem leading situations speed learning policy flexibility may important policy optimality Input generalization addresses problems typically time consuming robot domains We present two algorithms YBlearning YB perform simple fast generalization input space based bitsimilarity The algorithms trade longterm optimality immediate performance flexibility The algorithms tested simulation nongeneralized learning across different numbers discounting steps YB shown perform better earlier stages learning particularly presence noise In trials performed sonarbased mobile robot subject uncertainty real world YB surpassed simulation results wide margin strongly supporting role quick dirty generalization strategies noisy realtime mobile robot domains
In time series problems noise divided two categories dynamic noise drives process observational noise added measurement process influence future values system In framework empirical volatilities squared relative returns prices exhibit significant amount observational noise To model predict time evolution adequately estimate state space models explicitly include observational noise We obtain relaxation times shocks logarithm volatility ranging three weeks foreign exchange three five months stock indices In cases twodimensional hidden state required yield residuals consistent white noise We compare results ordinary autoregressive models without hidden state find autoregressive models underestimate relaxation times two orders magnitude due ignoring distinction observational dynamic noise This new interpretation dynamics volatility terms relaxators state space model carries stochastic volatility models GARCH models useful several problems finance including risk management pricing derivative securities
In paper present TDLeaf variation TD algorithm enables used conjunction minimax search We present experiments chess program KnightCap used TDLeaf learn evaluation function playing Free Internet Chess Server FICS ficsonenetnet It improved rating rating games days play We discuss reasons success also relationship results Tesauros results backgammon
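The core of a TD(lambda) update applied to search leaves can be sketched in a few lines. This sketch assumes a linear evaluation J = w . phi, so the gradient is the leaf's feature vector. It also assumes the usual TD(lambda) form: each step's gradient is weighted by a lambda-discounted sum of later temporal differences. Here leaf_features[t] stands for the features of the principal-variation leaf found by minimax search from move t. The search itself, the names and the optional outcome override are illustrative assumptions, not KnightCap's code.

```python
def linear_eval(weights, features):
    """Linear evaluation J = w . phi."""
    return sum(w * f for w, f in zip(weights, features))


def tdleaf_update(weights, leaf_features, alpha=0.1, lam=0.7, outcome=None):
    """One TD(lambda) weight update computed on principal-variation leaves.

    leaf_features[t]: feature vector of the leaf reached by minimax search from
    the position at move t (search not shown). If outcome is given, it replaces
    the evaluation of the final position (e.g. the game result).
    """
    values = [linear_eval(weights, phi) for phi in leaf_features]
    if outcome is not None:
        values[-1] = outcome
    n = len(values)
    diffs = [values[t + 1] - values[t] for t in range(n - 1)]  # temporal differences d_t
    new_weights = list(weights)
    for t in range(n - 1):
        # lambda-discounted sum of the temporal differences from step t onward
        err = sum(lam ** (j - t) * diffs[j] for j in range(t, n - 1))
        for i, f in enumerate(leaf_features[t]):  # gradient of a linear J is phi
            new_weights[i] += alpha * f * err
    return new_weights
```

Example: with weights [0.0], three leaves each with features [1.0], and outcome 1.0, the differences are 0 and 1. The update adds 0.1 * 0.7 + 0.1 * 1 = 0.17 to the single weight.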
The problem minimizing number misclassified points plane attempting separate two point sets intersecting convex hulls ndimensional real space formulated linear program equilibrium constraints LPEC This general LPEC converted exact penalty problem quadratic objective linear constraints A FrankWolfetype algorithm proposed penalty problem terminates stationary point global solution Novel aspects approach include A linear complementarity formulation step function counts misclassifications ii Exact penalty formulation without boundedness nondegeneracy constraint qualification assumptions iii An exact solution extraction sequence minimizers penalty function finite value penalty parameter general LPEC explicitly exact solution LPEC uncoupled constraints iv A parametric quadratic programming formulation LPEC associated misclassification minimization problem
Technical Report IDSIA Abstract It long known neural networks learn faster input hidden unit activities centered zero recently extended approach also encompass centering error signals Schraudolph Sejnowski Here generalize notion factors involved weight update leading us propose centering slope hidden unit activation functions well Slope centering removes linear component backpropagated error improves credit assignment networks shortcut connections Benchmark results show speed learning significantly without adversely affecting trained networks generalization ability
This paper presents ASOCS Adaptive SelfOrganizing Concurrent System model massively parallel processing incrementally defined rule systems areas adaptive logic robotics logical inference dynamic control An ASOCS adaptive network composed many simple computing elements operating asynchronously parallel An ASOCS operate either data processing mode learning mode During data processing mode ASOCS acts parallel hardware circuit During learning mode ASOCS incorporates rule expressed Boolean conjunction distributed fashion time logarithmic number rules This paper proposes learning algorithm architecture Priority ASOCS This new ASOCS model uses rules priorities The new model significant learning time space complexity improvements previous models Nonvon Neumann architectures neural networks attack wordatatime bottleneck traditional computing systems Neural networks learn inputoutput mappings using highly distributed processing memory Their numerous simple processing elements modifiable weighted links permit high degree parallelism A typical neural network fixed topology It learns modifying weighted links nodes A new class connectionist architectures proposed called ASOCS Adaptive SelfOrganizing Concurrent Systems ASOCS models support efficient computation selforganized learning parallel execution Learning done incremental presentation rules andor examples ASOCS models learn modifying topology Data types include Boolean multistate variables recent models support analog variables The model incorporates rules adaptive logic network parallel self organizing fashion In processing mode ASOCS supports fully parallel execution actual inputs according learned rules The adaptive logic network acts parallel hardware circuit execution mapping n input boolean vectors output boolean vectors combinatoric fashion The overall philosophy ASOCS follows high level goals current neural network models However mechanisms learning execution vary significantly The ASOCS logic network topologically dynamic network growing efficiently fit specific application Current ASOCS models based digital nodes ASOCS also supports use symbolic heuristic learning mechanisms thus combining parallelism distributed nature connectionist computing potential power AI symbolic learning A proof concept ASOCS chip developed
We exhibit theoretically founded algorithm T agnostic PAClearning decision trees levels whose computation time almost linear size training set We evaluate performance learning algorithm T common realworld datasets show datasets T provides simple decision trees little loss predictive power compared C In fact datasets continuous attributes error rate tends lower C To best knowledge first time PAClearning algorithm shown applicable realworld classification problems Since one prove T agnostic PAClearning algorithm T guaranteed produce close optimal level decision trees sufficiently large training sets distribution data In regard T differs strongly learning algorithms considered applied machine learning guarantee given performance new datasets We also demonstrate algorithm T used diagnostic tool investigation expressive limits level decision trees Finally T combination new bounds VCdimension decision trees bounded depth derive provides us first time tools necessary comparing learning curves decision trees realworld datasets theoretical estimates PAC learning theory
Andrew D Back Department Electrical Computer Engineering University Queensland St Lucia Australia He Brain Information Processing Group Frontier Research Program RIKEN The Institute Physical Chemical Research Hirosawa Wakoshi Saitama Japan Abstract The performance neural network simulations often reported terms mean standard deviation number simulations performed different starting conditions However many cases distribution individual results approximate Gaussian distribution may symmetric may multimodal We present distribution results practical problems show assuming Gaussian distributions significantly affect interpretation results especially comparison studies For controlled task consider find distribution performance skewed towards better performance smoother target functions skewed towards worse performance
The structure environment affects behaviors organisms evolved How structure described behavioral consequences explained predicted We aim establish initial answers questions simulating evolution simple organisms simple environments different structures Our artificial creatures called minimats neither sensors memory behave solely picking amongst actions moving eating reproducing sitting according inherited probability distribution Our simulated environments contain food multiple minimats structured terms spatial temporal food density patchiness food appears Changes environmental parameters affect evolved behaviors minimats different ways three parameters importance describing minimat world One useful behavioral strategies evolves looping movement allows minimats despite lack internal state match behavior temporal spatial structure environment Ultimately find minimats construct environments individual behaviors making study impact global environment structure individual behavior much complex
The primary aim paper show graphical models used mathematical language integrating statistical subjectmatter information In particular paper develops principled nonparametric framework causal inference diagrams queried determine assumptions available sufficient identifying causal effects nonexperimental data If diagrams queried produce mathematical expressions causal effects terms observed distributions otherwise diagrams queried suggest additional observations auxiliary experiments desired inferences obtained Key words Causal inference graph models structural equations treatment effect
We analyse biases eleven measures estimating quality multivalued attributes The values information gain Jmeasure giniindex relevance tend linearly increase number values attribute The values gainratio distance measure Relief weight evidence decrease informative attributes increase irrelevant attributes The bias statistic tests based chisquare distribution similar functions able discriminate among attributes different quality We also introduce new function based MDL principle whose value slightly decreases increasing number attributes values
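Several of the measures named here can be written down directly, which makes their different treatment of many-valued attributes visible. This sketch uses the standard textbook definitions of information gain, gain ratio and Gini gain. It does not reproduce the paper's eleven measures or its new MDL-based function, and the helper names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of symbols."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_by(values, labels):
    """Group class labels by the attribute value of each example."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in split_by(values, labels))


def gain_ratio(values, labels):
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def gini_gain(values, labels):
    n = len(labels)
    return gini(labels) - sum(len(g) / n * gini(g) for g in split_by(values, labels))


labels = ["+", "+", "-", "-"]
identifier = [1, 2, 3, 4]       # many-valued, uninformative beyond memorisation
binary = ["a", "a", "b", "b"]   # genuinely informative two-valued attribute
```

In this toy case, information gain rates the identifier attribute as highly as the informative binary attribute (both 1.0 bit). Gain ratio penalises the identifier's four values and gives it 0.5 instead of 1.0.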
In past nearest neighbor algorithms learning examples worked best domains features numeric values In domains examples treated points distance metrics use standard definitions In symbolic domains sophisticated treatment feature space required We introduce nearest neighbor algorithm learning domains symbolic features Our algorithm calculates distance tables allow produce realvalued distances instances attaches weights instances modify structure feature space We show technique produces excellent classification accuracy three problems studied machine learning researchers predicting protein secondary structure identifying DNA promoter sequences pronouncing English text Direct experimental comparisons learning algorithms show nearest neighbor algorithm comparable superior three domains In addition algorithm advantages training speed simplicity perspicuity We conclude experimental evidence favors use continued development nearest neighbor algorithms domains ones studied
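Distance tables for symbolic features can be built from class-conditional frequencies. This is a minimal sketch assuming a value difference metric in the style of Stanfill and Waltz: two values are close when they predict similar class distributions. The exponent k = 1, the function names and the DNA-base toy data are assumptions, and instance weights are omitted.

```python
from collections import Counter, defaultdict


def value_distance_table(column, labels, k=1):
    """Distance between every pair of symbolic values of one feature.

    delta(v1, v2) = sum over classes c of |P(c | v1) - P(c | v2)| ** k,
    with probabilities estimated from training counts.
    """
    counts = defaultdict(Counter)  # value -> class -> count
    for v, y in zip(column, labels):
        counts[v][y] += 1
    classes = set(labels)
    table = {}
    for v1 in counts:
        n1 = sum(counts[v1].values())
        for v2 in counts:
            n2 = sum(counts[v2].values())
            table[(v1, v2)] = sum(
                abs(counts[v1][c] / n1 - counts[v2][c] / n2) ** k for c in classes
            )
    return table


def instance_distance(x, y, tables):
    """Sum of per-feature value distances (instance weights omitted)."""
    return sum(tables[i][(a, b)] for i, (a, b) in enumerate(zip(x, y)))


# One symbolic feature: bases A and G only occur with "+", C and T only with "-".
table = value_distance_table(["A", "A", "G", "C", "C", "T"], ["+", "+", "+", "-", "-", "-"])
```

Because the distances come from class statistics, A and G end up at distance 0 and A and C at the maximum distance 2, although the symbols themselves carry no numeric order.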
Many supervised machine learning algorithms require discrete feature space In paper review previous work continuous feature discretization identify defining characteristics methods conduct empirical evaluation several methods We compare binning unsupervised discretization method entropybased puritybased methods supervised algorithms We found performance NaiveBayes algorithm significantly improved features discretized using entropybased method In fact tested datasets discretized version NaiveBayes slightly outperformed C average We also show cases performance C induction algorithm significantly improved features discretized advance experiments performance never significantly degraded interesting phenomenon considering fact C capable locally discretizing features
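The basic step of entropy-based discretization is choosing a cut point that makes the class labels on each side as pure as possible. This sketch finds the single best binary cut. It assumes the entropy-based method compared here belongs to this family. The full method of that kind applies the step recursively with a stopping criterion, which is not shown, and the function names are illustrative.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Best single cut point for a continuous feature.

    Candidate cuts are midpoints between adjacent distinct sorted values;
    the chosen cut minimises the size-weighted class entropy of the two sides.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (i / n) * class_entropy(left) + ((n - i) / n) * class_entropy(right)
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut, best_score


cut, score = best_entropy_cut([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], ["a", "a", "a", "b", "b", "b"])
```

Here the cut falls at 6.5, midway between the two class clusters, and both resulting intervals are pure.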
We review recent work done group applying genetic algorithms GAs design cellular automata CAs perform computations requiring global coordination A GA used evolve CAs two computational tasks density classification synchronization In cases GA discovered rules gave rise sophisticated emergent computational strategies These strategies analyzed using computational mechanics framework particles carry information interactions particles effects information processing This framework also used explain process strategies designed GA The work described first step employing GAs engineer useful emergent computation decentralized multiprocessor systems It also first step understanding evolutionary process produce complex systems sophisticated collective computational abilities
Metastability common phenomenon Many evolutionary processes natural artificial alternate periods stasis brief periods rapid change behavior In paper analytical model dynamics mutationonly genetic algorithm GA introduced identifies new general mechanism causing metastability evolutionary dynamics The GAs population dynamics described terms flows space fitness distributions The trajectories fitness distribution space derived closed form limit infinite populations We show finite populations induce metastability even regions fitness exhibit local optimum In particular model predicts occurrence fitness epochs periods stasis population fitness distributions finite population size identifies locations fitness epochs flows hyperbolic fixed points This enables exact predictions metastable fitness distributions fitness epochs well giving insight nature periods stasis innovations All results obtained closedform expressions terms GAs parameters An analysis Jacobian matrices neighborhood epochs metastable fitness distribution allows calculation stable unstable manifold dimensions reveals state spaces topological structure More general quantitative features dynamics fitness fluctuation amplitudes epoch stability speed innovations also determined Jacobian eigenvalues The analysis shows quantitative predictions range dynamical behaviors specific finite population dynamics derived solution infinite population dynamics The theoretical predictions shown agree well statistics GA simulations We also discuss connections results population genetics molecular evolution theory
In paper explore use adaptive search technique genetic algorithms construct system GABIL continually learns refines concept classification rules interaction environment The performance system measured set concept learning problems compared performance two existing systems IDR C Preliminary results support despite minimal system bias GABIL effective concept learner quite competitive IDR C target concept increases complexity
We review accuracy estimation methods compare two common methods crossvalidation bootstrap Recent experimental results artificial data theoretical results restricted settings shown selecting good classifier set classifiers model selection tenfold crossvalidation may better expensive leaveoneout crossvalidation We report largescale experiment over half million runs C NaiveBayes algorithm estimate effects different parameters algorithms realworld datasets For crossvalidation vary number folds whether folds stratified bootstrap vary number bootstrap samples Our results indicate realworld datasets similar best method use model selection tenfold stratified cross validation even computation power allows using folds
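Stratified k-fold cross-validation keeps each fold's class proportions close to those of the whole dataset. This is a minimal sketch under two assumptions: folds are formed by shuffling each class and dealing its examples round-robin, and accuracy is averaged over folds. The caller-supplied train_and_predict function is a hypothetical stand-in for an inducer such as C4.5 or Naive-Bayes.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds that preserve class proportions.

    Indices of each class are shuffled, then dealt round-robin across folds,
    continuing from where the previous class left off.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return folds


def cross_validation_accuracy(X, labels, train_and_predict, k=10, seed=0):
    """Mean test accuracy over k stratified folds.

    train_and_predict(train_X, train_y, test_X) returns predicted labels.
    """
    accuracies = []
    for test_idx in stratified_folds(labels, k, seed):
        if not test_idx:
            continue  # possible when there are more folds than examples
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = train_and_predict(
            [X[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

With 50 examples of each of two classes and k = 10, every fold receives exactly 5 examples of each class. Unstratified random folds would not guarantee this.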
NaiveBayes induction algorithms previously shown surprisingly accurate many classification tasks even conditional independence assumption based violated However studies done small databases We show larger databases accuracy NaiveBayes scale well decision trees We propose new algorithm NBTree induces hybrid decisiontree classifiers NaiveBayes classifiers decisiontree nodes contain univariate splits regular decisiontrees leaves contain NaiveBayesian classifiers The approach retains interpretability NaiveBayes decision trees resulting classifiers frequently outperform constituents especially larger databases tested
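The Naive-Bayes classifiers placed at NBTree's leaves need only class counts and per-class attribute-value counts. This is a minimal sketch, not NBTree itself, and split selection is not shown. It assumes Boolean attributes and add-one smoothing over two possible values; both assumptions are not taken from the text. The class name is hypothetical.

```python
import math
from collections import Counter, defaultdict


class BayesianClassifier:
    """Incremental naive Bayesian classifier over Boolean attributes.

    Per class it stores a count (for the base rate) and per-attribute value
    counts (for P(value | class)). Training only increments counts, so the
    order of training instances does not change the learned model.
    """

    def __init__(self):
        self.total = 0
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (class, attribute index) -> value -> count

    def update(self, instance, label):
        self.total += 1
        self.class_counts[label] += 1
        for attr, value in enumerate(instance):
            self.value_counts[(label, attr)][value] += 1

    def log_score(self, instance, label):
        # log P(class) + sum_i log P(value_i | class), add-one smoothed
        # over two possible values per attribute (assumed Boolean attributes).
        n_c = self.class_counts[label]
        score = math.log(n_c / self.total)
        for attr, value in enumerate(instance):
            counts = self.value_counts.get((label, attr), Counter())
            score += math.log((counts[value] + 1) / (n_c + 2))
        return score

    def classify(self, instance):
        return max(self.class_counts, key=lambda c: self.log_score(instance, c))


data = [((1, 1), "+"), ((1, 0), "+"), ((0, 0), "-"), ((0, 1), "-")]
clf = BayesianClassifier()
for x, y in data:
    clf.update(x, y)
```

Scores are summed in log space, which avoids numerical underflow when many attributes are multiplied together.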
We present MLC library C classes tools supervised Machine Learning While MLC provides general learning algorithms used end users main objective provide researchers experts wide variety tools accelerate algorithm development increase software reliability provide comparison tools display information visually More collection existing algorithms MLC attempt extract commonalities algorithms decompose unified view simple coherent extensible In paper discuss problems MLC aims solve design MLC current functionality
Bayesian models involving Dirichlet process mixtures heart modern nonparametric Bayesian movement Much rapid development models last decade direct result advances simulationbased computational methods Some early work area circa focused use nonparametric ideas models applications otherwise standard hierarchical models This chapter provides historical review perspective developments prime focus use integration nonparametric ideas hierarchical models We illustrate ease strict parametric assumptions common standard Bayesian hierarchical models relaxed incorporate uncertainties functional forms using Dirichlet process components partly enabled approach computation using MCMC methods The resulting methodology illustrated two examples taken unpublished report topic
In paper present averagecase analysis Bayesian classifier simple induction algorithm fares remarkably well many learning tasks Our analysis assumes monotone conjunctive target concept independent noisefree Boolean attributes We calculate probability algorithm induce arbitrary pair concept descriptions use compute probability correct classification instance space The analysis takes account number training instances number attributes distribution attributes level class noise We also explore behavioral implications analysis presenting predicted learning curves artificial domains give experimental results domains check reasoning One goal research machine learning discover principles relate algorithms domain characteristics behavior To end many researchers carried systematic experimentation natural artificial domains search empirical regularities eg Kibler Langley Others focused theoretical analyses often within paradigm probably approximately correct learning eg Haussler However experimental studies based informal analyses learning task whereas formal analyses address worst case thus bear little relation empirical results ber attributes class attribute frequencies obtain predictions behavior induction algorithms used experiments check analyses However research focus algorithms typically used experimental practical sides machine learning important averagecase analyses extended methods Recently growing interest probabilistic approaches inductive learning For example Fisher described Cobweb incremental algorithm conceptual clustering draws heavily Bayesian ideas literature reports number systems build work eg Allen Langley Iba Gennari Thompson Langley Cheeseman et al outlined AutoClass nonincremental system uses Bayesian methods cluster instances groups researchers focused induction Bayesian inference networks eg Cooper Herskovits These recent Bayesian learning algorithms complex easily amenable analysis share common ancestor simpler tractable This supervised algorithm 
refer simply Bayesian classifier comes originally work pattern recognition Duda Hart The method stores probabilistic summary class summary contains conditional probability attribute value given class well probability base rate class This data structure approximates representational power perceptron describes single decision boundary instance space When algorithm encounters new instance updates probabilities stored specified class Neither order training instances occurrence classification errors effect process When given test instance classifier uses evaluation function describe detail later rank alter
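The Bayesian classifier described above can be sketched in Python. It stores, for each class, a base rate plus the conditional probability of each attribute value, and it updates these one instance at a time. Two parts of the sketch are assumptions added for numerical safety: the Laplace-smoothed probability estimates and the log-space scoring. The source's own evaluation function is not reproduced. The class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict

class BayesianClassifier:
    """Per-class probabilistic summary: a count for each class (its base rate)
    and, per attribute, counts of each value observed with that class."""

    def __init__(self):
        self.n = 0
        self.class_counts = Counter()
        # value_counts[c][j][v]: class-c instances whose attribute j equals v
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values observed per attribute

    def update(self, xs, c):
        """Incorporate one training instance into the totals and class c's counts."""
        self.n += 1
        self.class_counts[c] += 1
        for j, v in enumerate(xs):
            self.value_counts[c][j][v] += 1
            self.values[j].add(v)

    def log_score(self, xs, c):
        nc = self.class_counts[c]
        score = math.log(nc / self.n)  # log base rate P(c)
        for j, v in enumerate(xs):
            k = len(self.values[j])
            # Laplace-smoothed estimate of P(attribute j = v | c) (an assumption)
            score += math.log((self.value_counts[c][j][v] + 1) / (nc + k))
        return score

    def predict(self, xs):
        """Return the class with the highest log posterior score."""
        return max(self.class_counts, key=lambda c: self.log_score(xs, c))
```

The stored summary is only a set of counts, so the result does not depend on the order in which training instances arrive. This matches the order independence described in the text.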
Regularization eg form weight decay important training optimization neural network architectures In work provide tool based asymptotic sampling theory iterative estimation weight decay parameters The basic idea gradient descent estimated generalization error respect regularization parameters The scheme implemented Designer Net framework network training pruning ie based diagonal Hessian approximation The scheme require essential computational overhead addition needed training pruning The viability approach demonstrated experiment concerning prediction chaotic MackeyGlass series We find optimized weight decays relatively large densely connected networks initial pruning phase decrease pruning proceeds
vations perceptrons perceptron learning algorithm cycles among hyperplanes hyperplanes may compared select one gives best split examples always possible perceptron build hyperplane separates least one example rest We describe Extentron grows multilayer networks capable distinguishing nonlinearlyseparable data using simple perceptron rule linear threshold units The resulting algorithm simple fast scales well large problems retains convergence properties perceptron completely specified using two parameters Results presented comparing Extentron neural network paradigms symbolic learning systems
Technical Report IDSIA Abstract It long known neural networks learn faster input hidden unit activities centered zero recently extended approach also encompass centering error signals Here generalize notion factors involved networks gradient leading us propose centering slope hidden unit activation functions well Slope centering removes linear component backpropagated error improves credit assignment networks shortcut connections Benchmark results show speed learning significantly without adversely affecting trained networks generalization ability
We introduce new faulttolerant model algorithmic learning using equivalence oracle incomplete membership oracle answers random subset learners membership queries may missing We demonstrate high probability still possible learn monotone DNF formulas polynomial time provided fraction missing answers bounded constant less one Even half membership queries expected yield information algorithm exactly identify mterm nvariable monotone DNF formulas expected Omn queries The task shown require exponential time using equivalence queries alone We extend algorithm handle onesided errors discuss several possible error models It hoped work may lead better understanding power membership queries effects faulty teachers query models concept learning
One method making analogies access instantiate abstract domain principles one method acquiring knowledge abstract principles discover experience We view generalization experiences absence prior knowledge target principle task hypothesis formation subtask discovery Also view use hypothesized principles analogical design task hypothesis testing another subtask discovery In paper focus discovery physical principles generalization design experiences domain physical devices Some important issues generalization experiences generalize experience far generalize methods use We represent reasoners comprehension specific designs form structurebehaviorfunction SBF models An SBF model provides functional causal explanation working device We represent domain principles deviceindependent behaviorfunction BF models We show function device determines generalize SBF model ii SBF model suggests far generalize iii typology functions indicates method use
The power casebased method comes ability retrieve right case new problem specified This implies learning right indices case storing potential reuse crucial success method A hierarchical organization case memory raises two distinct related issues index learning learning indexing vocabulary learning right level generalization In paper show use structurebehaviorfunction SBF models constrains index learning context experiencebased design physical devices The SBF model design provides functional causal explanation structure design delivers function We describe SBF model design together specification task design case might reused provides vocabulary indexing design case memory We also discuss prior design experiences stored casememory help determine level index generalization The KRITIK system implements evaluates modelbased method learning indices design cases
Yang Y HJ Sussmann ED Sontag Stabilization linear systems bounded controls Proc Nonlinear Control Systems Design Symp Bordeaux June M Fliess Ed IFAC Publications pp Journal version appear IEEE Trans Autom Control
The Bayesian approach comparing models involves calculating posterior probability plausible model For highdimensional contingency tables set plausible models large We focus attention reversible jump Markov chain Monte Carlo Green develop strategies calculating posterior probabilities hierarchical graphical decomposable loglinear models Even tables moderate size sets models may large The choice suitable prior distributions model parameters also discussed detail two examples presented For first example fi fi table model probabilities calculated using reversible jump approach compared model probabilities calculated exactly using alternative approximation The second example contingency table exact methods infeasible due large number possible models
In addition learning new knowledge system must able learn knowledge likely applicable An index piece information identified given situation triggers relevant piece knowledge schema systems memory We discuss issue indices may learned automatically context story understanding task present program learn new indices existing explanatory schemas We discuss two methods using system identify relevant schema even input directly match existing index learn new index allow retrieve schema efficiently future
In paper construct suboptimal H controllers satisfy new robust performance condition using receding horizon technique A method described synthesis H controllers online making use exact plant model finite interval extending future Inequalities based two Riccati differential equation solution finite horizon H problem derived resulting freedom exploited construct H controllers closed loop induced norm less prespecified value plants within set described terms future variation plant Dual results possible adaptive interpretation also constructed
Concept learning one studied areas machine learning A lot work domain deals decision trees In paper concerned different kind technique based Galois lattices concept lattices We present new semilattice based system IGLUE uses entropy function topdown approach select concepts lattice construction Then IGLUE generates new relevant numerical features transforming initial boolean features concepts IGLUE uses new features redescribe examples Finally IGLUE applies Mahalanobis distance similarity measure examples Keywords Multistrategy Learning InstanceBased Learning Galois lattice Feature transformation
This paper introduces hybrid learning methodology integrates genetic algorithms GAs decision tree learning ID order evolve optimal subsets discriminatory features robust pattern classification A GA used search space possible subsets large set candidate discrimination features For given feature subset ID invoked produce decision tree The classification performance decision tree unseen data used measure fitness given feature set turn used GA evolve better feature sets This GAID process iterates feature subset found satisfactory classification performance Experimental results presented illustrate feasibility approach difficult problems involving recognizing visual concepts satellite facial image data The results also show improved classification performance reduced description complexity compared standard methods feature selection
This paper presents neural network architecture manage structured data refine knowledge bases expressed first order logic language The presented framework well suited classification problems concept descriptions depend upon numerical features data In fact main goal neural architecture refining numerical part knowledge base without changing structure In particular discuss method translate set classification rules neural computation units Here focus attention translation method algorithms refine network weights structured data The classification theory refined manually handcrafted automatically acquired symbolic relational learning system able deal numerical features As matter fact primary goal bring neural network architecture capability dealing structured data unrestricted size allowing dynamically bind classification rules different items occurring input data An extensive experimentation challenging artificial case study shows network converges quite fast generalizes much better propositional learners equivalent task definition
The evolving population neural nets contains information terms genes also collection behaviors population members Such information thought kind culture population Two ways exploiting culture explored paper Culling overlarge litters Generate large number offspring different crossovers quickly evaluate comparing performance population throw away appear poor Teaching Use backpropagation train offspring toward performance population Both techniques result faster effective neuroevolution effectively combined demonstrated inverted pendulum problem Additional methods cultural exploitation possible studied future work These results suggest cultural exploitation powerful idea allows leveraging several aspects genetic algorithm
This paper describes StructureMapping Engine SME program studying analogical processing SME built explore Gentners Structuremapping theory analogy provides tool kit constructing matching algorithms consistent theory Its flexibility enhances cognitive simulation studies simplifying experimentation Furthermore SME efficient making useful component machine learning systems well We review Structuremapping theory describe design engine We analyze complexity algorithm demonstrate steps polynomial typically bounded O N Next demonstrate examples operation taken cognitive simulation studies work machine learning Finally compare SME analogy programs discuss several areas future work This paper appeared Artificial Intelligence pp For information please contact forbusilsnwuedu
We investigate aspects cognition involved invention precisely invention telephone Alexander Graham Bell We propose use StructureBehaviorFunction SBF language representation invention knowledge claim SBF shown support wide range reasoning physical devices constitutes plausible account inventor might represent knowledge invention We propose use ACTR architecture implementation model ACTR shown precisely model wide range human cognition We draw upon architecture execution productions matching declarative knowledge spreading activation Thus present model combines wellestablished cognitive validity ACTR powerful specialized modelbased reasoning methods facilitated SBF
In online character recognition observe two kinds intraclass variations small geometric deformations completely different writing styles We propose new approach deal problems defining extension tangent distance well known offline character recognition The system implemented knearest neighbor classifier called diabolo classifier respectively Both classifiers invariant transformations like rotation scale slope deal variations stroke order writing direction Results presented digit database writers
Exact Boltzmann learning done certain restricted networks technique decimation We enlarged set decimatable Boltzmann machines introducing new general decimation rule We compared solutions probability density estimation problem decimatable Boltzmann machines results obtained Gibbs sampling unrestricted nondecimatable
The majority results computational learning theory concerned concept learning ie special case function learning classes functions range f g Much less known theory learning functions larger range IN IR In particular relatively results exist general structure common models function learning nontrivial function classes positive learning results exhibited models We introduce paper notion binary branching adversary tree function learning allows us give somewhat surprising equivalent characterization optimal learning cost learning class realvalued functions terms maxmin definition involve learning model Another general structural result paper relates cost learning union function classes learning costs individual function classes Furthermore exhibit efficient learning algorithm learning convex piecewise linear functions IR IR Previously class linear functions IR IR class functions multidimensional domain known learnable within rigorous framework formal model online learning Finally give sufficient condition arbitrary class F functions IR IR allows us learn class functions written pointwise maximum k functions F This allows us exhibit number nontrivial classes functions IR IR exist efficient learning algorithms
Although applicable wide array problems demonstrated good performance number difficult realworld tasks neural networks usually applied problems comprehensibility acquired concepts important The concept representations formed neural networks hard understand typically involve distributed nonlinear relationships encoded large number realvalued parameters To address limitation developing algorithms extracting symbolic concept representations trained neural networks We first discuss important able understand concept representations formed neural networks We briefly describe approach discuss number issues pertaining comprehensibility arisen work Finally discuss choices made research date open research issues yet addressed
One view computational learning theory learner acquiring knowledge teacher We introduce formal model learning capturing idea teachers may gaps knowledge In particular consider learning teacher labels examples positive instance concept learned negative instance concept learned instance unknown classification way knowledge concept class positive negative examples sufficient determine labelling examples labelled The goal learner compensate ignorance teacher attempting infer labels examples labelled rather learn approximation ternary labelling presented teacher Thus goal learner still acquire knowledge teacher learner must also identify gaps This notion learning consistently ignorant teacher We present general results describing known learning algorithms used obtain algorithms learn consistently ignorant teacher We investigate learnability variety concept classes model including monomials monotone DNF formulas Horn sentences decision trees DFAs axisparallel boxes Euclidean space among others Both learnability nonlearnability results presented
In bagging Brea one uses bootstrap replicates training set Efr ET try improve learning algorithms performance The computational requirements estimating resultant generalization error test set means crossvalidation often prohibitive leaveoneout crossvalidation one needs train underlying algorithm order times size training set number replicates This paper presents several techniques exploiting biasvariance decomposition GBD Wol estimate generalization error bagged learning algorithm without invoking yet training underlying learning algorithm The best estimators exploits stacking Wol In set experiments reported found accurate alternative crossvalidationbased estimator bagged algorithms error crossvalidationbased estimator underlying algorithms error This improvement particularly pronounced small test sets This suggests novel justification using bagging improved estimation generalization error
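The Python sketch below shows the basic bagging procedure the abstract builds on. It trains one model per bootstrap replicate of the training set and combines their predictions by plurality vote. It does not implement the paper's bias-variance or stacking-based error estimators. `train_fn` is a hypothetical learner supplied by the caller that returns a prediction function.

```python
import random
from collections import Counter

def bootstrap_replicate(data, rng):
    """Sample len(data) examples uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def bag(train_fn, data, n_replicates=25, seed=0):
    """Train one model per bootstrap replicate; predict by plurality vote."""
    rng = random.Random(seed)
    models = [train_fn(bootstrap_replicate(data, rng)) for _ in range(n_replicates)]

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict
```

Each call to `train_fn` is one training run of the underlying algorithm. Estimating the bagged predictor's error by cross-validation multiplies that cost by the number of folds, and this is the expense the abstract sets out to avoid.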
This paper presents algorithm discovery building blocks genetic programming GP called adaptive representation learning ARL The central idea ARL adaptation problem representation extending set terminals functions set evolvable subroutines The set subroutines extracts common knowledge emerging evolutionary process acquires necessary structure solving problem ARL supports subroutine creation deletion Subroutine creation discovery performed automatically based differential parentoffspring fitness block activation Subroutine deletion relies utility measure similar schema fitness window past generations The technique described tested problem controlling agent dynamic nondeterministic environment The automatic discovery subroutines help scale GP technique complex problems
In paper describe new technique exactly identifying certain classes readonce Boolean formulas The method based sampling inputoutput behavior target formula probability distribution determined fixed point formulas amplification function defined probability output formula input bit independently probability p By performing various statistical tests easily sampled variants fixedpoint distribution able efficiently infer structural information logarithmicdepth formula high probability We apply results prove existence short universal identification sequences large classes formulas We also describe extensions algorithms handle high rates noise learn formulas unbounded depth Valiants model respect specific distributions Most research carried three authors MIT Laboratory Computer Science Support provided NSF Grant CCR ARO Grant DAALK DARPA Contract NJ grant Siemens Corporation An extended abstract paper appeared proceedings st Annual Symposium Foundations Computer Science Supported part GE Foundation Junior Faculty Grant Supported AFOSR Grant AFOSR
We consider problem learning kterm DNF formulas using equivalence queries incomplete membership queries defined Angluin Slonim We demonstrate model applied nonmonotone classes Namely describe polynomialtime algorithm exactly identifies kterm DNF formula kterm DNF hypothesis using incomplete membership queries equivalence queries class DNF formulas
Most Artificial Neural Networks ANNs fixed topology learning typically suffer number shortcomings result Variations ANNs use dynamic topologies shown ability overcome many problems This paper introduces LocationIndependent Transformations LITs general strategy parallel implementation feedforward networks use dynamic topologies A LIT creates set locationindependent nodes node computes part network output independent nodes using local information This type transformation allows efficient support adding deleting nodes dynamically learning This paper deals specifically LITs localist ANNs localist sense ultimately one node responsible output In particular paper presents LITs two ANNs singlelayer competitive learning network b counterpropagation network combines elements supervised learning competitive learning The complexity learning execution algorithms ANNs linear number inputs logarithmic number nodes original network
We present new method obtaining local error bars nonlinear regression ie estimates confidence predicted values depend input We approach problem applying maximumlikelihood framework assumed distribution errors We demonstrate method first computergenerated data locally varying normally distributed target noise We apply laser data Santa Fe Time Series Competition underlying system noise known quantization error error bars give local estimates model misspecification In cases method also provides weightedregression effect improves generalization performance
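Suppose the errors are Gaussian with a variance that depends on the input. The maximum-likelihood framework above then amounts to minimizing the negative log-likelihood below. Both the prediction $\hat{y}(x)$ and the variance $\sigma^2(x)$ are functions of the input. Treating them as, for example, two outputs of the same network is an illustrative assumption.

```latex
-\ln L \;=\; \sum_{i=1}^{N} \left[ \frac{\bigl(y_i - \hat{y}(x_i)\bigr)^2}{2\,\sigma^2(x_i)} \;+\; \frac{1}{2}\ln\!\bigl(2\pi\,\sigma^2(x_i)\bigr) \right]
```

The factor $1/\sigma^2(x_i)$ down-weights squared errors wherever the estimated noise is large. This is one way to read the weighted-regression effect mentioned in the abstract. A one-standard-deviation local error bar at $x$ is then $\hat{y}(x) \pm \sigma(x)$.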
Introspective reasoning systems reasoning processes form basis learning refine reasoning processes The ROBBIE system uses introspective reasoning monitor retrieval process casebased planner detect retrieval inappropriate cases When retrieval problems detected source problems explained explanations used determine new indices use future case retrieval The goal ROBBIEs learning increase ability focus retrieval relevant cases aim simultaneously decreasing number candidates consider increasing likelihood system able successfully adapt retrieved cases fit current situation We evaluate benefits approach light empirical results examining effects index learning
Inductive learning algorithms try obtain knowledge system set examples One difficult problems machine learning consists getting structure knowledge We propose algorithm able manage fuzzy information able learn structure rules represent system The algorithm gives reasonable small set fuzzy rules represent original set examples
Since consider theory refinement TR possible key concept methodologically clear view knowledgebase maintenance try give structured overview actual stateoftheart TR This overview arranged along description TR search problem We explain basic approach show variety existing systems try give hints direction future research go
The problem protecting computer systems viewed generally problem learning distinguish self We describe method change detection based generation T cells immune system Mathematical analysis reveals computational costs system preliminary experiments illustrate method might applied problem computer viruses
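The change-detection idea can be sketched as a negative-selection procedure in three steps. First, generate random candidate detectors. Second, discard (censor) any candidate that matches a protected self string. Third, flag any monitored string that a surviving detector matches. The Python sketch below assumes binary strings and an r-contiguous-bits matching rule. The abstract does not state the matching rule, and all names are hypothetical.

```python
import random

def r_contiguous_match(a, b, r):
    """True if equal-length strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, n_detectors, length, r, seed=0, max_tries=100000):
    """Censoring step: draw random candidate strings and keep only those
    that match no string in the protected self set."""
    rng = random.Random(seed)
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        cand = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def is_changed(string, detectors, r):
    """Monitoring step: report a change if any detector matches the string."""
    return any(r_contiguous_match(string, d, r) for d in detectors)
```

Detectors are censored against self, so they never fire on unchanged self strings. Any detection therefore signals a change. The number of detectors and the value of `r` control how much of the non-self space is covered.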
Markov Chain Monte Carlo MCMC methods introduced Gelfand Smith provide simulation based strategy statistical inference The application fields related methods well theoretical convergence properties intensively studied recent literature However many improvements still expected provide workable theoretically wellgrounded solutions problem monitoring convergence actual outputs MCMC algorithms ie convergence assessment problem In paper introduce discuss methodology based Central Limit Theorem Markov chains assess convergence MCMC algorithms Instead searching approximate stationarity primarily intend control precision estimates invariant probability measure integrals functions respect measure confidence regions based normal approximation The first proposed control method tests normality hypothesis normalized averages functions Markov chain independent parallel chains This normality control provides good guarantees whole state space explored even multimodal situations It lead automated stopping rules A second tool connected normality control based graphical monitoring stabilization variance n iterations near limiting variance appearing CLT Both methods require knowledge sampler driving chain In paper mainly focus finite state Markov chains since setting allows us derive consistent estimates limiting variance variance n iterations Heuristic procedures based BerryEsseen bounds investigated An extension continuous case also proposed Numerical simulations illustrating performance methods given several examples finite chain multimodal invariant probability finite state random walk theoretical rate convergence stationarity known continuous state chain multimodal invariant probability issued Gibbs sampler
This paper explores application Temporal Difference TD learning Sutton forecasting behavior dynamical systems realvalued outputs opposed gamelike situations The performance TD learning comparison standard supervised learning depends amount noise present data In paper use deterministic chaotic time series lownoise laser For task direct fivestep ahead predictions experiments show standard supervised learning better TD learning The TD algorithm viewed linking adjacent predictions A similar effect obtained sharing internal representation network We thus compare two architectures paradigms first architecture separate hidden units consists individual networks five direct multistep prediction tasks second shared hidden units single larger hidden layer finds representation five predictions next five steps generated For data set find significant difference two architectures
We describe wakesleep algorithm allows multilayer unsupervised neural network build hierarchy representations sensory input The network bottomup recognition connections used convert sensory input underlying representations Unlike artificial neural networks also topdown generative connections used reconstruct sensory input representations In wake phase learning algorithm network driven bottomup recognition connections topdown generative connections trained better reconstructing sensory input representation chosen recognition process In sleep phase network driven topdown generative connections produce fantasized representation fantasized sensory input The recognition connections trained better recovering fantasized representation fantasized sensory input In phases synaptic learning rule simple local The combined effect two phases create representations sensory input efficient following sense On average takes bits describe sensory input vector directly first describe representation sensory input chosen recognition process describe difference sensory input reconstruction chosen representation
Technical Report CRGTR Department Computer Science University Toronto Kings College Road Toronto Canada MS A Abstract Bayesian inference begins prior distribution model parameters meant capture prior beliefs relationship modeled For multilayer perceptron networks parameters connection weights prior lacks direct meaning matters prior functions computed network implied prior weights In paper I show priors weights defined way corresponding priors functions reach reasonable limits number hidden units network goes infinity When using priors thus need limit size network order avoid overfitting The infinite network limit also provides insight properties different priors A Gaussian prior hiddentooutput weights results Gaussian process prior functions smooth Brownian fractional Brownian depending hidden unit activation function prior inputtohidden weights Quite different effects obtained using priors based nonGaussian stable distributions In networks one hidden layer combination Gaussian nonGaussian priors appears interesting
We present new algorithms reinforcement learning prove polynomial bounds resources required achieve nearoptimal return general Markov decision processes After observing number actions required approach optimal return lower bounded mixing time T optimal policy undiscounted case horizon time T discounted case give algorithms requiring number actions total computation time polynomial T number states undiscounted discounted cases An interesting aspect algorithms explicit handling ExplorationExploitation tradeoff These first results reinforcement learning literature giving algorithms provably converge nearoptimal performance polynomial time general Markov decision processes
A basic premise casebased reasoning CBR involves reasoning cases representations real episodes rather rules facts structures stated connection real episodes In fact CBR systems reason directly cases rather reason abstractions simplifications cases In paper argue pure casebased reasoning ie reasoning representations concrete reasonably complete We claim working representations satisfy criteria We illustrate argument examples three previous systems chef swale hypo well cookie CBR system developed first author
To appear G Tesauro D S Touretzky T K Leen eds Advances Neural Information Processing Systems MIT Press Cambridge MA A straightforward approach curse dimensionality reinforcement learning dynamic programming replace lookup table generalizing function approximator neural net Although successful domain backgammon guarantee convergence In paper show combination dynamic programming function approximation robust even benign cases may produce entirely wrong policy We introduce GrowSupport new algorithm safe divergence yet still reap benefits successful generalization
An exact model simple genetic algorithm developed permutation based representations Permutation based representations used scheduling problems combinatorial problems Traveling Salesman Problem A remapping function developed remap model permutations search space The mixing matrices various permutation based operators also developed
Test functions commonly used evaluate effectiveness different search algorithms However results evaluation dependent test problems algorithms subject comparison Unfortunately developing test suite evaluating competing search algorithms difficult without clearly defined evaluation goals In paper discuss basic principles used develop test suites examine role test suites used evaluate evolutionary search algorithms Current test suites include functions easily solved simple search methods greedy hillclimbers Some test functions also undesirable characteristics exaggerated dimensionality search space increased New methods examined constructing functions different degrees nonlinearity interactions cost evaluation scale respect dimensionality search space
Source separation arises surprising number signal processing applications speech recognition EEG analysis In square linear blind source separation problem without time delays one must find unmixing matrix detangle result mixing n unknown independent sources unknown n × n mixing matrix The recently introduced ICA blind source separation algorithm Baram Roth Bell Sejnowski powerful surprisingly simple technique solving problem ICA remarkable performing well despite making absolutely use temporal structure input This paper presents new algorithm contextual ICA derives maximum likelihood density estimation formulation problem cICA incorporate arbitrarily complex adaptive historysensitive source models thereby make use temporal structure input This allows separate number situations standard ICA including sources low kurtosis colored gaussian sources sources gaussian histograms Since ICA special case cICA MLE derivation provides corollary rigorous derivation classic ICA
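For reference, the maximum-likelihood view of standard, non-contextual ICA is written out below. Here $u = Wx$ are the unmixed outputs and $p_i$ is an assumed density for source $i$. Choosing $p_i$ as the derivative of the logistic function $y = 1/(1+e^{-u})$ gives the Bell and Sejnowski infomax update. The history-sensitive source models of cICA are not shown.

```latex
\begin{aligned}
\ln p(x \mid W) &= \ln\lvert\det W\rvert + \sum_{i=1}^{n} \ln p_i(u_i), \qquad u = Wx \\
\frac{\partial \ln p(x \mid W)}{\partial W} &= (W^{\top})^{-1} + \varphi(u)\,x^{\top}, \qquad \varphi_i(u_i) = \frac{d \ln p_i(u_i)}{d u_i} \\
p_i(u_i) = y_i(1 - y_i) \;&\Rightarrow\; \varphi_i(u_i) = 1 - 2y_i \;\Rightarrow\; \Delta W \propto (W^{\top})^{-1} + (1 - 2y)\,x^{\top}
\end{aligned}
```

Each source density $p_i$ here depends only on the current $u_i$, so the model ignores temporal structure. cICA relaxes exactly this restriction.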
We investigate applicability adaptive neural network problems timedependent input demonstrating deterministic parser natural language inputs significant syntactic complexity developed using recurrent connectionist architectures The traditional stacking mechanism known necessary proper treatment contextfree languages symbolic systems absent design subsumed recurrency network
In paper address problem constructing reliable neuralnet implementations given assumption particular implementation totally correct The approach taken paper organize inevitable errors minimize impact context multiversion system ie system functionality reproduced multiple versions together constitute neuralnet system The unique characteristics neural computing exploited order engineer reliable systems form diverse multiversion systems used together decision strategy majority vote Theoretical notions methodological diversity contributing improvement system performance implemented tested An important aspect engineering optimal system overproduce components choose optimal subset Three general techniques choosing final system components implemented evaluated Several different approaches effective engineering complex multiversion systems designs realized evaluated determine overall reliability well reliability overall system comparison lesser reliability component substructures
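The majority-vote decision strategy over a set of diverse versions can be illustrated in a few lines. This is a minimal sketch. It assumes each version returns a single hashable class label. The tie-breaking rule (the tied label that appears first wins) and the helper names `majority_vote` and `system_accuracy` are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(version_outputs: Sequence[Hashable]) -> Hashable:
    """Return the output produced by the largest number of versions.

    Ties go to the tied output that appears first in version_outputs
    (an arbitrary choice made for this sketch).
    """
    if not version_outputs:
        raise ValueError("need at least one version output")
    counts = Counter(version_outputs)
    top = max(counts.values())
    return next(out for out in version_outputs if counts[out] == top)


def system_accuracy(per_version_preds: List[Sequence[Hashable]],
                    targets: Sequence[Hashable]) -> float:
    """Accuracy of the majority-voted system.

    per_version_preds[v][i] is version v's prediction on example i.
    """
    correct = 0
    for i, target in enumerate(targets):
        vote = majority_vote([preds[i] for preds in per_version_preds])
        correct += int(vote == target)
    return correct / len(targets)


# Three hypothetical versions, each wrong on a different example.
targets = [0, 1, 0]
versions = [[1, 1, 0], [0, 0, 0], [0, 1, 1]]
print([system_accuracy([v], targets) for v in versions])  # each version alone
print(system_accuracy(versions, targets))                 # majority-voted system
```

In this example every version alone is correct on two of the three examples. The voted system is correct on all three because the versions fail on different inputs. That is the kind of diversity a multiversion design tries to engineer.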
Littlewood Miller present statistical framework dealing coincident failures multiversion software systems They develop theoretical model holds promise high system reliability use multiple diverse sets alternative versions In paper adapt framework investigate feasibility exploiting diversity observable multiple populations neural networks developed using diverse methodologies We evaluate generalisation improvements achieved range methodologically diverse network generation processes We attempt order constituent methodological features respect potential use engineering useful diversity We also define explore use relative measures diversity version sets guide potential exploiting interset diversity
During development lifecycle knowledgebased systems requirements system knowledge system change One types knowledge affected changing requirements controlknowledge prescribes ordering problemsolving steps Machinelearning aid developers knowledgebased systems adapting systems changing requirements A number machinelearning techniques learning controlknowledge applied problemsolvers ProdigyEBL LEX In knowledge engineering focus shifted construction knowledgelevel models problemsolving instead directly constructing knowledgebased system problemsolver In paper describe work progress apply machine learning techniques KADS model expertise
Results Abbadingo One DFA Learning Abstract This paper first describes structure results Abbadingo One DFA Learning Competition The competition designed encourage work algorithms scale well both larger DFAs sparser training data We describe discuss winning algorithm Rodney Price orders state merges according amount evidence favor A second winning algorithm Hugues Juille described separate paper
M Abeles H Bergman E Vaadia School Medicine Center Neural Computation Hebrew University POB Jerusalem Israel E Seidemann I Meilijson School Mathematical Sciences Raymond Beverly Sackler Faculty Exact Sciences School Medicine Tel Aviv University Tel Aviv Israel I Gat N Tishby Institute Computer Science Center Neural Computation Hebrew University Jerusalem Israel
In work present new bottomup algorithm decision tree pruning efficient requiring single pass given tree prove strong performance guarantee generalization error resulting pruned tree We work typical setting given tree T may derived given training sample S thus may badly overfit S In setting give bounds amount additional generalization error pruning suffers compared optimal pruning T More generally results show pruning T small error whose size small compared |S| algorithm find pruning whose error much larger This style result called index resolvability result Barron Cover context density estimation A novel feature algorithm locality decision prune subtree based entirely properties subtree sample reaching To analyze algorithm develop tools local uniform convergence generalization standard notion may prove useful settings
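The locality property can be illustrated with a single post-order pass. Under that property, each prune decision looks only at the subtree and at the sample reaching it. The sketch below swaps in plain reduced-error pruning on a sample for illustration. It does not implement the paper's pruning criterion or its generalization guarantee. The `Node` layout, with binary features and a stored majority label at every node, is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Sequence[int], int]  # (binary feature vector, class label)


@dataclass
class Node:
    label: int                      # majority label if this node were a leaf
    feature: Optional[int] = None   # None marks a leaf
    left: Optional["Node"] = None   # branch taken when x[feature] == 0
    right: Optional["Node"] = None  # branch taken when x[feature] == 1


def leaf_errors(node: Node, sample: List[Example]) -> int:
    return sum(1 for _, y in sample if y != node.label)


def prune(node: Node, sample: List[Example]) -> int:
    """Prune bottom-up in one pass; return the errors of the pruned subtree.

    The decision at each node uses only that subtree and the examples
    of `sample` that reach it.
    """
    as_leaf = leaf_errors(node, sample)
    if node.feature is None:
        return as_leaf
    left_s = [(x, y) for x, y in sample if x[node.feature] == 0]
    right_s = [(x, y) for x, y in sample if x[node.feature] == 1]
    as_subtree = prune(node.left, left_s) + prune(node.right, right_s)
    if as_leaf <= as_subtree:
        # Replacing the subtree by a leaf costs no extra errors on this sample.
        node.feature, node.left, node.right = None, None, None
        return as_leaf
    return as_subtree


tree = Node(label=0, feature=0,
            left=Node(label=0),
            right=Node(label=1, feature=1, left=Node(label=1), right=Node(label=0)))
sample = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
print(prune(tree, sample), tree)
```

In this example the right-hand split on feature 1 is collapsed into a leaf, because the leaf makes no errors on the examples reaching it. The root split on feature 0 is kept.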
We investigate application Support Vector Machines SVMs computer vision SVM learning technique developed V Vapnik team AT&T Bell Labs seen new method training polynomial neural network Radial Basis Functions classifiers The decision surfaces found solving linearly constrained quadratic programming problem This optimization problem challenging quadratic form completely dense memory requirements grow square number data points We present decomposition algorithm guarantees global optimality used train SVMs large data sets The main idea behind decomposition iterative solution subproblems evaluation optimality conditions used generate improved iterative values also establish stopping criteria algorithm We present experimental results implementation SVM demonstrate feasibility approach face detection problem involves data set data points
One open problems listed Rivest Schapire whether copies L* algorithm combined one better performance This paper describes algorithm called D* combination The idea represent states learned model using observable symbols well hidden symbols constructed learning These hidden symbols created reflect distinct behaviors model states The distinct behaviors represented local distinguishing experiments LDEs confused global distinguishing sequences LDEs created learners prediction mismatches actual observation unknown machine To synchronize model environment LDEs also concatenated form homing sequence It shown D* learn probability model approximation unknown machine number actions polynomial size environment
All models natural systems represent abstraction simplification system Thus models suffer validation problem Should believe results model bearing reality This particularly acute problem Alife models evolution ecosystems The time scale evolution complexity ecosystems make controlled experiments difficult If Alife ever contribute significantly biology must find methods build confidence models One alternative experimental tests model validate previously verified theory I applied series ecological evolutionary validation tests model species diversification Examination predatorprey dynamics trophic cascades competitive exclusion adaptation speciesarea curve model shown coarse grained spatial structure inadequate capture realistic dynamics ecosystem Only spatial structure extended local patch dynamics model begin behave realistically wide range parameters Validation ecological dynamics model provides indirect support evolutionary behavior species within ecosystem Every model abstraction simplification The goal model capture essence system real world behavior model matches behavior real system Thus model may ask valid representation real system Answering question problem validation Traditionally try disprove validity model collecting data real system comparing predictions model In artificial life rarely luxury Artificial life models tend highly abstract general field striving discover general properties life This makes experimental validation extremely difficult The time scale evolution tends restrict experiments observation fossil record Benton example manipulation organisms extremely short lifecycles simplified environments Krukonis example Similarly complexity size ecosystems makes ecological experiments cumbersome difficult control An alternative form validation pursued indirectly reference ecological evolutionary theory Instead asking model matches experimental data ask model matches understanding dynamics ecology evolution Then extent theories ecology evolution validated experimental observations disprove validity model fails match theories What follows example technique applied model designed examine factors impact origin maintenance species diversity While purpose model explore new theoretical ground biology ecological evolutionary dynamics model validated theories predation competition adaptation island biogeography Hraber Milne looked genotype diversity presence absence selection varying mutation rates ECHO model Holland Mirroring Bedau et als results found genotypic diversity greatest
One issues evolutionary algorithms EAs relative importance two search operators mutation crossover Genetic algorithms GAs genetic programming GP stress role crossover evolutionary programming EP evolution strategies ESs stress role mutation The existence many different forms crossover complicates issue Despite theoretical analysis appears difficult decide priori form crossover use even crossover used One possible solution difficulty EA selfadaptive ie EA dynamically modify forms crossover use often use solves problem This paper describes adaptive mechanism controlling use crossover EA explores behavior mechanism number different situations An improvement adaptive mechanism presented Surprisingly improvement also used enhance performance nonadaptive EA
Graphical techniques modeling dependencies random variables explored variety different areas including statistics statistical physics artificial intelligence speech recognition image processing genetics Formalisms manipulating models developed relatively independently research communities In paper explore hidden Markov models HMMs related structures within general framework probabilistic independence networks PINs The paper contains selfcontained review basic principles PINs It shown wellknown forwardbackward FB Viterbi algorithms HMMs special cases general inference algorithms arbitrary PINs Furthermore existence inference estimation algorithms general graphical models provides set analysis tools HMM practitioners wish explore richer class HMM structures Examples relatively complex models handle sensor fusion coarticulation speech recognition introduced treated within graphical model framework illustrate advantages general approach This report describes research done Department Information Computer Science University California Irvine Jet Propulsion Laboratory California Institute Technology Microsoft Research Center Biological Computational Learning Artificial Intelligence Laboratory Massachusetts Institute Technology The authors contacted pjsaigjplnasagov heckermamicrosoftcom jordanpsychemitedu Support CBCL provided part grant NSF ASC Support laboratorys artificial intelligence research provided part Advanced Research Projects Agency Dept Defense MIJ gratefully acknowledges discussions Steffen Lauritzen application IPF algorithm UPINs
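The forward-backward (FB) algorithm mentioned above can be made concrete with a minimal sketch for a discrete-output HMM, written with plain Python lists. It computes the forward and backward messages and the posterior state marginals. Viewed as a graphical model, this is message passing on a chain. The sketch uses no scaling, so it is only suitable for short sequences. The parameter names `pi`, `A` and `B` are conventional choices and may differ from the paper's notation.

```python
from typing import List, Sequence, Tuple


def forward_backward(pi: Sequence[float], A: Sequence[Sequence[float]],
                     B: Sequence[Sequence[float]],
                     obs: Sequence[int]) -> Tuple[float, List[List[float]]]:
    """Return (P(obs), gamma) with gamma[t][i] = P(state at time t = i | obs).

    pi[i]: initial probability of state i; A[i][j]: transition i -> j;
    B[i][k]: probability that state i emits symbol k.
    """
    n, T = len(pi), len(obs)
    # Forward pass: alpha[t][j] = P(obs[0..t], state_t = j)
    alpha = [[0.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    # Backward pass: beta[t][i] = P(obs[t+1..T-1] | state_t = i)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    likelihood = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)] for t in range(T)]
    return likelihood, gamma


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
likelihood, gamma = forward_backward(pi, A, B, [0, 1])
print(likelihood, gamma)
```

For this two-step example the forward pass gives P(obs) ≈ 0.209. Each row of `gamma` sums to one.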
Perceptron learning randomly labeled patterns analyzed using Gibbs distribution set realizable labelings patterns The entropy distribution extension VapnikChervonenkis VC entropy reducing exactly limit infinite temperature The close relationship VC Gardner entropies seen within replica formalism There recent progress towards understanding relationship statistical physics VapnikChervonenkis VC approaches learning theory The two approaches unified statistical mechanics based VC entropy This paper treats case learning randomly labeled patterns capacity problem extends results previous work finite temperature As explained companion paper extension important treating generalization problem occurs context learning patterns labeled target rule Our general framework illustrated simple perceptron sgn(w · x) maps N dimensional realvalued input x ±1 valued output Given sample X = (x_1, …, x_P) inputs weight vector w determines labeling L = (l_1, …, l_P) sample via l_μ = sgn(w · x_μ) The weight vector w defines normal hyperplane separates positive negative examples The training error labeling L respect reference labeling L′ defined ε(L, L′) = (1/P) Σ_μ (1 − l_μ l′_μ)/2 fraction different labels two labelings We consider case reference labeling chosen random address issue
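The labeling map l_μ = sgn(w · x_μ) and the training error between two ±1 labelings can be written out directly. This is a minimal sketch. It assumes ±1 labels, vectors given as plain sequences, and the convention sgn(0) = +1, which the text does not specify.

```python
from typing import List, Sequence


def sgn(z: float) -> int:
    # Assumed convention: sgn(0) = +1.
    return 1 if z >= 0 else -1


def labeling(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[int]:
    """Labeling L = (l_1, ..., l_P) with l_mu = sgn(w . x_mu)."""
    return [sgn(sum(wi * xi for wi, xi in zip(w, x))) for x in X]


def training_error(L: Sequence[int], L_ref: Sequence[int]) -> float:
    """Fraction of patterns on which two +/-1 labelings disagree."""
    return sum((1 - l * r) / 2 for l, r in zip(L, L_ref)) / len(L)


X = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
L = labeling((1.0, 1.0), X)
print(L, training_error(L, [1, -1, -1, 1]))
```

Each term (1 − l_μ l′_μ)/2 is 0 when the two labels agree and 1 when they differ, so the average is the fraction of differing labels.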
In pathimitation task one agent traces path second agents sensory field The second agent reproduce path exactly ie move sequence locations visited first agent This nontrivial behaviour whose acquisition might expected involve specialpurpose ie strongly biased learning machinery However present paper shows case The behaviour acquired using fairly primitive learning regime provided agents environment made pass specific sequence dynamic states
This research aims demonstrate solution artificial ant problem likely nongeneral relying specific characteristics Santa Fe trail It presents consistent method promotes producing general solutions Using concepts training testing machine learning research method useful producing general behaviours simulation environments
Dynamic Bayesian networks DBNs useful tool representing complex stochastic processes Recent developments inference learning DBNs allow use realworld applications In paper apply DBNs problem speech recognition The factored state representation enabled DBNs allows us explicitly represent longterm articulatory acoustic context addition phoneticstate information maintained hidden Markov models HMMs Furthermore enables us model shortterm correlations among multiple observation streams within single timeframes Given DBN structure capable representing long shortterm correlations applied EM algorithm learn models parameters The use structured DBN models decreased error rate largevocabulary isolatedword recognition task compared discrete HMM also improved significantly published results task This first successful application DBNs largescale speech recognition problem Investigation learned models indicates hidden state variables strongly correlated acoustic properties speech signal
We describe evaluate multinetwork connectionist systems composed expert networks By preprocessing training data competitive learning network system automatically organizes process decomposition expert subtasks Using several different types challenging problem assess approach degree automatically generated experts really specialists predictable subset overall task comparison decompositions equivalent singlenetworks In addition assess utility approach alongside competition nonexpert multiversion systems Previously developed measures diversity systems also applied provide quantitative assessment degree specialization obtained expertnet ensemble We show types problem abstract welldefined datadefined automatic decomposition produce effective set specialist networks together support high level performance Curiously study provide support differential effectiveness within two classes problem continuous homogeneous functions discrete discontinuous functions
Let us present briefly learning problem address chapter following The ultimate goal modelling mapping f(x) multidimensional input x output y The output multidimensional mostly address situations y one dimensional real value Furthermore take account fact scarcely ever observe actual true mapping y = f(x) This due perturbations eg observational noise We rather joint probability p(x, y) We expect probability peaked values x y corresponding mapping We focus automatic learning example A set D data sampled joint distribution p(x, y) = p(y|x) p(x) collected With help set try identify model data parameterised set parameters w Learning optimisation The fit model system given point x measured using criterion representing distance model prediction ŷ = f(w, x) system output y This local risk The performance model measured expected local risk This quantity represents ability yield good performance possible situations ie (x, y) pairs thus called generalisation error The optimal set parameters w f w x b
Practical optimization problems jobshop scheduling often involve optimization criteria change time Repairbased frameworks identified flexible computational paradigms difficult combinatorial optimization problems Since control problem repairbased optimization severe Reinforcement Learning RL techniques potentially helpful However fundamental assumptions made traditional RL algorithms valid repairbased optimization CaseBased Reasoning CBR compensates limitations traditional RL approaches In paper present CaseBased Reasoning RL approach implemented CABINS system repairbased optimization We chose jobshop scheduling testbed approach Our experimental results show CABINS able effectively solve problems changing optimization criteria known system exist implicitly extensional manner case base
We introduced earlier use regularisation learning procedure It understood regularisation often necessity increase quality results Even unregularised solution acceptable likely regularisation produce improvement performance There exist method giving directly best value regularisation parameter even linear case The topic chapter thus propose methods estimate best value The best one leads smallest generalisation error methods presented compared propose estimators generalisation error This estimation used approximate best regularisation level In sections present validationbased techniques They estimate generalisation error basis extra data In sections deal algebraic estimates error use extra data rely number assumptions The contribution chapter present techniques analyse ground We also present short derivations clarifying links different estimators generalisation error well comparison During course chapter error quadratic difference For validationbased methods possible consider kind error without modification method On hand algebraic estimates specific quadratic cost Adapting another cost function would require derive new expressions estimators
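A validation-based estimate under quadratic error can be illustrated with a minimal sketch. It picks the regularisation level of a one-dimensional ridge model without intercept by the model's error on held-out data. The model, the candidate grid and the helper names are assumptions chosen to keep the example small. The chapter's algebraic estimators are not shown.

```python
from typing import Dict, Sequence, Tuple

Data = Tuple[Sequence[float], Sequence[float]]  # (inputs, targets)


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Ridge estimate for y ~ w x (no intercept): w = sum(x y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def validation_error(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean quadratic error of the model y = w x on held-out data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_lambda(train: Data, valid: Data,
                  grid: Sequence[float]) -> Tuple[float, Dict[float, float]]:
    """Return the grid value with the smallest validation error, plus all scores."""
    (xt, yt), (xv, yv) = train, valid
    scores = {lam: validation_error(fit_ridge_1d(xt, yt, lam), xv, yv) for lam in grid}
    return min(scores, key=scores.get), scores


best, scores = select_lambda(([1.0, 2.0], [1.5, 2.5]), ([3.0], [3.0]), [0.0, 0.5, 1.5, 5.0])
print(best, scores)
```

Here the unregularised fit (λ = 0) gives w = 1.3 and a validation error of about 0.81. With λ = 1.5 the fit gives w = 1 and reproduces the held-out point exactly. Stronger regularisation (λ = 5) is worse again.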
Evolutionary approaches advocated automate robot design Some research work shown success evolving controllers robots genetic approaches As observe however controller also robot body affect behavior robot robot system In paper develop hybrid GPGA approach evolve controllers robot bodies achieve behaviorspecified tasks In order assess performance developed approach used evolve simulated agent controller body obstacle avoidance simulated environment Experimental results show promise work In addition importance coevolving controllers robot bodies analyzed discussed paper
When delays set A_δ P(ε|δ) depends The estimated probabilities become quite noisy number elements set A B small For reason estimate standard deviation P(ε|δ) Notice estimate empirical average binomial variable either given couple satisfied conditions δ ε The standard deviation estimated easily Generally speaking P(ε|δ) increases laxer output test δ approaches stricter input condition Let us define P(ε) maximum δ P(ε|δ) P(ε) = max_{δ>0} P(ε|δ) The dependability index defined P represents much data passes continuity test input information available This dependability index measures much remaining continuity information associated involving input This index averaged respect probability P P It clear therefore average positive quantities Furthermore system deterministic dependability zero certain number inputs sum averages saturates If system also noisefree sum For greater embedding dimension refers results obtained using method Statistical variable selection Statistical variable selection feature selection encompasses number techniques aimed choosing relevant subset input variables regression classification problem As rest document limit considerations related regression problem even though methods discussed apply classification well Variable selection seen part data analysis problem selection discard variable tells us relevance associated measurement modelled system In general setting purely combinatorial problem given V possible variables 2^V possible subsets including empty set full set variables Given performance measure prediction error optimal scheme test subset choose one gives best performance It easy see extensive scheme viable number variables rather low Identifying 2^V models variables requires much computation A number techniques devised overcome combinatorial limit Some use iterative locally optimal technique construct estimate relevant subset number steps We refer stepwise selection methods confused stepwise regression subset methods address In forward selection start empty set variables At step select candidate variable using selection criteria check whether variable added set iterate given stop condition reached On contrary backward elimination methods start full set input variables At step least significant variable selected according selection criteria If variable irrelevant removed process iterated stop condition reached It easy devise examples inclusion variable causes previously included variable become irrelevant It thus seems appropriate consider running backward elimination time new variable added forward selection This combination approaches known stepwise regression linear regression con
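The forward, backward and stepwise search schemes just described can be sketched against a generic scoring function. Here `score(subset)` is a hypothetical callback that returns an error estimate (lower is better) for a model built on the given variable indices. The stopping rule, which stops when the best step improves the score by no more than `tol`, is one simple choice among many. `toy_score` is an invented example in which variable 3 is a partial proxy that stays useful only until variables 0 and 2 are both present.

```python
from typing import Callable, FrozenSet, Optional, Tuple

# Hypothetical callback: error estimate of a model built on a subset of
# variable indices (lower is better), e.g. a validation error.
Score = Callable[[FrozenSet[int]], float]


def _best_addition(selected: FrozenSet[int], n_vars: int,
                   score: Score) -> Tuple[Optional[int], float]:
    candidates = [v for v in range(n_vars) if v not in selected]
    if not candidates:
        return None, float("inf")
    v = min(candidates, key=lambda c: score(selected | {c}))
    return v, score(selected | {v})


def _best_removal(selected: FrozenSet[int], score: Score) -> Tuple[int, float]:
    v = min(selected, key=lambda c: score(selected - {c}))
    return v, score(selected - {v})


def forward_selection(n_vars: int, score: Score, tol: float = 0.0) -> FrozenSet[int]:
    selected: FrozenSet[int] = frozenset()
    current = score(selected)
    while True:
        v, new = _best_addition(selected, n_vars, score)
        if v is None or current - new <= tol:  # no addition helps enough
            return selected
        selected, current = selected | {v}, new


def backward_elimination(n_vars: int, score: Score, tol: float = 0.0) -> FrozenSet[int]:
    selected = frozenset(range(n_vars))
    current = score(selected)
    while selected:
        v, new = _best_removal(selected, score)
        if new - current > tol:  # every removal hurts too much
            break
        selected, current = selected - {v}, new
    return selected


def stepwise_selection(n_vars: int, score: Score, tol: float = 0.0) -> FrozenSet[int]:
    """Forward selection with a backward-elimination pass after each addition."""
    selected: FrozenSet[int] = frozenset()
    current = score(selected)
    while True:
        v, new = _best_addition(selected, n_vars, score)
        if v is None or current - new <= tol:
            return selected
        selected, current = selected | {v}, new
        # Drop variables the new one has made irrelevant; only removals that
        # do not worsen the score are accepted, so the search terminates.
        while len(selected) > 1:
            u, new = _best_removal(selected, score)
            if new > current:
                break
            selected, current = selected - {u}, new


def toy_score(s: FrozenSet[int]) -> float:
    """Invented example: variables 0 and 2 are relevant, 3 is a partial proxy."""
    missing = len({0, 2} - s)
    err = 0.3 * missing if 3 in s else float(missing)
    return err + 0.1 * len(s - {0, 2})  # small cost per non-relevant variable


print(forward_selection(4, toy_score), stepwise_selection(4, toy_score))
```

On `toy_score`, forward selection picks the proxy first and returns {0, 2, 3}. The stepwise variant drops the proxy as soon as variable 2 is added and returns {0, 2}. This is the situation that motivates combining the two directions.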
We used genetic programming evolve topology sizing numerical values component analog electrical circuit correctly classify incoming analog electrical signal three categories Then repertoire sources dynamically changed adding new source run The paper describes
An essential component intelligent agent ability observe encode use information environment Traditional approaches Genetic Programming focused evolving functional reactive programs minimal use state This paper presents approach investigating evolution learning planning memory using Genetic Programming The approach uses multiphasic fitness environment enforces use memory allows fairly straightforward comprehension evolved representations An illustrative problem gold collection used demonstrate usefulness approach The results indicate approach evolve programs store simple representations environments use representations produce simple plans
Interest Genetic algorithms expanding rapidly This paper reviews software environments programming Genetic Algorithms GA As background initially preview genetic algorithms models programming Next classify GA software environments three main categories Applicationoriented Algorithmoriented ToolKits For category GA programming environment review common features present case study leading environment
Neural network pruning methods level individual network parameters eg connection weights improve generalization shown empirical study However open problem pruning methods known today eg OBD OBS autoprune epsiprune selection number parameters removed pruning step pruning strength This work presents pruning method lprune automatically adapts pruning strength evolution weights loss generalization training The method requires algorithm parameter adjustment user Results statistical significance tests comparing autoprune lprune static networks early stopping given based extensive experimentation different problems The results indicate training pruning often significantly better rarely significantly worse training early stopping without pruning Furthermore lprune often superior autoprune superior OBD diagnosis tasks unless severe pruning early training process required
The relative performance different methods classifier learning varies across domains Some recent Instance Based Learning IBL methods IBMVDM use similarity measures based conditional class probabilities These probabilities key component Naive Bayes methods Given commonality approach interest consider differences two methods linked relative performance different domains Here interpret Naive Bayes IBL like framework identifying differences Naive Bayes IBMVDM framework Experiments variants IBMVDM lie Naive Bayes framework conducted sixteen domains The results strongly suggest relative performance Naive Bayes IBMVDM linked extent class satisfactorily represented single instance IBL framework However factor appears significant
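A value difference metric (VDM) in its common form can be sketched to make the link to conditional class probabilities concrete. In that form, the distance between two values a and b of one attribute is Σ_c |P(c | a) − P(c | b)|^q, and instance distances sum this over attributes. Whether IBMVDM uses exactly this variant, and which q it uses, is an assumption. Class labels must be sortable, and an attribute value unseen in training raises a KeyError in this sketch.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Row = Sequence[Hashable]


def fit_vdm(X: Sequence[Row], y: Sequence[Hashable]) -> List[Dict[Hashable, List[float]]]:
    """For each attribute, map each value v to [P(c | v) for c in sorted classes]."""
    classes = sorted(set(y))
    tables = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        value_counts = Counter(column)
        joint = Counter(zip(column, y))
        tables.append({v: [joint[(v, c)] / n for c in classes]
                       for v, n in value_counts.items()})
    return tables


def vdm_distance(tables, a: Row, b: Row, q: int = 1) -> float:
    """Sum over attributes of sum_c |P(c | a_j) - P(c | b_j)|^q."""
    total = 0.0
    for table, va, vb in zip(tables, a, b):
        total += sum(abs(pa - pb) ** q for pa, pb in zip(table[va], table[vb]))
    return total


def nearest_neighbour(tables, X: Sequence[Row], y: Sequence[Hashable], query: Row) -> Hashable:
    """1-NN classification under the VDM distance."""
    best = min(range(len(X)), key=lambda i: vdm_distance(tables, X[i], query))
    return y[best]


X = [("red", "s"), ("red", "l"), ("blue", "s"), ("blue", "l")]
y = ["a", "a", "b", "b"]
tables = fit_vdm(X, y)
print(vdm_distance(tables, X[0], X[2]), nearest_neighbour(tables, X, y, ("red", "l")))
```

In the toy data the second attribute carries no class information. Its values therefore contribute zero distance, and only colour separates the classes.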
ASSERT demonstrates theory refinement techniques developed machine learning used effectively build student models intelligent tutoring systems This application unique since inverts normal goal theory refinement correcting errors knowledge base introducing A comprehensive experiment involving large number students interacting automated tutor teaching concepts C programming used evaluate approach This experiment demonstrated ability theory refinement generate accurate student models raw induction well ability resulting models support individualized feedback actually improves students subsequent performance
The monitoring control dynamic system depends crucially ability reason current status future trajectory In case stochastic system tasks typically involve use belief state probability distribution state process given point time Unfortunately state spaces complex processes large making explicit representation belief state intractable Even dynamic Bayesian networks DBNs process represented compactly representation belief state intractable We investigate idea utilizing compact approximation true belief state analyze conditions errors due approximations taken lifetime process accumulate make answers completely irrelevant We show error belief state contracts exponentially process evolves Thus even multiple approximations error process remains bounded indefinitely We show additional structure DBN used design approximation scheme improving performance significantly We demonstrate applicability ideas context monitoring task showing orders magnitude faster inference achieved small degradation accuracy
NonAxiomatic Reasoning System NARS designed generalpurpose intelligent reasoning system adaptive works insufficient knowledge resources This paper focuses components NARS contribute systems induction capacity shows traditional problems induction addressed system The NARS approach induction uses termoriented formal language experiencegrounded semantics consistently interprets various types uncertainty An induction rule generates conclusions common instance terms revision rule combines evidence different sources In NARS induction types inference deduction abduction based semantic foundation cooperate inference activities system The systems control mechanism makes knowledgedriven contextdependent inference possible
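The revision rule's pooling of evidence can be sketched with the usual NARS mapping between a truth value and amounts of evidence. A truth value has a frequency f and a confidence c. The mapping is w = k·c/(1 − c) and w⁺ = f·w, and the reverse is f = w⁺/w, c = w/(w + k). This sketch assumes that mapping with evidential horizon k = 1, and assumes the two premises rest on disjoint evidence. It is not taken from this paper. Confidence must stay strictly below 1.

```python
from typing import Tuple

Truth = Tuple[float, float]  # (frequency f, confidence c), with 0 <= c < 1
K = 1.0  # evidential horizon; k = 1 is assumed here


def to_evidence(truth: Truth, k: float = K) -> Tuple[float, float]:
    """Return (positive evidence w+, total evidence w) behind a truth value."""
    f, c = truth
    w = k * c / (1.0 - c)
    return f * w, w


def from_evidence(w_plus: float, w: float, k: float = K) -> Truth:
    """Map evidence amounts back to (frequency, confidence)."""
    return w_plus / w, w / (w + k)


def revise(t1: Truth, t2: Truth, k: float = K) -> Truth:
    """Pool the evidence behind two judgments about the same statement."""
    wp1, w1 = to_evidence(t1, k)
    wp2, w2 = to_evidence(t2, k)
    return from_evidence(wp1 + wp2, w1 + w2, k)


print(revise((1.0, 0.5), (0.0, 0.5)))
```

Two conflicting judgments of equal confidence average out to frequency 0.5. The confidence of the result rises from 0.5 to 2/3 because more total evidence is available.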
Although CaseBased Reasoning CBR natural formulation many problems previous work CBR applied design made apparent elements CBR paradigm prevented widely applied At time evaluating Constraint Satisfaction techniques design found commonality motivation repairbased constraint satisfaction problems CSP case adaptation This led us combine two methodologies order gain advantages CSP casebased reasoning allowing CBR widely flexibly applied In combining two methodologies found unexpected synergy commonality approaches This paper describes synergy commonality emerged combined casebased constraintbased reasoning gives brief overview continuing future work exploiting emergent synergy combining reasoning modes
In paper study dual version Ridge Regression procedure It allows us perform nonlinear regression constructing linear regression function high dimensional feature space The feature space representation result large increase number parameters used algorithm In order combat curse dimensionality algorithm allows use kernel functions used Support Vector methods We also discuss powerful family kernel functions constructed using ANOVA decomposition method kernel corresponding splines infinite number nodes This paper introduces regression estimation algorithm combination two elements dual version Ridge Regression applied ANOVA enhancement infinitenode splines Experimental results presented based Boston Housing data set indicate performance algorithm relative algorithms
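The dual form of ridge regression can be sketched with plain lists. The coefficients are α = (K + aI)⁻¹y, where K is the kernel matrix of the training points and a the ridge constant, and predictions are f(x) = Σᵢ αᵢ k(xᵢ, x). This sketch uses a Gaussian RBF kernel and a linear kernel for illustration, in place of the ANOVA spline kernels the abstract describes. The naive Gaussian elimination solver is only suited to small data sets.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf(u: Vector, v: Vector, gamma: float = 1.0) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def linear(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve(M: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [b] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z


def fit_dual_ridge(X: Sequence[Vector], y: Sequence[float], a: float,
                   kernel: Kernel = rbf) -> List[float]:
    """Dual coefficients alpha = (K + a I)^-1 y."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (a if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(K, list(y))


def predict(X: Sequence[Vector], alpha: Sequence[float], x: Vector,
            kernel: Kernel = rbf) -> float:
    """f(x) = sum_i alpha_i k(x_i, x)."""
    return sum(ai * kernel(xi, x) for ai, xi in zip(alpha, X))


X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.1, 5.9]
alpha = fit_dual_ridge(X, y, a=0.5, kernel=linear)
xs = [row[0] for row in X]
w_primal = linear(xs, y) / (linear(xs, xs) + 0.5)  # primal ridge with the same a
print(predict(X, alpha, [4.0], kernel=linear), 4.0 * w_primal)
```

With a linear kernel the dual prediction coincides with the primal ridge solution w = Σxy/(Σx² + a). The dual form only pays off when the kernel corresponds to a feature space far larger than the number of training points.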
A twoeye visual environment used training network BCM neurons We study effect misalignment synaptic density functions two eyes formation orientation selectivity ocular dominance lateral inhibition network The visual environment use composed natural images We show BCM rule natural image environment binocular cortical misalignment sufficient producing networks orientation selective cells ocular dominance columns This work extension previous single cell misalignment model Shouval et al
The von Mises distribution maximum entropy distribution It corresponds distribution angle compass needle uniform magnetic field direction concentration parameter The concentration parameter ratio field strength temperature thermal fluctuations Previously obtained Bayesian estimator von Mises distribution parameters using informationtheoretic Minimum Message Length MML principle Here examine variety Bayesian estimation techniques examining posterior distribution polar Cartesian coordinates We compare MML estimator fellow Bayesian techniques range Classical estimators We find Bayesian estimators outperform Classical estimators
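The classical maximum-likelihood estimates of the von Mises parameters give a point of comparison for the Bayesian and MML estimators discussed above. The mean direction is atan2(Σ sin θᵢ, Σ cos θᵢ). The concentration κ solves I₁(κ)/I₀(κ) = R̄, where R̄ is the mean resultant length. This sketch covers only the classical estimator, not the paper's MML estimator. The Bessel functions are evaluated by their power series, and κ is found by bisection on an assumed search range [0, 500].

```python
import math
from typing import Sequence, Tuple


def bessel_i(nu: int, x: float) -> float:
    """Modified Bessel function I_nu(x) for integer nu >= 0, by its power series."""
    term = (x / 2) ** nu / math.factorial(nu)
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2) ** 2 / (k * (k + nu))
        total += term
    return total


def von_mises_ml(angles: Sequence[float], kappa_max: float = 500.0) -> Tuple[float, float]:
    """Classical ML estimates (mean direction mu, concentration kappa)."""
    n = len(angles)
    C = sum(math.cos(t) for t in angles)
    S = sum(math.sin(t) for t in angles)
    mu = math.atan2(S, C)
    r_bar = math.hypot(C, S) / n  # mean resultant length, in [0, 1]

    def A(kappa: float) -> float:
        return bessel_i(1, kappa) / bessel_i(0, kappa)  # increases from 0 towards 1

    lo, hi = 0.0, kappa_max
    if A(hi) <= r_bar:  # data too concentrated for the assumed search range
        return mu, hi
    for _ in range(100):  # bisection on A(kappa) = r_bar
        mid = (lo + hi) / 2
        if A(mid) < r_bar:
            lo = mid
        else:
            hi = mid
    return mu, (lo + hi) / 2


angles = [0.2, 0.5, 0.8]
mu, kappa = von_mises_ml(angles)
print(mu, kappa)
```

For three angles placed symmetrically about 0.5 rad, the estimated mean direction is 0.5. κ is the value at which the Bessel ratio matches the mean resultant length (1 + 2 cos 0.3)/3.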
Neuralnetwork ensembles shown accurate classification techniques Previous work shown effective ensemble consist networks highly correct ones make errors different parts input space well Most existing techniques however indirectly address problem creating set networks In paper present technique called Addemup uses genetic algorithms directly search accurate diverse set trained networks Addemup works first creating initial population uses genetic operators continually create new networks keeping set networks accurate possible disagreeing much possible Experiments three DNA problems show Addemup able generate set trained networks accurate several existing approaches Experiments also show Addemup able effectively incorporate prior knowledge available improve quality ensemble
Classifier induction algorithms differ inductive hypotheses represent search space hypotheses No classifier better another problems selective superiority This paper empirically compares six classifier induction algorithms diagnosis equine colic prediction mortality The classification based simultaneously analyzing sixteen features measured patient The relative merits algorithms linear regression decision trees nearest neighbor classifiers Model Class Selection system logistic regression without feature selection neural nets qualitatively discussed generalization accuracies quantitatively analyzed
Using multiparent diagonal scanning crossover GAs reproduction operators obtain adjustable arity Hereby sexuality becomes graded feature instead Boolean one Our main objective relate performance GAs extent sexuality used reproduction less arbitrary functions reported current literature We investigate GA behaviour Kauffmans NKlandscapes allow systematic characterization user control ruggedness fitness landscape We test GAs varying extent sexuality ranging asexual sexual Our tests performed two types NKlandscapes landscapes random landscapes nearest neighbour epistasis For landscape types selected landscapes range ruggednesses The results confirm superiority sexual recombination mildly epistatic problems
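Kauffman's NK landscapes can be generated in a few lines. Each of the N loci contributes a fitness value looked up in a random table. The table is indexed by the locus's own bit and the bits of K linked loci, and the genome's fitness is the mean contribution. The sketch assumes uniform random table entries and a fixed seed for reproducibility. For the nearest-neighbour variant it links each locus to its K right-hand neighbours with wrap-around. Other conventions place K/2 neighbours on each side.

```python
import random
from typing import Callable, Sequence


def make_nk(n: int, k: int, nearest_neighbour: bool = True,
            seed: int = 0) -> Callable[[Sequence[int]], float]:
    """Return the fitness function of a random NK landscape on bit strings of length n."""
    rng = random.Random(seed)
    if nearest_neighbour:
        # Each locus interacts with the k loci to its right, wrapping around.
        links = [[(i + d) % n for d in range(1, k + 1)] for i in range(n)]
    else:
        # Each locus interacts with k distinct loci chosen at random.
        links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One random contribution per configuration of (own bit, k linked bits).
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(genome: Sequence[int]) -> float:
        total = 0.0
        for i in range(n):
            index = genome[i]
            for j in links[i]:
                index = (index << 1) | genome[j]  # append linked bit to the table index
            total += tables[i][index]
        return total / n

    return fitness


f_nn = make_nk(10, 2, nearest_neighbour=True, seed=1)
f_rand = make_nk(10, 2, nearest_neighbour=False, seed=1)
genome = [0, 1] * 5
print(f_nn(genome), f_rand(genome))
```

Increasing K makes each locus's contribution depend on more other bits, which is what makes the landscape more rugged. With K = 0 the fitness is additive across loci.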
This paper discusses unsupervised learning problem An important part unsupervised learning problem determining number constituent groups components classes best describes data We apply Minimum Message Length MML criterion unsupervised learning problem modifying earlier MML application We give empirical comparison criteria prominent literature estimating number components data set We conclude Minimum Message Length criterion performs better alternatives data considered unsupervised learning tasks
The Minimum Message Length MML technique applied problem estimating parameters multivariate Gaussian model correlation structure modelled single common factor Implicit estimator equations derived compared obtained Maximum Likelihood ML analysis Unlike ML MML estimators remain consistent used estimate factor loadings factor scores Tests simulated data show MML estimates average accurate ML estimates former exist If data show little evidence factor MML estimate collapses It shown condition existence MML estimate essentially log likelihood ratio favour factor model exceed value expected null nofactor hypotheses
This paper firstly provides reappraisal development techniques inverting deduction secondly introduces ModeDirected Inverse Entailment MDIE generalisation enhancement previous approaches thirdly describes implementation MDIE Progol system Progol implemented C available anonymous ftp The reassessment previous techniques terms inverse entailment leads new results learning positive data inverting implication pairs clauses
Firstorder learning involves finding clauseform definition relation examples relation relevant background information In paper particular firstorder learning system modified customize finding definitions functional relations This restriction leads faster learning times cases definitions higher predictive accuracy Other firstorder learning systems might benefit similar specialization
Technical Report Département dInformatique et Recherche Opérationnelle Université de Montréal Abstract Boosting general method improving performance learning algorithm consistently generates classifiers need perform slightly better random guessing A recently proposed promising boosting algorithm AdaBoost It applied great success several benchmark machine learning problems using rather simple learning algorithms particular decision trees In paper use AdaBoost improve performances neural networks applied character recognition tasks We compare training methods based sampling training set weighting cost function Our system achieves error data base online handwritten digits writers Adaptive boosting multilayer network achieved error UCI Letters offline characters data set
The ability inductive learning system find good solution given problem dependent upon representation used features problem Systems perform constructive induction able change representation constructing new features We describe important realworld problem finding genes DNA believe offers interesting challenge constructiveinduction researchers We report experiments demonstrate two different input representations task result significantly different generalization performance neural networks decision trees neural symbolic methods constructive induction fail bridge gap two representations We believe realworld domain provides interesting challenge problem constructive induction relationship two representations well known representational shift involved constructing better representation imposing
DIMACS Technical Report July A preliminary version paper appeared proceedings EuroCOLT conference published volume Lecture Notes Artificial Intelligence pages SpringerVerlag The journal version appear Algorithmica Email beimeldimacsrutgersedu httpdimacsrutgersedubeimel Part research done author PhD student Technion Email eyalkcstechnionacil httpwwwcstechnionacileyalk This research supported Technion VPR Fund Japan Technion Society Research Fund
This paper demonstrates capabilities Foidl inductive logic programming ILP system whose distinguishing characteristics ability produce firstorder decision lists use output completeness assumption substitute negative examples use intensional background knowledge The development Foidl originally motivated problem learning generate past tense English verbs however paper demonstrates superior performance two different sets benchmark ILP problems Tests finite element mesh design problem show Foidls decision lists enable produce generally accurate results range methods previously applied problem Tests selection listprocessing problems Bratkos introductory Prolog text demonstrate combination implicit negatives intensionality allow Foidl learn correct programs far fewer examples Foil
Further Results Controllability Recurrent Neural Networks Abstract This paper studies controllability properties recurrent neural networks The new contributions extension previous result slightly different model formulation proof necessary sufficient condition analysis lowdimensional case
MIT Media Lab Perceptual Computing Learning Common Sense Technical Report nov Abstract We present algorithms coupling training hidden Markov models HMMs model interacting processes demonstrate superiority conventional HMMs vision task classifying twohanded actions HMMs perhaps successful framework perceptual computing modeling classifying dynamic behaviors popular offer dynamic time warping training algorithm clear Bayesian semantics However Markovian framework makes strong restrictive assumptions system generating signal that single process small number states extremely limited state memory The singleprocess model often inappropriate vision speech applications resulting low ceilings model performance Coupled HMMs provide efficient way resolve many problems offer superior training speeds model likelihoods robustness initial conditions
The general framework reinforcement learning proposed several researchers solution optimization problems realization adaptive control schemes To allow efficient application reinforcement learning either areas necessary solve structural temporal credit assignment problem In paper concentrate latter usually tackled use learning algorithms employ discounted rewards We argue realistic problems kind solution satisfactory since address effect noise originating different experiences allow easy explanation parameters involved learning process As possible solution propose keep delayed reward undiscounted discount actual adaptation rate Empirical results show dependent kind discount used more stable convergence even increase performance obtained
For certain classes problems defined twodimensional domains grid structure optimization problems involving assignment grid cells processors present nonlinear network model problem partitioning tasks among processors minimize interprocessor communication Minimizing interprocessor communication context shown equivalent tiling domain minimize total tile perimeter tile corresponds collection tasks assigned processor A tight lower bound perimeter tile function area developed We show generate minimumperimeter tiles By using assignments corresponding nearrectangular minimumperimeter tiles closed form solutions developed certain classes domains We conclude computational results parallel highlevel genetic algorithms produced good sometimes provably optimal solutions large perimeter minimization problems
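The abstract refers to a tight lower bound on tile perimeter as a function of tile area, and to near-rectangular tiles that attain it. The abstract does not give the formula. The sketch below assumes it matches the well-known bound for connected grid tiles (polyominoes): perimeter >= 2*ceil(2*sqrt(A)). It then checks that filling rows of width ceil(sqrt(A)) from one corner meets that bound. The function names are hypothetical.

```python
import math


def min_tile_perimeter(area: int) -> int:
    """Assumed lower bound on the perimeter of a connected tile of `area` cells.

    Computes 2 * ceil(2 * sqrt(area)) exactly with integers:
    ceil(2 * sqrt(A)) is the smallest k with k * k >= 4 * A.
    """
    if area < 1:
        raise ValueError("area must be a positive integer")
    k = math.isqrt(4 * area - 1) + 1
    return 2 * k


def near_square_tile(area: int) -> tuple[int, int, int]:
    """Fill rows of width ceil(sqrt(area)) starting from one corner.

    Returns (width, height, perimeter). The partial last row starts at the
    corner, so the tile's perimeter equals that of its bounding box.
    """
    width = math.isqrt(area - 1) + 1      # ceil(sqrt(area))
    height = -(-area // width)            # ceil(area / width)
    return width, height, 2 * (width + height)
```

For example, `min_tile_perimeter(10)` is 14, and `near_square_tile(10)` returns `(4, 3, 14)`. So a 4-wide, 3-tall near-rectangle of 10 cells meets the bound. Integer square roots avoid floating-point error when 2*sqrt(A) is close to an integer.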
We report successful application TD value function approximation task jobshop scheduling Our scheduling problems based problem scheduling payload processing steps NASA space shuttle program The value function approximated layer feedforward network sigmoid units A onestep lookahead greedy algorithm using learned evaluation function outperforms best existing algorithm task iterative repair method incorporating simulated annealing To understand reasons performance improvement paper introduces several measurements learning process discusses several hypotheses suggested measurements We conclude use value function approximation source difficulty method fact may explain success method independent use value iteration Additional experiments required discriminate among hypotheses
Several metrics used empirical studies explore mechanisms convergence genetic algorithms The metric designed measure consistency arbitrary ranking hyperplanes partition respect target string Walsh coefficients calculated small functions order characterize sources linear nonlinear interactions A simple deception measure also developed look closely effects increasing nonlinearity functions Correlations metric deception measure discussed relationships convergence behavior simple genetic algorithm studied large sets functions varying degrees nonlinearity
Exercises problems ordered increasing order difficulty Teaching problemsolving exercises widely used pedagogic technique A computational reason knowledge gained solving simple problems useful efficiently solving difficult problems We adopt approach learning exercises acquire searchcontrol knowledge form goaldecomposition rules drules Drules first order learned using new generalizeandtest algorithm based inductive logic programming techniques We demonstrate feasibility approach applying two planning domains
Foveal vision features imagers graded acuity coupled context sensitive sensor gaze control analogous prevalent throughout vertebrate vision Foveal vision operates efficiently uniform acuity vision resolution treated dynamically allocatable resource requires refined visual attention mechanism We demonstrate reinforcement learning RL significantly improves performance foveal visual attention overall vision system task model based target recognition A simulated foveal vision system shown classify targets fewer fixations learning strategies acquisition visual information relevant task learning generalize strategies ambiguous unexpected scenario conditions
A Horn definition set Horn clauses head literal In paper consider learning nonrecursive functionfree firstorder Horn definitions We show class exactly learnable equivalence membership queries It follows class PAC learnable using examples membership queries Our results shown applicable learning efficient goaldecomposition rules planning domains
Speedup learning study improving problemsolving performance experience outside guidance We describe system successfully combines best features Explanationbased learning empirical learning learn goal decomposition rules examples successful problem solving membership queries We demonstrate system efficiently learn effective decomposition rules three different domains Our results suggest theoryguided empirical learning overcome problems purely explanationbased learning purely empirical learning effective speedup learning method
The authors coworkers recently proved general theorems global stabilization linear systems subject control saturation This paper develops detail explicit design linearized equations longitudinal flight control F aircraft tests obtained controller original nonlinear model This paper represents first detailed derivation controller using techniques question results encouraging
When casebased planner retrieving previous case preparation solving new similar problem often aware implicit features new problem situation determine particular case may successfully applied This means cases may fail improve planners performance By detecting explaining case failures occur retrieval may improved incrementally In paper provide definition case failure casebased planner dersnlp derivation replay snlp solves new problems replaying previous plan derivations We provide explanationbased learning EBL techniques detecting constructing reasons case failure We also describe case library organized incorporate failure information produced Finally present empirical study demonstrates effectiveness approach improving performance dersnlp
We describe analyze mixture model supervised learning probabilistic transducers We devise online learning algorithm efficiently infers structure estimates parameters probabilistic transducer mixture Theoretical analysis comparative simulations indicate learning algorithm tracks best transducer arbitrarily large possibly infinite pool models We also present application model inducing noun phrase recognizer
Differentiation nodes competitive learning network conventionally achieved competition basis neural activity Simple inhibitory mechanisms limited sparse representations decorrelation factorization schemes support distributed representations computationally unattractive By letting neural plasticity mediate competitive interaction instead obtain diffuse nonadaptive alternatives fully distributed representations We use technique simplify improve binary information gain optimization algorithm feature extraction Schraudolph Sejnowski approach could used improve learning algorithms
It shown Bayesian training backpropagation neural networks feasibly performed Hybrid Monte Carlo method This approach allows true predictive distribution test case given set training cases approximated arbitrarily closely contrast previous approaches approximate posterior weight distribution Gaussian In work Hybrid Monte Carlo method implemented conjunction simulated annealing order speed relaxation good region parameter space The method applied test problem demonstrating produce good predictions well indication uncertainty predictions Appropriate weight scaling factors found automatically By applying known techniques calculation free energy differences also possible compare merits different network architectures The work described also applicable wide variety statistical models neural networks
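The abstract builds on the Hybrid Monte Carlo method. Below is a minimal sketch of the basic sampler under simplifying assumptions. The target is a one-dimensional density (a standard normal in the usage example) rather than a network's weight posterior. The simulated annealing schedule mentioned in the abstract is omitted. The function name and default step settings are illustrative choices.

```python
import math
import random


def hmc_sample(log_prob, grad_log_prob, x0, n_samples,
               step_size=0.1, n_leapfrog=20, seed=0):
    """Minimal 1-D Hybrid (Hamiltonian) Monte Carlo sampler."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                  # fresh Gaussian momentum
        x_new, p_new = x, p
        # Leapfrog integration of the Hamiltonian dynamics
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p_new
            if i < n_leapfrog - 1:
                p_new += step_size * grad_log_prob(x_new)
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis test on total energy H = -log_prob(x) + p^2 / 2
        h_old = -log_prob(x) + 0.5 * p * p
        h_new = -log_prob(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples
```

Usage: `hmc_sample(lambda x: -0.5 * x * x, lambda x: -x, 0.0, 4000)` draws from a standard normal. The sample mean should be near 0 and the sample variance near 1. The leapfrog integrator keeps the energy error small, so nearly all proposals are accepted even though each trajectory moves far.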
Former layouts contain much knowhow architects A generic automatic way formalize knowhow order use computer would save lot effort money However seems way The access knowhow layouts Developing generic software tool reuse former layouts consider every part architectural domain things like personal style Tools used today consider small parts architectural domain Any personal style ignored Isnt possible build basic tool adjusted content former layouts may extended incrementally modeling much domain desirable This paper describe reuse tool perform task focusing topological geometrical binary relations
An important difficult prediction task many domains particularly medical decision making prognosis Prognosis presents unique set problems learning system outputs unknown This paper presents new approach prognostic prediction using ideas nonparametric statistics fully utilize available information neural architecture The technique applied breast cancer prognosis resulting flexible accurate models may play role preventing unnecessary surgeries
In paper new approach presented transfers basic idea Evolution Strategies ESs GAs Mutation rates changed endogenous items adapting search process First experimental results presented indicate environmentdependent selfadaptation appropriate settings mutation rate possible even GAs
Previous teaching models learning theory community batch models That models teacher generated single set helpful examples present learner In paper present interactive model learner ability ask queries query learning model Angluin We show model least powerful previous teaching models We also show anything learnable queries even randomized learner teachable model In previous teaching models classes shown teachable known efficiently learnable An important concept class known learnable DNF formulas We demonstrate power approach providing deterministic teacher learner class DNF formulas The learner makes equivalence queries hypotheses also DNF formulas
A neuralnetwork ensemble successful technique outputs set separately trained neural network combined form one unified prediction An effective ensemble consist set networks highly correct ones make errors different parts input space well however existing techniques indirectly address problem creating set We present algorithm called Addemup uses genetic algorithms explicitly search highly diverse set accurate trained networks Addemup works first creating initial population uses genetic operators continually create new networks keeping set networks highly accurate disagreeing much possible Experiments four realworld domains show Addemup able generate set trained networks accurate several existing ensemble approaches Experiments also show Addemup able effectively incorporate prior knowledge available improve quality ensemble
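The abstract asks for networks that are accurate yet disagree. One standard way to quantify this trade-off for real-valued outputs is the ambiguity decomposition of Krogh and Vedelsby. For a weighted-average ensemble under squared error, ensemble error equals weighted mean member error minus weighted mean ambiguity (spread around the ensemble output). This is shown as an assumption about how diversity can be measured, not as Addemup's exact fitness function. The function name is hypothetical.

```python
def ambiguity_decomposition(member_outputs, weights, target):
    """Squared-error ambiguity decomposition for a single input.

    member_outputs: outputs f_i(x) of the individual networks
    weights: non-negative combination weights summing to 1
    target: true value y
    Returns (ensemble_error, mean_member_error, mean_ambiguity).
    """
    f_bar = sum(w * f for w, f in zip(weights, member_outputs))   # ensemble output
    ensemble_error = (f_bar - target) ** 2
    mean_member_error = sum(w * (f - target) ** 2
                            for w, f in zip(weights, member_outputs))
    mean_ambiguity = sum(w * (f - f_bar) ** 2
                         for w, f in zip(weights, member_outputs))
    return ensemble_error, mean_member_error, mean_ambiguity
```

With outputs `[0.2, 0.9, 0.7]`, equal weights and target `1.0`, the ensemble error is 0.16. It equals the mean member error (about 0.2467) minus the mean ambiguity (about 0.0867). For fixed member accuracy, more disagreement therefore lowers ensemble error.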
Default logic encounters conceptual difficulties representing common sense reasoning tasks We argue try formulate modular default rules presumed work circumstances We need take account importance context continuously evolving reasoning process Sequential thresholding quantitative counterpart default logic makes explicit role context plays construction nonmonotonic extension We present semantic characterization generic nonmonotonic reasoning well instantiations pertaining default logic sequential thresholding This provides link two mechanisms well way integrate two beneficial
The problem maximizing expected total discounted reward completely observable Markovian environment ie Markov decision process mdp models particular class sequential decision problems Algorithms developed making optimal decisions mdps given either mdp specification opportunity interact mdp time Recently sequential decisionmaking problems studied prompting development new algorithms analyses We describe new generalized model subsumes mdps well many recent variations We prove basic results concerning model develop generalizations value iteration policy iteration modelbased reinforcementlearning Qlearning used make optimal decisions generalized model various assumptions Applications theory particular models described including riskaverse mdps explorationsensitive mdps sarsa Qlearning spreading twoplayer games approximate max picking via sampling Central results contraction property value operator stochasticapproximation theorem reduces asynchronous convergence synchronous convergence
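The abstract generalizes value iteration beyond ordinary Markov decision processes. As a reference point, below is a sketch of standard value iteration for a finite discounted MDP, the special case the generalized model subsumes. The discount factor makes each backup a contraction, which is why the loop converges. The data layout and function name are illustrative assumptions.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-10):
    """Standard value iteration for a finite discounted MDP.

    transitions[s][a] is a list of (probability, next_state) pairs;
    rewards[s][a] is the expected immediate reward for action a in state s.
    """
    n = len(transitions)
    v = [0.0] * n
    while True:
        # Bellman optimality backup applied to every state
        new_v = [
            max(rewards[s][a] + gamma * sum(p * v[t] for p, t in transitions[s][a])
                for a in range(len(transitions[s])))
            for s in range(n)
        ]
        # gamma < 1 makes the backup a contraction, so this gap shrinks geometrically
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
```

Example: in state 0 the agent can stay (reward 0) or move to state 1 (reward 1). State 1 loops on itself with reward 2. With `gamma=0.9` the values are V(1) = 2 / 0.1 = 20 and V(0) = 1 + 0.9 * 20 = 19.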
Incorporating declarative bias prior knowledge learning active research topic machine learning Treestructured bias specifies prior knowledge tree relevance relationships attributes This paper presents learning algorithm implements treestructured bias ie learns target function probably approximately correctly random examples membership queries obeys given treestructured bias The theoretical predictions paper empirically validated
We introduce large family Boltzmann machines trained using standard gradient descent The networks one layers hidden units treelike connectivity We show implement supervised learning algorithm Boltzmann machines exactly without resort simulated meanfield annealing The stochastic averages yield gradients weight space computed technique decimation We present results problems N bit parity detection hidden symmetries
The data describing resolutions telephone network local loop troubles wish learn rules dispatching technicians notoriously unreliable Anecdotes abound detailing reasons resolution entered technician would valid ranging sympathy fear ignorance negligence management pressure In paper describe four different approaches dealing problem bad data order first determine whether machine learning promise domain determine well machine learning might perform We offer evidence machine learning help build dispatching method perform better system currently place
We study notions bias variance classification rules Following Efron develop decomposition prediction error natural components Then derive bootstrap estimates components illustrate used describe error behaviour classifier practice In process also obtain bootstrap estimate error bagged classifier
This paper deals systems obtained linear timeinvariant continuous discretetime devices followed function provides sign output Such systems appear naturally study quantized observations well signal processing neural network theory Results given observability minimal realizations systemtheoretic concepts Certain major differences exist linear case results generalize surprisingly straightforward manner
CaseBased Reasoning CBR paradigm close designer behavior conceptual design seems fruitful computeraided design approach library design cases available The goal paper presents general framework casebased retrieval system REPRO supports chemical process design The crucial problems like case representation structural similarity measure widely described The presented experimental results expert evaluation show usefulness described system real world problems The paper ends discussion concerning research problems future work
Hollands analysis sources power genetic algorithms served guidance applications genetic algorithms years The technique applying recombination operator crossover population individuals key power Nevertheless number contradictory results concerning crossover operators respect overall performance Recently example genetic algorithms used design neural network modules control circuits In studies genetic algorithm without crossover outperformed genetic algorithm crossover This report reexamines studies concludes results caused small population size New results presented illustrate effectiveness crossover population size larger From performance view results indicate better neural networks evolved shorter time genetic algorithm uses crossover
In paper explore use genetic algorithms GAs construct system called GABIL continually learns refines concept classification rules interaction environment The performance system compared two concept learners NEWGEM C suite target concepts From comparison identify strategies responsible success concept learners We implement subset strategies within GABIL produce multistrategy concept learner Finally multistrategy concept learner enhanced allowing GAs adaptively select appropriate strategies
Suppose learning task select one hypothesis set hypotheses may example generated multiple applications randomized learning algorithm A common approach evaluate hypothesis set previously unseen crossvalidation data select hypothesis lowest crossvalidation error But crossvalidation data partially corrupted noise set hypotheses selecting large folklore also warns overfitting crossvalidation data Klockars Sax Tukey Tukey In paper explain overfitting really occurs show surprising result overcome selecting hypothesis higher crossvalidation error others lower crossvalidation errors We give reasons selecting hypothesis lowest crossvalidation error propose new algorithm LOOCVCV uses computationally efficient form leaveoneout crossvalidation select hypothesis Finally present experimental results one domain show LOOCVCV consistently beating picking hypothesis lowest crossvalidation error even using reasonably large crossvalidation sets
We investigate learning membership equivalence queries assuming information provided learner incomplete By incomplete mean membership queries may answered I dont know This model worstcase version incomplete membership query model Angluin Slonim It attempts model practical learning situations including experiment Lang Baum describe teacher may unable answer reliably queries critical learning algorithm We present algorithms learn monotone kterm DNF membership queries learn monotone DNF membership equivalence queries Compared complete information case query complexity increases additive term linear number I dont know answers received We also observe blowup number queries general exponential new model incomplete membership model
We show wellknown Lyapunov sufficient condition inputtostate stability also necessary settling positively open question raised several authors past years Additional characterizations ISS property including one terms nonlinear stability margins also provided
Determination consistent initial conditions important aspect solution differential algebraic equations DAEs Specification inconsistent initial conditions even slightly inconsistent often leads failure initialization problem In paper present Successive Linear Programming SLP approach solution DAE derivative array equations initialization problem The SLP formulation handles roundoff errors inconsistent user specifications among others allows reliable convergence strategies incorporate variable bounds trust region concepts A new consistent set initial conditions obtained minimizing deviation variable values specified ones For problems discontinuities caused step change input functions new criterion presented identifying subset variables continuous across discontinuity The LP formulation applied determine consistent set initial conditions solution problem domain discontinuity Numerous example problems solved illustrate concepts
In field optimization machine learning techniques efficient promising tools like Genetic Algorithms GAs HillClimbing designed In field Evolving NonDeterminism END model proposes inventive way explore space states combined use simulated coevolution remedies drawbacks previous techniques even allow model outperform difficult problems This new model applied sorting network problem reference problem challenged many computer scientists original oneplayer game named Solitaire For first problem END model able build scratch sorting networks good best known input problem It even improved one comparator years old result input problem For Solitaire game END evolved strategy comparable human designed strategy
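The END model is evaluated on building sorting networks. A standard way to check a candidate comparator network is the 0-1 principle: a network sorts every input of length n if and only if it sorts all 2**n binary inputs. The sketch below implements that check. Using it as a basis for fitness is an assumption, not something the abstract states. Function names are hypothetical.

```python
from itertools import product


def apply_network(network, values):
    """Run a comparator network. Each comparator (i, j), with i < j,
    leaves the smaller value at position i and the larger at position j."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def is_sorting_network(network, n):
    """0-1 principle: checking all 2**n binary inputs suffices."""
    return all(
        apply_network(network, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )
```

For four inputs, `[(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]` passes the check. Dropping its last comparator fails on the input `(0, 1, 0, 1)`. The binary test costs 2**n network evaluations. This is far fewer than the n! permutations a naive check would need.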
In field optimization machine learning techniques efficient promising tools like Genetic Algorithms GAs HillClimbing designed In field Evolving NonDeterminism END model presented paper proposes inventive way explore space states using simulated incremental coevolution organisms remedies drawbacks previous techniques even allow model outperform difficult problems This new model applied sorting network problem reference problem challenged many computer scientists original oneplayer game named Solitaire For first problem END model able build scratch sorting networks good best known input problem It even improved one comparator years old result input problem For Solitaire game END evolved strategy comparable human designed strategy
Most casebased reasoning systems used single best similar case basis solution For many problems however single exact solution Rather range acceptable answers We use cases basis solution also indicate boundaries within solution found We solve problems choosing point within boundaries In paper I discuss use cases illustrations chiron system I implemented domain personal income tax planning
We used genetic programming develop efficient image processing software The ultimate goal work detect certain signs breast cancer detected current segmentation classification methods Traditional techniques relatively good job segmenting classifying smallscale features mammograms microcalcification clusters Our stronglytyped genetic programs work multiresolution representation mammogram aimed handling features medium large scales stellated lesions architectural distortions The main problem efficiency We employ program optimizations speed evolution process factor ten In paper present genetic programming system describe optimization techniques
We propose two algorithms constructing training compact feedforward networks linear threshold units The Shift procedure constructs networks single hidden layer PTI constructs multilayered networks The resulting networks guaranteed perform given task binary realvalued inputs The various experimental results reported tasks binary real inputs indicate methods compare favorably alternative procedures deriving similar strategies terms size resulting networks generalization properties
The naive Bayesian classifier simple effective attribute independence assumption often violated real world A number approaches developed seek alleviate problem A Bayesian tree learning algorithm builds decision tree generates local Bayesian classifier leaf instead predicting single class However Bayesian tree learning still suffers replication fragmentation problems tree learning While inferred Bayesian trees demonstrate low average prediction error rates reason believe error rates higher leaves training examples This paper proposes novel lazy Bayesian tree learning algorithm For test example conceptually builds appropriate Bayesian tree In practice one path local Bayesian classifier leaf created Experiments wide variety realworld artificial domains show new algorithm significantly lower overall prediction error rates naive Bayesian classifier C Bayesian tree learning algorithm
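The abstract compares against the naive Bayesian classifier and reuses it at tree leaves. Below is a minimal sketch of that baseline for categorical attributes, with add-one (Laplace) smoothing as an assumed smoothing choice. It is only the plain naive Bayes component, not the lazy Bayesian tree algorithm. The class name is hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)   # counts[(attr, class)][value]
        self.values = defaultdict(set)       # distinct values seen per attribute
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[(i, y)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        def log_score(c):
            nc = self.class_counts[c]
            s = math.log(nc / self.n)        # log prior
            for i, v in enumerate(row):
                # attributes treated as independent given the class
                s += math.log((self.counts[(i, c)][v] + 1)
                              / (nc + len(self.values[i])))
            return s
        return max(self.class_counts, key=log_score)
```

Trained on `[("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]` with labels `["no", "no", "yes", "yes"]`, it predicts `"yes"` for `("rain", "mild")` and `"no"` for `("sunny", "hot")`. Working in log space avoids underflow when there are many attributes.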
The problem learning Bayesian networks hidden variables known hard problem Even simpler task learning conditional probabilities Bayesian network hidden variables hard In paper present approach learns conditional probabilities Bayesian network hidden variables transforming multilayer feedforward neural network ANN The conditional probabilities mapped onto weights ANN learned using standard backpropagation techniques To avoid problem exponentially large ANNs focus Bayesian networks noisyor noisyand nodes Experiments real world classification problems demonstrate effectiveness technique
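The abstract restricts attention to noisy-or and noisy-and nodes. Below is a sketch of the noisy-OR conditional probability. It also shows, as an illustration, how the same quantity can be written as a single unit with weights w_i = -log(1 - p_i) and output 1 - exp(-net). Whether the paper maps nodes to weights exactly this way is an assumption. Function names and the optional `leak` term are hypothetical.

```python
import math


def noisy_or(parent_states, link_probs, leak=0.0):
    """P(child = 1 | parents) under a noisy-OR gate.

    Each active parent i independently turns the child on with
    probability link_probs[i]; `leak` models causes outside the network.
    """
    q = 1.0 - leak                  # probability the child stays off
    for active, p in zip(parent_states, link_probs):
        if active:
            q *= 1.0 - p
    return 1.0 - q


def noisy_or_as_unit(parent_states, link_probs, leak=0.0):
    """Same probability written as a unit: weights w_i = -log(1 - p_i),
    bias -log(1 - leak), activation 1 - exp(-net). Requires p_i < 1."""
    net = -math.log(1.0 - leak) + sum(
        -math.log(1.0 - p) * x for x, p in zip(parent_states, link_probs))
    return 1.0 - math.exp(-net)
```

With link probabilities `[0.8, 0.5, 0.9]` and the first two parents active, both functions give 1 - 0.2 * 0.5 = 0.9. With every parent inactive, the result reduces to the leak probability.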
Computational models natural systems often contain free parameters must set optimize predictive accuracy models This process called calibration can viewed form supervised learning presence prior knowledge In view fixed aspects model constitute prior knowledge goal learn correct values free parameters We report series attempts learn parameter values global vegetation model called MAPSS Mapped AtmospherePlantSoil System developed collaborator Ron Neilson Unfortunately attempts apply standard machine learning methods specifically global error functions gradient descent search do work MAPSS constraints introduced structure model prior knowledge create difficult nonlinear optimization problem Successful calibration MAPSS required taking divideandconquer approach subsets parameters calibrated others held constant This approach made possible carefully selecting training sets exercised portions model designing error functions part desirable properties The automated calibration tool developed currently applied calibrate MAPSS global climate data set
Evolutionary learning methods found useful several areas development intelligent robots In approach described evolutionary algorithms used explore alternative robot behaviors within simulation model way reducing overall knowledge engineering effort This paper presents initial results applying SAMUEL genetic learning system collision avoidance navigation task mobile robots
The bias variance real valued random variable using squared error loss well understood However recent developments classification techniques become desirable extend concepts general random variables loss functions The misclassification loss function categorical random variables particular interest We explore concepts variance bias develop decomposition prediction error functions systematic variable parts predictor After providing examples conclude discussion various definitions proposed
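The abstract starts from the well-understood squared-error case before extending bias and variance to other losses. As a reference for that starting point, the sketch below checks the identity MSE = bias^2 + variance for one fixed input with a noise-free target. The predictions come from models trained on different training sets. It does not implement any of the misclassification-loss definitions discussed in the paper. The function name is hypothetical.

```python
def bias_variance_squared(predictions, target):
    """Squared-error decomposition at one input with a noise-free target.

    predictions: outputs of models trained on different training sets
    Returns (mse, bias_squared, variance) with mse == bias_squared + variance.
    """
    n = len(predictions)
    mean_pred = sum(predictions) / n                      # systematic part
    mse = sum((f - target) ** 2 for f in predictions) / n
    bias_sq = (mean_pred - target) ** 2
    variance = sum((f - mean_pred) ** 2 for f in predictions) / n
    return mse, bias_sq, variance
```

For predictions `[1.0, 2.0, 3.0]` and target `0.0`, the mean squared error is 14/3. It splits into a squared bias of 4 and a variance of 2/3. Under 0-1 loss no such additive split holds in general, which is why the abstract mentions several competing definitions.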
Retrieving relevant cases crucial component casebased reasoning systems The task use userdefined query retrieve useful information ie exact matches partial matches close querydefined request according certain measures The difficulty stems fact may easy may even impossible specify query requests precisely completely resulting situation known fuzzyquerying It usually problem small domains large repositories store various information multifunctional information bases federated databases request specification becomes bottleneck Thus flexible retrieval algorithm required allowing imprecise query specification changing viewpoint Efficient database techniques exist locating exact matches Finding relevant partial matches might problem This document proposes contextbased similarity basis flexible retrieval Historical background research similarity assessment presented used motivation formal definition contextbased similarity We also describe similaritybased retrieval system multifunctional information bases
In earlier paper introduced new boosting algorithm called AdaBoost theoretically used significantly reduce error learning algorithm consistently generates classifiers whose performance little better random guessing We also introduced related notion pseudoloss method forcing learning algorithm multilabel concepts concentrate labels hardest discriminate In paper describe experiments carried assess well AdaBoost without pseudoloss performs real learning problems We performed two sets experiments The first set compared boosting Breimans bagging method used aggregate various classifiers including decision trees single attributevalue tests We compared performance two methods collection machinelearning benchmarks In second set experiments studied detail performance boosting using nearestneighbor classifier OCR problem
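The abstract evaluates AdaBoost on real learning problems. The sketch below is binary AdaBoost (labels -1 and +1) with one-dimensional decision stumps as the weak learner. It is a simplification: the pseudo-loss and multi-label variants and the nearest-neighbor weak learner from the abstract are not shown. Function names are hypothetical.

```python
import math


def train_stump(xs, ys, w):
    """Best stump h(x) = s if x > t else -s under example weights w.
    Returns (weighted_error, threshold, sign)."""
    best = None
    for t in sorted(set(xs)) + [min(xs) - 1]:
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (s if xi > t else -s) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best


def adaboost(xs, ys, rounds=10):
    """Binary AdaBoost with labels in {-1, +1} and decision stumps."""
    n = len(xs)
    w = [1.0 / n] * n                      # start from uniform weights
    ensemble = []
    for _ in range(rounds):
        err, t, s = train_stump(xs, ys, w)
        err = max(err, 1e-10)              # guard against a perfect stump
        if err >= 0.5:                     # weak learner no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Up-weight misclassified examples, down-weight correct ones
        w = [wi * math.exp(-alpha * yi * (s if xi > t else -s))
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * (s if x > t else -s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1
```

On `xs = [1, 2, 3, 4, 5, 6]` with labels `[1, 1, -1, -1, 1, 1]`, no single stump does better than 2/6 error. After three rounds, the weighted vote classifies all six points correctly. The reweighting step is what forces later stumps to concentrate on the examples earlier ones got wrong.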
Quantization parameters Perceptron central problem hardware implementation neural networks using numerical technology An interesting property neural networks used classifiers ability provide robustness input noise This paper presents efficient learning algorithms maximization robustness Perceptron especially designed tackle combinatorial problem arising discrete weights
Machine learning techniques used extract knowledge data stored medical databases In application various machine learning algorithms used extract diagnostic knowledge support diagnosis sport injuries The applied methods include variants Assistant algorithm topdown induction decision trees variants Bayesian classifier The available dataset insufficient reliable diagnosis sport injuries considered system Consequently expertdefined diagnostic rules added used preclassifiers generators additional training instances injuries training examples Experimental results show classification accuracy explanation capability naive Bayesian classifier fuzzy discretization numerical attributes superior methods estimated appropriate practical use
This paper introduces new machine learning task model calibration presents method solving particularly difficult model calibration task arose part global climate change research project The model calibration task problem training free parameters scientific model order optimize accuracy model making future predictions It form supervised learning examples presence prior knowledge An obvious approach solving calibration problems formulate global optimization problems goal find values free parameters minimize error model training data Unfortunately global optimization approach becomes computationally infeasible model highly nonlinear This paper presents new divideandconquer method analyzes model identify series smaller optimization problems whose sequential solution solves global calibration problem This paper argues methods kind rather global optimization techniques required order agents large amounts prior knowledge learn efficiently
We describe principles functionalities Dlab Declarative LAnguage Bias Dlab used inductive learning systems define syntactically traverse efficiently finite subspaces first order clausal logic set propositional formulae association rules Horn clauses full clauses A Prolog implementation Dlab available ftp access Keywords declarative language bias concept learning knowledge discovery
This paper compares representational capabilities one hidden layer two hidden layer nets consisting feedforward interconnections linear threshold units It remarked certain problems two hidden layers required contrary might principle expected known approximation theorems The differences based numerical accuracy number units needed capabilities feature extraction rather much basic classification direct inverse problems The former correspond approximation continuous functions latter concerned approximating onesided inverses continuous functions often encountered context inverse kinematics determination control questions A general result given showing nonlinear control systems stabilized using two hidden layers general using one
Discovery involves collaboration among many intelligent activities However little known form collaboration occurs In paper framework proposed autonomous systems learn discover environment Within framework many intelligent activities perception action exploration experimentation learning problem solving new term construction integrated coherent way The framework presented detail implemented system called LIVE evaluated performance LIVE several discovery tasks The conclusion autonomous learning environment feasible approach integrating activities involved discovery process
Support Vector Machines used time series prediction compared radial basis function networks We make use two different cost functions Support Vectors training insensitive loss ii Hubers robust loss function discuss choose regularization parameters models Two applications considered data noisy normal uniform noise Mackey Glass equation b Santa Fe competition set D In cases Support Vector Machines show excellent performance In case b Support Vector approach improves best known result benchmark factor
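The two cost functions named above can be written down directly. This sketch assumes the common textbook forms; the parameter names `eps` and `delta` are illustrative, and some references scale the Huber loss differently.

```python
def eps_insensitive_loss(residual, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the tube |r| <= eps."""
    return max(0.0, abs(residual) - eps)


def huber_loss(residual, delta):
    """Huber's robust loss: quadratic for |r| <= delta, linear beyond it."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    # Linear branch chosen so value and slope match the quadratic at delta.
    return delta * (r - 0.5 * delta)
```

The epsilon-insensitive loss ignores residuals inside a tube of width eps, which is what makes the SV solution sparse; the Huber loss grows only linearly beyond delta, which limits the influence of outliers.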
In paper evaluate classification accuracy four statistical three neural network classifiers two image based pattern classification problems These fingerprint classification optical character recognition OCR isolated handprinted digits The evaluation results reported useful designers practical systems two important commercial applications For OCR problem KarhunenLoeve KL transform images used generate input feature set Similarly fingerprint problem KL transform ridge directions used generate input feature set The statistical classifiers used Euclidean minimum distance quadratic minimum distance normal knearest neighbor The neural network classifiers used multilayer perceptron radial basis function probabilistic The OCR data consisted digit images training digit images testing The fingerprint data consisted training testing images In addition evaluation accuracy multilayer perceptron radial basis function networks evaluated size generalization capability For evaluated datasets best accuracy obtained either problem provided probabilistic neural network minimum classification error OCR fingerprints
The problem trajectory tracking presence input constraints considered The desired trajectory reparameterized slower time scale order avoid input saturation Necessary conditions reparameterizing function must satisfy derived The deviation nominal trajectory minimized formulating problem optimal control problem
Genetic programming applied task finding cliques graph Nodes graph represented tree structures manipulated form candidate cliques The intrinsic properties clique detection complicate design good fitness evaluation We analyze properties show clique detector found better finding maximum clique graph set cliques
Computer models casebased reasoning CBR generally guide case adaptation using fixed set adaptation rules A difficult practical problem identify knowledge required guide adaptation particular tasks Likewise open issue CBR cognitive model case adaptation knowledge learned We describe new approach acquiring case adaptation knowledge In approach adaptation problems initially solved reasoning scratch using abstract rules structural transformations general memory search heuristics Traces processing used successful rulebased adaptation stored cases enable future adaptation done casebased reasoning When similar adaptation problems encountered future adaptation cases provide task domainspecific guidance case adaptation process We present tenets approach concerning relationship memory search case adaptation memory search process storage reuse cases representing adaptation episodes These points discussed context ongoing research DIAL computer model learns case adaptation knowledge casebased disaster response planning
The development multistrategy learning systems based clear understanding roles applicability conditions different learning strategies To end chapter introduces Inferential Theory Learning provides conceptual framework explaining logical capabilities learning strategies ie competence Viewing learning process modifying learners knowledge exploring learners experience theory postulates process described search knowledge space triggered learners experience guided learning goals The search operators instantiations knowledge transmutations generic patterns knowledge change Transmutations may employ basic type inference deduction induction analogy Several fundamental knowledge transmutations described novel general way generalization abstraction explanation similization counterparts specialization concretion prediction dissimilization respectively Generalization enlarges reference set description set entities described Abstraction reduces amount detail reference set Explanation generates premises explain imply given properties reference set Similization transfers knowledge one reference set similar reference set Using concepts theory multistrategy taskadaptive learning MTL methodology outlined illustrated example MTL dynamically adapts strategies learning task defined input information learners background knowledge learning goal It aims synergistically integrating whole range inferential learning strategies empirical generalization constructive induction deductive generalization explanation prediction abstraction similization
The Support Vector SV machine novel type learning machine based statistical learning theory contains polynomial classifiers neural networks radial basis function RBF networks special cases In RBF case SV algorithm automatically determines centers weights threshold minimize upper bound expected test error The present study devoted experimental comparison machines classical approach centers determined kmeans clustering weights found using error backpropagation We consider three machines namely classical RBF machine SV machine Gaussian kernel hybrid system centers determined SV method weights trained error backpropagation Our results show US postal service database handwritten digits SV machine achieves highest test accuracy followed hybrid approach The SV approach thus theoretically wellfounded also superior practical application This report describes research done Center Biological Computational Learning Artificial Intelligence Laboratory Massachusetts Institute Technology ATT Bell Laboratories ATT Research Lucent Technologies Bell Laboratories Support Center provided part grant National Science Foundation contract ASC BS thanks MIT hospitality threeweek visit March work started At time study BS CB VV ATT Bell Laboratories NJ KS FG PN TP Massachusetts Institute Technology KS Department Information Systems Computer Science National University Singapore Lower Kent Ridge Road Singapore CB PN Lucent Technologies Bell Laboratories NJ VV ATT Research NJ BS supported Studienstiftung des deutschen Volkes CB supported ARPA ONR contract number NC We thank A Smola useful discussions Please direct correspondence Bernhard Scholkopf bsmpiktuebmpgde MaxPlanckInstitut fur biologische Kybernetik Spemannstr Tubingen Germany
Ensembles classifiers eg decision trees often exhibit greater predictive accuracy single classifiers alone Bagging boosting two standard ways generating combining multiple classifiers Unfortunately increase predictive performance usually linked dramatic decrease intelligibility ensembles less black boxes comparable neural networks So far attempts pruning ensembles successful approximately reducing ensembles half This paper describes different approach tries keep ensemblesizes small induction already also limits complexity single classifiers rigorously Single classifiers decisionstumps prespecified maximal depth They combined majority voting Ensembles induced pruned simple hillclimbing procedure These ensembles reasonably transformed equivalent decision trees We conduct empirical evaluation investigate predictive accuracies classifier complexities
Just input state stability iss generalizes idea finite gains respect supremum norms new notion integral input state stability iiss generalizes concept finite gain using integral norm inputs In paper obtain necessary sufficient characterization iiss property expressed terms dissipation inequalities
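For reference, the definitions as they are usually stated in the iISS literature; the exact regularity assumptions in the paper may differ. Here $\mathcal{K}_\infty$ and $\mathcal{KL}$ are the standard comparison-function classes. A system $\dot x = f(x,u)$ is iISS if there exist $\alpha, \gamma \in \mathcal{K}_\infty$ and $\beta \in \mathcal{KL}$ such that

$$
\alpha\big(|x(t)|\big) \;\le\; \beta\big(|x(0)|,\,t\big) + \int_0^t \gamma\big(|u(s)|\big)\,ds \qquad \text{for all } t \ge 0,
$$

and the dissipation-inequality characterization asks for a smooth, proper, positive definite $V$, a continuous positive definite $\alpha$, and $\sigma \in \mathcal{K}_\infty$ with

$$
\nabla V(x)\cdot f(x,u) \;\le\; -\alpha(|x|) + \sigma(|u|) \qquad \text{for all } x, u .
$$

ISS corresponds to the same inequality with $\alpha$ required to be of class $\mathcal{K}_\infty$; allowing $\alpha$ to be merely positive definite is what separates iISS from ISS.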
The use graphs represent independence structure multivariate probability models pursued relatively independent fashion across wide variety research disciplines since beginning century This paper provides brief overview current status research particular attention recent developments served unify seemingly disparate topics probabilistic expert systems statistical physics image analysis genetics decoding errorcorrecting codes Kalman filters speech recognition Markov models
In probabilitybased reasoning system Bayes theorem variations often used revise systems beliefs However explicit conditions implicit conditions probability assignments properly distinguished follows Bayes theorem generally applicable revision rule Upon properly distinguishing belief revision belief updating see Jeffreys rule variations revision rules either Without distinctions limitation Bayesian approach often ignored underestimated Revision general form done Bayesian approach probability distribution function alone contain information needed operation
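Jeffrey's rule, mentioned above, can be stated for a partition $\{B_i\}$ whose probabilities are shifted by new experience while the conditional probabilities $P(A \mid B_i)$ are held fixed:

$$
P_{\text{new}}(A) \;=\; \sum_i P(A \mid B_i)\, P_{\text{new}}(B_i).
$$

Setting $P_{\text{new}}(B_k) = 1$ for a single $k$ recovers ordinary Bayesian conditioning, $P_{\text{new}}(A) = P(A \mid B_k)$.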
At beginning paper three binary term logics defined The first based inheritance relation The second third suggest novel way process extension intension also interesting relations Aristotles syllogistic logic Based three simple systems NonAxiomatic Logic defined It termoriented language experiencegrounded semantics It uniformly represents processes randomness fuzziness ignorance It also uniformly carries deduction abduction induction revision
An inference graph many derivation strategies particular ordering steps involved reducing given query sequence database retrievals An optimal strategy given distribution queries complete strategy whose expected cost minimal expected cost depends conditional probabilities requested retrieval succeeds given member class queries posed This paper describes PAO algorithm first uses set training examples approximate probability values uses estimates produce probably approximately optimal strategy ie given ε δ > 0 PAO produces strategy whose cost within ε cost optimal strategy probability greater 1 − δ This paper also shows obtain strategies time polynomial 1/ε 1/δ size inference graph many important classes graphs including andor trees
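The dependence of a strategy's expected cost on retrieval success probabilities can be illustrated in the simplest case: a flat list of alternative retrievals tried in sequence until one succeeds. This is a hedged sketch, not the PAO algorithm itself: it assumes independent successes and positive costs, and the sample format and function names are hypothetical. Probabilities are estimated by frequency over training queries, and retrievals are ordered by decreasing success-probability-to-cost ratio, which an adjacent-swap argument shows is optimal under the independence assumption.

```python
from itertools import permutations


def expected_cost(order, cost, prob):
    """Expected cost of trying retrievals in `order` until one succeeds,
    assuming the retrievals succeed independently of one another."""
    total, reach = 0.0, 1.0  # reach = chance that this step is attempted
    for r in order:
        total += reach * cost[r]
        reach *= 1.0 - prob[r]
    return total


def estimate_probs(samples, retrievals):
    """Estimate success probabilities from training queries.

    Each sample is the set of retrievals that would succeed for one query
    drawn from the query distribution (a hypothetical data format).
    """
    n = len(samples)
    return {r: sum(r in s for s in samples) / n for r in retrievals}


def greedy_order(cost, prob):
    """Try retrievals in decreasing prob/cost (costs assumed positive)."""
    return sorted(cost, key=lambda r: prob[r] / cost[r], reverse=True)


def brute_force_order(cost, prob):
    """Exhaustive check over all orderings; only feasible for small inputs."""
    return min(permutations(cost), key=lambda o: expected_cost(o, cost, prob))
```

With costs a = 1, b = 4, c = 2 and estimated success rates 0.25, 0.75, 0.25, the ratio ordering is a, b, c with expected cost 4.375, which matches the exhaustive search.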
NARS uses new form term logic extended syllogism several types uncertainties represented processed deduction induction abduction revision carried unified format The system works asynchronously parallel way The memory system dynamically organized also interpreted network
Uncertainty artificial intelligence active research field several approaches suggested studied dealing various types uncertainty However hard rank approaches general usually aimed special application environment This paper begins defining environment show existing approaches used situation Then new approach NonAxiomatic Reasoning System introduced work environment The system designed assumption systems knowledge resources usually insufficient handle tasks imposed environment The system consistently represent several types uncertainty carry multiple operations uncertainties Finally new approach compared previous approaches terms uncertainty representation interpretation
Many realworld time series multistationary underlying data generating process DGP switches different stationary subprocesses modes operation An important problem modeling systems discover underlying switching process entails identifying number subprocesses dynamics subprocess For many time series problem illdefined since often obvious means distinguish different subprocesses We discuss use nonlinear gated experts perform segmentation system identification time series Unlike standard gated experts methods however use concepts statistical physics enhance segmentation highnoise problems experts required
Systems learn examples often create disjunctive concept definition The disjuncts concept definition cover training examples referred small disjuncts The problem small disjuncts error prone large disjuncts may necessary achieve high level predictive accuracy Holte Acker Porter This paper extends previous work done problem small disjuncts taking noise account It investigates assertion hard learn noisy data difficult distinguish noise true exceptions In process evaluating assertion insights gained mechanisms noise affects learning Two domains investigated The experimental results paper suggest Shapiros chess endgame domain Shapiro Wisconsin breast cancer domain Wolberg assertion true least low levels class noise
A training set data used construct rule predicting future responses What error rate rule The traditional answer question given crossvalidation The crossvalidation estimate prediction error nearly unbiased highly variable This article discusses bootstrap estimates prediction error thought smoothed versions crossvalidation A particular bootstrap method rule shown substantially outperform crossvalidation catalog simulation experiments Besides providing point estimates also consider estimating variability error rate estimate All results nonparametric apply possible prediction rule however study classification problems loss detail Our simulations include smooth prediction rules like Fishers Linear Discriminant Function unsmooth ones like Nearest Neighbors
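As an illustration of a bootstrap error estimate, here is a sketch of Efron's classic .632 estimator, which combines the optimistic apparent error with a leave-one-out bootstrap error. It is one representative rule, not necessarily the exact rule studied in the paper; the function names and the 1-nearest-neighbour helpers are assumptions for the example.

```python
import random


def bootstrap_632(xs, ys, fit, predict, n_boot=200, seed=0):
    """Return (err_632, err_apparent, err_loo_bootstrap) for a classifier.

    fit(xs, ys) -> model and predict(model, x) -> label are supplied by the
    caller, so any prediction rule can be plugged in.
    """
    rng = random.Random(seed)
    n = len(xs)
    # Apparent error: train and test on the same sample (optimistic).
    model = fit(xs, ys)
    err_app = sum(predict(model, x) != y for x, y in zip(xs, ys)) / n
    # Leave-one-out bootstrap: score each point only with models whose
    # bootstrap sample did not contain it (a smoothed cross-validation).
    misses = [0] * n
    counts = [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:
                counts[i] += 1
                misses[i] += predict(model, xs[i]) != ys[i]
    per_point = [misses[i] / counts[i] for i in range(n) if counts[i] > 0]
    err_loo = sum(per_point) / len(per_point)
    # Efron's .632 weighting of the optimistic and pessimistic estimates.
    return 0.368 * err_app + 0.632 * err_loo, err_app, err_loo


def fit_1nn(xs, ys):
    """1-nearest-neighbour 'training' just stores the sample."""
    return list(zip(xs, ys))


def predict_1nn(model, x):
    """Label of the stored point closest to x."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]
```

For 1-nearest-neighbour on distinct inputs the apparent error is zero, so the estimate reduces to 0.632 times the leave-one-out bootstrap error; unsmooth rules like this are where the weighting matters most.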
In paper discuss application MemoryBased Learning MBL fast NP chunking We first discuss application fast decision tree variant MBL IGTree dataset described Ramshaw Marcus consists roughly test train items In second series experiments used architecture two cascaded IGTrees In second level cascaded classifier added context predictions extra features incorrect predictions first level corrected yielding generalisation accuracy training testing times order seconds minutes
We examine issue consistency new perspective To avoid overfitting training data considerable number current systems sacrificed goal learning hypotheses perfectly consistent training instances setting new goal hypothesis simplicity Occams razor Instead using simplicity goal developed novel approach addresses consistency directly In words concept learner explicit goal selecting appropriate degree consistency training data We begin paper exploring concept learning less perfect consistency Next describe system adapt degree consistency response feedback predictive accuracy test data Finally present results initial experiments begin address question tightly hypotheses fit training data different problems
Handling NP complete problems GAs great challenge In particular presence constraints makes finding solutions hard GA In paper present problem independent constraint handling mechanism Stepwise Adaptation Weights SAW apply solving SAT problem Our experiments prove SAW mechanism substantially increases GA performance Furthermore compare SAWing GA best heuristic technique could trace WGSAT conclude GA superior heuristic method
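The core of the SAW mechanism is a clause-weighted fitness whose weights grow for clauses that keep being violated. The sketch below shows that idea on CNF formulas with DIMACS-style integer literals; to stay short it drives a simple bit-flip local search rather than a full GA, and the parameters `adapt_every` and `delta_w` are assumptions, not values from the paper.

```python
import random


def unsatisfied(clauses, assignment):
    """Indices of clauses that `assignment` (var -> bool) leaves unsatisfied.

    Clauses use DIMACS-style literals: 3 means x3, -3 means NOT x3.
    """
    return [i for i, clause in enumerate(clauses)
            if not any(assignment[abs(lit)] == (lit > 0) for lit in clause)]


def saw_fitness(clauses, weights, assignment):
    """Weighted sum of violated clauses; 0 means the formula is satisfied."""
    return sum(weights[i] for i in unsatisfied(clauses, assignment))


def saw_search(clauses, n_vars, max_evals=5000, adapt_every=50,
               delta_w=1, seed=0):
    """Bit-flip local search driven by SAW fitness (stand-in for the GA)."""
    rng = random.Random(seed)
    weights = [1] * len(clauses)
    best = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    for evals in range(1, max_evals + 1):
        if not unsatisfied(clauses, best):
            return best
        child = dict(best)
        v = rng.randint(1, n_vars)
        child[v] = not child[v]
        if saw_fitness(clauses, weights, child) <= saw_fitness(clauses, weights, best):
            best = child
        if evals % adapt_every == 0:
            # SAW step: clauses the current best still violates become
            # more expensive, reshaping the fitness landscape.
            for i in unsatisfied(clauses, best):
                weights[i] += delta_w
    return None
```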
Artificial Neural Networks seem promising regression classification especially large covariate spaces These methods represent nonlinear function composition low dimensional ridge functions therefore appear less sensitive dimensionality covariate space However due nonuniqueness global minimum existence possibly many local minima model revealed network unstable We introduce method interpret neural network results uses novel robustification techniques This results robust interpretation model employed network Simulated data known models used demonstrate interpretability results demonstrate effects different regularization methods robustness model Graphical methods introduced present interpretation results We demonstrate interaction covariates revealed From study conclude interpretation method works well NN models may sometimes misinterpreted especially approximations true model less robust
In paper describe different ways select transform features using evolutionary computation The features intended serve inputs feedforward network The first way selection features using standard genetic algorithm solution found specifies whether certain feature present We show prediction unemployment rates various European countries successful approach In fact kind selection features special case socalled functional links Functional links transform input pattern space new pattern space As functional links one use polynomials general functions Both found using evolutionary computation Polynomial functional links found evolving coding powers polynomial For symbolic functions use genetic programming Genetic programming finds symbolic functions applied inputs We compare workings latter two methods two artificial datasets realworld medical image dataset
This paper reports development realistic knowledgebased application using MOBAL system Some problems requirements resulting industrialcaliber tasks formulated A stepbystep account construction knowledge base task demonstrates interleaved use several learning algorithms concert inference engine graphical interface fulfill requirements Design analysis revision refinement extension working model combined one incremental process This illustrates balanced cooperative modeling approach The case study taken telecommunications domain precisely deals security management telecommunications networks MOBAL would used part security management tool acquiring validating refining security policy The modeling approach compared approaches KADS standalone machine learning
Source separation consists recovering set independent signals mixtures unknown coefficients observed This paper introduces class adaptive algorithms source separation implements adaptive version equivariant estimation henceforth called EASI Equivariant Adaptive Separation via Independence The EASI algorithms based idea serial updating specific form matrix updates systematically yields algorithms simple parallelizable structure real complex mixtures Most importantly performance EASI algorithm depend mixing matrix In particular convergence rates stability conditions interference rejection levels depend normalized distributions source signals Closed form expressions quantities given via asymptotic performance analysis This completed numerical experiments illustrating effectiveness proposed approach
Ensembles decision trees often exhibit greater predictive accuracy single trees alone Bagging boosting two standard ways generating combining multiple trees Boosting empirically determined effective two recently proposed may produce diverse trees bagging This paper reports empirical findings strongly support hypothesis We enforce greater decision tree diversity bagging simple modification underlying decision tree learner utilizes randomlygenerated decision stumps predefined depth starting point tree induction The modified procedure yields competitive results still retaining one attractive properties bagging iterations independent Additionally also investigate possible integration bagging boosting All ensemblegenerating procedures compared empirically various domains
ARCING THE EDGE Leo Breiman Technical Report Statistics Department University California Berkeley CA Abstract Recent work shown adaptively reweighting training set growing classifier using new weights combining classifiers constructed date significantly decrease generalization error Procedures type called arcing Breiman The first successful arcing procedure introduced Freund Schapire called Adaboost In effort explain Adaboost works Schapire et al derived bound generalization error convex combination classifiers terms margin We introduce function called edge differs margin two classes A framework understanding arcing algorithms defined In framework see arcing algorithms currently literature optimization algorithms minimize function edge A relation derived optimal reduction maximum value edge PAC concept weak learner Two algorithms described achieve optimal reduction Tests synthetic real data cast doubt Schapire et al There recent empirical evidence significant reductions generalization error gotten growing number different classifiers training set letting vote best class Freund Schapire proposed algorithm called AdaBoost adaptively reweights training set way based past history misclassifications constructs new classifier using current weights uses misclassification rate classifier determine size vote In number empirical studies many data sets using trees CART C base classifier Drucker Cortes Quinlan Freund Schapire Breiman AdaBoost produced dramatic decreases generalization error compared using single tree Error rates reduced point tests wellknown data sets gave result CART plus AdaBoost significantly better commonly used classification methods Breiman Meanwhile empirical results showed methods adaptive resampling reweighting combining called arcing Breiman also led low test set error rates An algorithm called arcx Breiman gave error rates almost identical Adaboost Ji Ma worked classifiers consisting randomly selected hyperplanes using different method adaptive resampling unweighted voting also got low error rates Thus least three arcing algorithms extant give excellent classification accuracy explanation
In order sequence tasks job shop problem JSP number machines related technological machine order jobs new representation technique mathematically known permutation repetition presented The main advantage single chromosome representation analogy permutation scheme traveling salesman problem TSP produce illegal sets operation sequences infeasible symbolic solutions As consequence representation scheme new crossover operator preserving initial scheme structure permutations repetition sketched Its behavior similar well known OrderCrossover simple permutation schemes Actually GOX operator permutations repetition arises Generalisation OX Computational experiments show GOX passes information couple parent solutions efficiently offspring solutions Together new representation GOX support cooperative aspect genetic search scheduling problems strongly
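The key property of the permutation-with-repetition encoding is that every chromosome decodes to a legal operation sequence. A minimal sketch of the decoding, assuming each job lists its machines in technological order; the GOX operator itself is not shown, and the names are illustrative.

```python
from collections import Counter


def decode(chromosome, routing):
    """Decode a permutation with repetition into (job, op_index, machine).

    chromosome: job ids; job j must occur exactly len(routing[j]) times.
    routing:    job -> machines in the job's technological order.
    The k-th occurrence of j stands for j's k-th operation, so any ordering
    of the genes yields a sequence that respects every job's machine order.
    """
    if Counter(chromosome) != Counter({j: len(m) for j, m in routing.items()}):
        raise ValueError("each job must appear once per operation")
    seen = Counter()
    sequence = []
    for job in chromosome:
        k = seen[job]
        sequence.append((job, k, routing[job][k]))
        seen[job] += 1
    return sequence
```

Because the operation index is read from the occurrence count rather than stored in the gene, crossover only has to preserve how many times each job appears, which is what an OX-style operator on this encoding needs to guarantee.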
Recently Bell Sejnowski presented approach blind source separation based information maximization principle We extend approach general cases sources may delayed respect We present network architecture capable coping sources derive adaptation equations delays weights network maximizing information transferred network Examples using wideband sources speech presented illustrate algorithm
The reference class problem probability theory multiple inheritances extensions problem nonmonotonic logics referred special cases conflicting beliefs The current solution accepted two domains specificity priority principle By analyzing example several factors ignored principle found relevant priority reference class A new approach NonAxiomatic Reasoning System NARS discussed factors taken account It argued solution provided NARS better solutions provided probability theory nonmonotonic logics
This paper discusses application modern signal processing technique known independent component analysis ICA blind source separation multivariate financial time series portfolio stocks The key idea ICA linearly map observed multivariate time series new space statistically independent components ICs This viewed factorization portfolio since joint probabilities become simple products coordinate system ICs We apply ICA three years daily returns largest Japanese stocks compare results obtained using principal component analysis The results indicate estimated ICs fall two categories infrequent large shocks responsible major changes stock prices ii frequent smaller fluctuations contributing little overall level stocks We show overall stock price reconstructed surprisingly well using small number thresholded weighted ICs In contrast using shocks derived principal components instead independent components reconstructed price less similar original one Independent component analysis potentially powerful method analyzing understanding driving mechanisms financial markets There promising applications risk management since ICA focuses higher order statistics
This paper concerns empirical basis causation addresses following issues We propose minimalmodel semantics causation show contrary common folklore genuine causal influences distinguished spurious covariations following standard norms inductive reasoning We also establish sound characterization conditions distinction possible We provide effective algorithm inferred causation show large class data algorithm uncover direction causal influences defined Finally address issue nontemporal causation
This paper presents method using qualitative models guide inductive learning Our objectives induce rules accurate also explainable respect qualitative model reduce learning time exploiting domain knowledge learning process Such explainability essential practical application inductive technology integrating results learning back existing knowledgebase We apply method two process control problems water tank network ore grinding process used mining industry Surprisingly addition achieving explainability classificational accuracy induced rules also increased We show value qualitative models quantified terms equivalence additional training examples finally discuss possible extensions
Explanation based learning typically considered symbolic learning method An explanation based learning method utilizes purely neural network representations called EBNN recently developed shown several desirable properties including robustness errors domain theory This paper briefly summarizes EBNN algorithm explores correspondence neural network based EBL method EBL methods based symbolic representations
The multiparent scanning crossover generalizing traditional uniform crossover diagonal crossover generalizing point npoint crossovers introduced In subsequent publications see several aspects multiparent recombination discussed Due space limitations however full overview experimental results showing performance multiparent GAs numerical optimization problems never published This technical report meant fill gap make results available
This report documents NACODAE Navy Conversational Decision Aids Environment developed Navy Center Applied Research Artificial Intelligence NCARAI branch Naval Research Laboratory NACODAE software prototype developed Practical Advances CaseBased Reasoning project funded Office Naval Research purpose assisting Navy DoD personnel decision aids tasks system maintenance operational training crisis response planning logistics fault diagnosis target classification meteorological nowcasting Implemented Java NACODAE used machine containing Java virtual machine eg PCs Unix This document describes exemplifies NACODAEs capabilities Our goal transition tool operational personnel continue enhancement user feedback testing recent research advances casebased reasoning related areas
A new generation sensor rich massively distributed autonomous systems developed potential unprecedented performance smart buildings reconfigurable factories adaptive traffic systems remote earth ecosystem monitoring To achieve high performance massive systems need accurately model environment sensor information Accomplishing grand scale requires automating art largescale modeling This paper presents formalization decompositional modelbased learning DML method developed observing modelers expertise decomposing large scale model estimation tasks The method exploits striking analogy learning consistencybased diagnosis Moriarty implementation DML applied thermal modeling smart building demonstrating significant improvement learning rate
Traditional machine vision assumes vision system recovers complete labeled description world Marr Recently several researchers criticized model proposed alternative model considers perception distributed collection taskspecific taskdriven visual routines Aloimonos Ullman Some researchers argued natural living systems visual routines product natural selection Ramachandran So far researchers handcoded taskspecific visual routines actual implementations eg Chapman In paper propose alternative approach visual routines simple tasks evolved using artificial evolution approach We present results series runs actual camera images simple routines evolved using Genetic Programming techniques Koza The results obtained promising evolved routines able correctly classify images better best algorithm able write hand
Combinatorial explosion inferences always central problem artificial intelligence Although inferences drawn reasoners knowledge available inputs large potentially infinite inferential resources available reasoning system limited With limited inferential capacity many potential inferences reasoners must somehow control process inference Not inferences equally useful given reasoning system Any reasoning system goals form utility function acts based beliefs indirectly assigns utility beliefs Given limits process inference variation utility inferences clear reasoner ought draw inferences valuable This paper presents approach problem makes utility potential belief explicit part inference process The method generate explicit desires knowledge The question focus attention thereby transformed two related problems How explicit desires knowledge used control inference facilitate resourceconstrained goal pursuit general Where desires knowledge come We present theory knowledge goals desires knowledge use processes understanding learning The theory illustrated using two case studies natural language understanding program learns reading novel unusual newspaper stories differential diagnosis program improves accuracy experience
This paper presents theory motivational analysis construction volitional explanations describe planning behavior agents We discuss content explanations well process understander builds explanations Explanations constructed decision models describe planning process agent goes considering whether perform action Decision models represented explanation patterns standard patterns causality based previous experiences understander We discuss nature explanation patterns use representing decision models process retrieved used evaluated
An evolutionary approach developing improved neural network architectures presented It shown possible use genetic algorithms construction backpropagation networks real world tasks Therefore network representation developed certain properties Results various application presented
This paper describes reasoner improve understanding incompletely understood domain application already knows novel problems domain Recent work AI dealt issue using past explanations stored reasoners memory understand novel situations However process assumes past explanations well understood provide good lessons used future situations This assumption usually false one learning novel domain since situations encountered previously domain might understood completely Instead reasonable assume reasoner would gaps knowledge base By reasoning new situation reasoner able fill gaps new information came reorganize explanations memory gradually evolve better understanding domain We present story understanding program retrieves past explanations situations already memory uses build explanations understand novel stories terrorism In system refines understanding domain filling gaps explanations elaborating explanations learning new indices explanations This type incremental learning since system improves explanatory knowledge domain incremental fashion rather learning new XPs whole
We present method analysis nonstationary time series multiple operating modes In particular possible detect model switching dynamics less abrupt time consuming drift one mode another This achieved two steps First unsupervised training method provides prediction experts inherent dynamical modes Then trained experts used hidden Markov model allows model drifts An application physiological wakesleep data demonstrates analysis modeling realworld time series improved drift paradigm taken account
In paper presented constructive induction techniques recently added EITHER theory refinement system Intermediate concept utilization employs existing rules theory derive higherlevel features use induction Intermediate concept creation employs inverse resolution introduce new intermediate concepts order fill gaps theory span multiple levels These revisions allow EITHER make use imperfect domain theories ways typical previous work constructive induction theory refinement As result EITHER able handle wider range theory imperfections existing theory refinement system
A number reinforcement learning algorithms developed guaranteed converge optimal solution used lookup tables It shown however algorithms easily become unstable implemented directly general functionapproximation system sigmoidal multilayer perceptron radialbasisfunction system memorybased learning system even linear functionapproximation system A new class algorithms residual gradient algorithms proposed perform gradient descent mean squared Bellman residual guaranteeing convergence It shown however may learn slowly cases A larger class algorithms residual algorithms proposed guaranteed convergence residual gradient algorithms yet retain fast learning speed direct algorithms In fact direct residual gradient algorithms shown special cases residual algorithms shown residual algorithms combine advantages approach The direct residual gradient residual forms value iteration Qlearning advantage learning presented Theoretical analysis given explaining properties algorithms simulation results given demonstrate properties
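The abstract describes residual algorithms as mixing the direct update and the residual-gradient update. The sketch below shows one such update for a linear value function V(s) = w · feat(s). It assumes the commonly cited form in which a mixing parameter `phi` weights the successor-state gradient. The function names and signature are hypothetical, and the code does not reproduce the paper's own implementation.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def residual_update(w, feat_s, feat_next, reward, gamma, alpha, phi):
    """Return new weights after observing a transition (s, reward, s').

    Assumed form of the residual-algorithm update:
      phi = 0.0 -> direct (TD(0)-style) update
      phi = 1.0 -> residual-gradient update, i.e. gradient descent
                   on 0.5 * delta**2 for this transition
      0 < phi < 1 mixes the two directions.
    """
    # Bellman residual for this transition under the current weights
    delta = reward + gamma * dot(w, feat_next) - dot(w, feat_s)
    return [
        wi - alpha * delta * (phi * gamma * fn - fs)
        for wi, fs, fn in zip(w, feat_s, feat_next)
    ]
```

With `phi = 1.0` the step is exactly gradient descent on the squared Bellman residual, which gives the convergence guarantee. With `phi = 0.0` it reduces to the fast but potentially unstable direct update.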
This note describes useful adaptation peak seeking regime used unsupervised learning processes competitive learning kmeans The adaptation enables learning capture loworder probability effects thus fully capture probabilistic structure training data
Experiment design execution central activity natural sciences The SeqER system provides general architecture integration automated planning techniques variety domain knowledge order plan scientific experiments These planning techniques include rulebased methods especially use derivational analogy Derivational analogy allows planning experience captured cases reused Analogy also allows system function absence strong domain knowledge Cases efficiently flexibly retrieved large casebase using massively parallel methods
Finding good monitoring strategies important process design embedded agent We describe nature monitoring problem point makes difficult show periodic monitoring strategies often easiest derive always appropriate We demonstrate mathematically empirically wide class problems socalled cupcake problems exists simple strategy interval reduction outperforms periodic monitoring We also show features environment may influence choice optimal strategy The paper concludes thoughts monitoring strategy taxonomy defining features might
This paper discusses issues related Bayesian network model learning unbalanced binary classification tasks In general primary focus current research Bayesian network learning systems eg K variants creation Bayesian network structure fits database best It turns applied specific purpose mind classification performance network models may poor We demonstrate Bayesian network models created meet specific goal purpose intended model We first present goaloriented algorithm constructing Bayesian networks predicting uncollectibles telecommunications riskmanagement datasets Second argue demonstrate current Bayesian network learning methods may fail perform satisfactorily real life applications since learn models tailored specific goal purpose Third discuss performance goal oriented K variant
We calculated analytical expressions bias variance estimators provided various temporal difference value estimation algorithms change offline updates trials absorbing Markov chains using lookup table representations We illustrate classes learning curve behavior various chains show manner TD sensitive choice step size eligibility trace parameters
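The setting here combines lookup-table TD(λ), offline updates that are applied only at the end of each trial, and an absorbing chain. The sketch below illustrates that setting under one assumption: it uses accumulating eligibility traces. The function name and the episode format are hypothetical.

```python
def offline_td_lambda(values, episode, alpha, gamma, lam):
    """Offline tabular TD(lambda) with accumulating eligibility traces.

    values:  list of current state-value estimates (lookup table)
    episode: list of (state, reward, next_state); next_state is None
             when the transition enters the absorbing state
    Increments are accumulated over the whole trial and applied at the end,
    so every TD error in the trial is computed from the same old values.
    """
    traces = [0.0] * len(values)
    increments = [0.0] * len(values)
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values[next_state]
        delta = reward + gamma * next_value - values[state]  # TD error
        traces = [gamma * lam * e for e in traces]  # decay every trace
        traces[state] += 1.0  # accumulate trace for the visited state
        for s, e in enumerate(traces):
            increments[s] += alpha * delta * e
    return [v + inc for v, inc in zip(values, increments)]
```

Consider the two-step chain 0 → 1 → absorbing, with a reward of 1 on the last step. With `lam=1.0` both states move halfway toward their Monte Carlo return. With `lam=0.0` only state 1 moves. This is one small case of the sensitivity to the eligibility-trace parameter that the abstract analyses.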
The problem minimizing number misclassified points plane attempting separate two point sets intersecting convex hulls ndimensional real space formulated linear program equilibrium constraints LPEC This general LPEC converted exact penalty problem quadratic objective linear constraints A FrankWolfetype algorithm proposed penalty problem terminates stationary point global solution Novel aspects approach include A linear complementarity formulation step function counts misclassifications ii Exact penalty formulation without boundedness nondegeneracy constraint qualification assumptions iii An exact solution extraction sequence minimizers penalty function finite value penalty parameter general LPEC explicitly exact solution LPEC uncoupled constraints iv A parametric quadratic programming formulation LPEC associated misclassification minimization problem
In paper introduce new approach problem optimal compression source code produces multiple codewords given symbol It may seem sensible codeword use case shortest one However proposed free energy approach random codeword selection yields effective codeword length less shortest codeword length If random choices Boltzmann distributed effective length optimal given source code The expectationmaximization parameter estimation algorithms minimize effective codeword length We illustrate performance free energy coding simple problem compression factor two gained using new method
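The claim that random codeword selection can beat the shortest codeword can be checked numerically. The sketch rests on two assumptions:

- The effective length is the expected codeword length minus the entropy of the random choice, as in a "bits-back" argument.
- Codewords are chosen with Boltzmann probabilities p_i ∝ 2^(−l_i).

Under these assumptions the effective length equals −log2 Σ 2^(−l_i), which is strictly below min l_i whenever there are two or more codewords. The function names are illustrative.

```python
import math


def boltzmann_probs(lengths):
    """Boltzmann distribution over codewords: p_i proportional to 2**(-l_i)."""
    weights = [2.0 ** (-l) for l in lengths]
    total = sum(weights)
    return [w / total for w in weights]


def effective_length(lengths, probs):
    """Expected codeword length minus the entropy (bits) of the random choice.

    Assumes the decoder can recover the bits spent on the random choice
    ("bits back"), so those bits are credited against the cost.
    """
    expected = sum(p * l for p, l in zip(probs, lengths))
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return expected - entropy
```

For a symbol with codewords of lengths 2 and 3, the Boltzmann choice gives about 1.415 bits. Always choosing the shortest codeword costs 2 bits.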
As Bayesian Networks Influence Diagrams used widely importance efficient explanation mechanism becomes apparent We focus predictive explanations ones designed explain predictions recommendations probabilistic systems We analyze issues involved defining computing evaluating explanations present algorithm compute
Tech Report Department Computer Science Monash University Clayton Vic Australia Abstract This paper continues introduction minimum encoding inductive inference given Oliver Hand This series papers written objective providing introduction area statisticians We describe message length estimates used Wallaces Minimum Message Length MML inference Rissanens Minimum Description Length MDL inference The differences message length estimates two approaches explained The implications differences applications discussed
We determine study sustained performance CNS training evaluation large multilayered feedforward neural networks Using sophisticated coding node machine would achieve Giga connections per second GCPS Giga connection updates per second GCUPS During recall machine would achieve peak multiplyaccumulate performance The training large nets less efficient recall factor The benchmark parallelized machine code optimized analyzing performance Starting optimal parallel algorithm CNS specific optimizations still reduce run time factor recall factor training Our analysis also yields strategies code optimization The CNS still design therefore model run time behavior memory system interconnection network This gives us option changing parameters CNS system order analyze performance impact
In paper present interactive casebased approach crisis response provides users ability rapidly develop good responses allowing retain ultimate control decisionmaking process We implemented approach Inca INteractive Crisis Assistant planning scheduling crisis domains Inca relies casebased methods seed response development process initial candidate solutions drawn previous cases The human user interacts Inca adapt solutions current situation We discuss interactive approach crisis response using artificial hazardous materials domain HazMat developed purpose evaluating candidate assistant mechanisms crisis response
Mixedinitiative systems present challenge finding effective level interaction humans computers Machine learning presents promising approach problem form systems automatically adapt behavior accommodate different users In paper present empirical study learning user models adaptive assistant crisis scheduling We describe problem domain scheduling assistant present initial formulation adaptive assistants learning task results baseline study After report results three subsequent experiments investigate effects problem reformulation representation augmentation The results suggest problem reformulation leads significantly better accuracy without sacrificing usefulness learned behavior The studies also raise several interesting issues adaptive assistance scheduling
We consider Bayesian informationtheoretic approaches determining noninformative prior distributions parametric model family The informationtheoretic approaches based recently modified definition stochastic complexity Rissanen Minimum Message Length MML approach Wallace The Bayesian alternatives include uniform prior equivalent sample size priors In order able empirically compare different approaches practice methods instantiated model family practical importance family Bayesian networks
Intelligent information retrieval IIR requires inference The number inferences drawn even simple reasoner large inferential resources available practical computer system limited This problem one long faced AI researchers In paper present method used two recent machine learning programs control inference relevant design IIR systems The key feature approach use explicit representations desired knowledge call knowledge goals Our theory addresses representation knowledge goals methods generating transforming goals heuristics selecting among potential inferences order feasibly satisfy goals In view IIR becomes kind planning decisions infer infer infer based representations desired knowledge well internal representations systems inferential abilities current state The theory illustrated using two case studies natural language understanding program learns reading novel newspaper stories differential diagnosis program improves accuracy experience We conclude making several suggestions machine learning framework integrated existing information retrieval methods
This paper attempts bridge fields machine learning robotics distributed AI It discusses use communication reducing undesirable effects locality fully distributed multiagent systems multiple agentsrobots learning parallel interacting Two key problems hidden state credit assignment addressed applying local undirected broadcast communication dual role sensing reinforcement The methodology demonstrated two multirobot learning experiments The first describes learning tightlycoupled coordination task two robots second looselycoupled task four robots learning social rules Communication used share sensory data overcome hidden state share reinforcement overcome credit assignment problem agents bridge gap localindividual globalgroup payoff
This paper investigates power genetic algorithms solving MAXCLIQUE problem We measure performance standard genetic algorithm elementary set problem instances consisting embedded cliques random graphs We indicate need improvement introduce new genetic algorithm multiphase annealed GA exhibits superior performance problem set As scale problem size test hard benchmark instances notice degraded performance algorithm caused premature convergence local minima To alleviate problem sequence modifications implemented ranging changes input representation systematic local search The recent version called union GA incorporates features union crossover greedy replacement diversity enhancement It shows marked speedup number iterations required find given solution well improvement clique size found We discuss issues related SIMD implementation genetic algorithms Thinking Machines CM necessitated intrinsically high time complexity On serial algorithm computing one iteration Our preliminary conclusions genetic algorithm needs heavily customized work well clique problem GA computationally expensive use recommended known find larger cliques algorithms although customization effort bringing forth continued improvements clear evidence time GA better success circumventing local minima
For many types learners one compute statistically optimal way select data We review techniques used feedforward neural networks MacKay Cohn We show principles may used select data two alternative statisticallybased learning architectures mixtures Gaussians locally weighted regression While techniques neural networks expensive approximate techniques mixtures Gaussians locally weighted regression efficient accurate This report describes research done Center Biological Computational Learning Artificial Intelligence Laboratory Massachusetts Institute Technology Support Center provided part grant National Science Foundation contract ASC The authors also funded McDonnellPew Foundation ATR Human Information Processing Laboratories Siemens Corporate Research NSF grant CDA grant N Office Naval Research Michael I Jordan NSF Presidential Young Investigator A version paper appears G Tesauro D Touretzky J Alspector eds Advances Neural Information Processing Systems Morgan Kaufmann San Francisco CA
We consider standard problem learning concept random examples Here learning curve defined expected error learners hypotheses function training sample size Haussler Littlestone Warmuth shown distribution free setting smallest expected error learner achieve worst case concept class C converges rationally zero error ie fit training sample size However recently Cohn Tesauro demonstrated exponential convergence often observed experimental settings ie average error decreasing e fit By addressing simple nonuniformity original analysis paper shows dichotomy rational exponential worst case learning curves recovered distribution free theory These results support experimental findings Cohn Tesauro finite concept classes consistent learner achieves exponential convergence even worst case continuous concept classes learner exhibit subrational convergence every target concept domain distribution A precise boundary rational exponential convergence drawn simple concept chains Here show somewhere dense chains always force rational convergence worst case exponential convergence always achieved nowhere dense chains
Concepts learned neural networks difficult understand represented using large assemblages realvalued parameters One approach understanding trained neural networks extract symbolic rules describe classification behavior There several existing ruleextraction approaches operate searching rules We present novel method casts rule extraction search problem instead learning problem In addition learning training examples method exploits property networks efficiently queried We describe algorithms extracting conjunctive MofN rules present experiments show method efficient conventional searchbased approaches
This paper presents fast algorithm provides optimal near optimal solutions minimum perimeter problem rectangular grid The minimum perimeter problem partition grid size M N P equal area regions minimizing total perimeter regions The approach taken divide grid stripes filled completely integer number regions This striping method gives rise knapsack integer program efficiently solved existing codes The solution knapsack problem used generate grid region assignments An implementation algorithm partitioned grid regions provably optimal solution less one second With sufficient memory hold M N grid array extremely large minimum perimeter problems solved easily
This paper presents evaluates two algorithms incrementally constructing Radial Basis Function Networks class neural networks looks suitable adaptive control applications popular backpropagation networks The first algorithm derived previous method developed Fritzke second one inspired CART algorithm developed Breiman generation regression trees Both algorithms proved work well number tests exhibit comparable performances An evaluation standard case study MackeyGlass temporal series reported
number prototypes used represent class position prototype within class membership function associated prototype This paper proposes novel evolutionary approach data clustering classification overcomes many limitations traditional systems The approach rests optimisation number positions fuzzy prototypes using realvalued genetic algorithm GA Because GA acts classes system benefits naturally global information possible class interactions In addition concept receptive field prototype used replace classical distancebased membership function infinite fuzzy support multidimensional Gaussian function centred prototype unique variance dimension reflecting tightness cluster Hence notion nearestneighbour replaced nearest attracting prototype NAP The proposed model completely selfoptimising fuzzy system called GANAP Most data clustering algorithms including popular Kmeans algorithm require priori knowledge problem domain fix number starting positions prototypes Although knowledge may assumed domains whose dimensionality fairly small whose underlying structure relatively intuitive clearly much less accessible hyperdimensional settings number input parameters may large Classical systems also suffer fact define clusters one class time Hence account made potential interactions among classes These drawbacks compounded fact ensuing classification typically based fixed distancebased membership function prototypes This paper proposes novel approach data clustering classification overcomes aforementioned limitations traditional systems The model based genetic evolution fuzzy prototypes A realvalued genetic algorithm GA used optimise number positions prototypes Because GA acts classes measures fitness classification accuracy system naturally profits global information class interaction The concept receptive field prototype also presented used replace classical fixed distancebased function infinite fuzzy support membership function The new membership function inspired used hidden layer RBF networks It consists multidimensional Gaussian function centred prototype unique variance dimension reflects tightness cluster During classification notion nearestneighbour replaced nearest attracting prototype NAP The proposed model completely selfoptimising fuzzy system called GANAP
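The sketch below contrasts the receptive-field membership with plain nearest-neighbour assignment. It assumes a Gaussian with one variance per dimension, exp(−Σ_d (x_d − c_d)² / (2σ_d²)), and labels a point with the prototype of highest membership. The GA that optimises the number and positions of prototypes is not shown. All names are hypothetical.

```python
import math


def membership(x, centre, variances):
    """Gaussian receptive field with one variance per dimension (assumed form)."""
    exponent = sum(
        (xi - ci) ** 2 / (2.0 * v) for xi, ci, v in zip(x, centre, variances)
    )
    return math.exp(-exponent)


def nearest_attracting_prototype(x, prototypes):
    """prototypes: list of (label, centre, variances).

    Returns the label of the prototype whose membership for x is highest,
    rather than the label of the prototype at the smallest distance.
    """
    best = max(prototypes, key=lambda p: membership(x, p[1], p[2]))
    return best[0]
```

Suppose prototype A is tight at (0, 0) and prototype B is broad at (3, 0). The point (1, 0) is closer to A but is attracted to B, because B's cluster is much wider.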
In paper study performance gradient descent applied problem online linear prediction arbitrary inner product spaces We prove worstcase bounds sum squared prediction errors various assumptions concerning amount priori information sequence predict The algorithms use variants extensions online gradient descent Whereas algorithms always predict using linear functions hypotheses none results requires data linearly related In fact bounds proved total prediction loss typically expressed function total loss best fixed linear predictor bounded norm All upper bounds tight within constants Matching lower bounds provided cases Finally apply results problem online prediction classes smooth functions
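The basic online gradient-descent learner analysed in this line of work is the squared-loss (Widrow-Hoff style) rule, shown below in Euclidean space. It predicts ŷ = w · x, suffers (ŷ − y)², and then steps against the gradient. The paper's variants, extensions and step-size schedules are not reproduced here. The function name and the fixed learning rate `eta` are assumptions.

```python
def online_gradient_descent(trials, dim, eta):
    """Online linear prediction with squared loss.

    For each trial (x, y): predict y_hat = w . x, suffer (y_hat - y)**2,
    then update w <- w - eta * (y_hat - y) * x,
    the gradient step for 0.5 * (y_hat - y)**2.
    Returns the final weights and the total squared prediction loss.
    """
    w = [0.0] * dim
    total_loss = 0.0
    for x, y in trials:
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        error = y_hat - y
        total_loss += error ** 2
        w = [wi - eta * error * xi for wi, xi in zip(w, x)]
    return w, total_loss
```

Nothing here requires the data to be linearly related. The bounds in the abstract compare `total_loss` with the loss of the best fixed linear predictor of bounded norm.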
We study online learning classes functions single real variable formed bounds various norms functions derivatives We determine best bounds obtainable worstcase sum squared errors also absolute errors several classes We prove upper bounds classes smooth functions loss functions prove upper lower bounds terms number trials
Nearestneighbor algorithms known depend heavily distance metric In paper investigate use weighted Euclidean metric weight feature comes small set options We describe Diet algorithm directs search space discrete weights using crossvalidation error evaluation function Although large set possible weights reduce learners bias also lead increased variance overfitting Our empirical study shows many data sets advantage weighting features increasing number possible weights beyond two zero one little benefit sometimes degrades performance
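The idea of choosing feature weights from a small discrete set by cross-validation error can be sketched with 1-nearest-neighbour and leave-one-out error. The Diet algorithm searches the weight space. For simplicity this sketch enumerates every weight vector exhaustively, which is an assumption that is only practical for a few features. The function names are hypothetical.

```python
from itertools import product


def weighted_sq_distance(a, b, weights):
    """Squared weighted Euclidean distance (same ranking as the weighted metric)."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def loo_error(points, labels, weights):
    """Leave-one-out error rate of 1-nearest-neighbour under the given weights."""
    errors = 0
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        nearest = min(others, key=lambda j: weighted_sq_distance(p, points[j], weights))
        errors += labels[nearest] != labels[i]
    return errors / len(points)


def choose_weights(points, labels, options=(0.0, 1.0)):
    """Try every weight vector drawn from `options`; keep the lowest LOO error."""
    dim = len(points[0])
    return min(product(options, repeat=dim), key=lambda w: loo_error(points, labels, w))
```

Take a dataset in which the first feature determines the class and the second is large-scale noise. With only the two options zero and one, the selection switches the noisy feature off. This matches the abstract's observation that two weight values often suffice.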
In context machine learning examples paper deals problem estimating quality attributes without dependencies among Kira Rendell ab developed algorithm called RELIEF shown efficient estimating attributes Original RELIEF deal discrete continuous attributes limited twoclass problems In paper RELIEF analysed extended deal noisy incomplete multiclass data sets The extensions verified various artificial one well known realworld problem
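The original two-class RELIEF rewards an attribute when it differs on the nearest miss (another class) and penalises it when it differs on the nearest hit (same class). The sketch below shows that basic rule. It is deterministic: every instance serves once as the sample, whereas the original draws m instances at random. It handles discrete attributes only and does not include the paper's extensions for noisy, incomplete or multiclass data. All names are illustrative.

```python
def diff(attr, a, b):
    """Difference on one discrete attribute: 0 if equal, 1 otherwise."""
    return 0.0 if a[attr] == b[attr] else 1.0


def relief(instances, labels):
    """Basic two-class RELIEF weights, using every instance once as the sample R."""
    n_attrs = len(instances[0])
    m = len(instances)
    weights = [0.0] * n_attrs

    def distance(a, b):
        return sum(diff(k, a, b) for k in range(n_attrs))

    for i, r in enumerate(instances):
        hits = [j for j in range(m) if j != i and labels[j] == labels[i]]
        misses = [j for j in range(m) if labels[j] != labels[i]]
        hit = instances[min(hits, key=lambda j: distance(r, instances[j]))]
        miss = instances[min(misses, key=lambda j: distance(r, instances[j]))]
        for k in range(n_attrs):
            # differing from the nearest miss is good; from the nearest hit is bad
            weights[k] += (diff(k, r, miss) - diff(k, r, hit)) / m
    return weights
```

In the example below, the class equals the first attribute. That attribute ends with weight +1 and the irrelevant second attribute with weight −1.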
In paper present averagecase analysis nearest neighbor algorithm simple induction method studied many researchers Our analysis assumes conjunctive target concept noisefree Boolean attributes uniform distribution instance space We calculate probability algorithm encounter test instance distance prototype concept along probability nearest stored training case distance e test instance From compute probability correct classification function number observed training cases number relevant attributes number irrelevant attributes We also explore behavioral implications analysis presenting predicted learning curves artificial domains give experimental results domains check reasoning
In order better understand life helpful look beyond envelope life know A simple model coevolution implemented addition genes longevity mutation rate individuals This made possible lineage evolve immortal It also allowed evolution mutation extremely high mutation rates The model shows individuals interact sort zerosum game lineages maintain relatively high mutation rates However individuals engage interactions greater consequences one individual interaction lineages tend evolve relatively low mutation rates This model suggests different genes may evolved different mutation rates adaptations varying pressures interactions genes
difficult We face problem using architecture based learning classifier systems description learning technique used organizational structure proposed present experiments show behaviour acquisition achieved Our simulated robot learns structural properties animal behavioural organization proposed ethologists After
In paper present probabilistic formalization instancebased learning approach In Bayesian framework moving construction explicit hypothesis datadriven instancebased learning approach equivalent averaging possibly infinitely many individual models The general Bayesian instancebased learning framework described paper applied set assumptions defining parametric model family discrete prediction task number simultaneously predicted attributes small includes example classification tasks prevalent machine learning literature To illustrate use suggested general framework practice show approach implemented special case strong independence assumptions underlying called Naive Bayes classifier The resulting Bayesian instancebased classifier validated empirically public domain data sets results compared performance traditional Naive Bayes classifier The results suggest Bayesian instancebased learning approach yields better results traditional Naive Bayes classifier especially cases amount training data small
We present comparative study genetic algorithms search properties treated combinatorial optimization technique This done context NPhard problem MAXSAT comparison relative Metropolis process extension simulated annealing Our contribution twofold First show large difficult MAXSAT instances contribution crossover search process marginal Little lost dispensed altogether running mutation selection enlarged Metropolis process Second show problem instances genetic search consistently performs worse simulated annealing subject similar resource bounds The correspondence two algorithms made precise via decomposition argument provides framework interpreting results
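The sketch below shows the Metropolis process, the baseline against which genetic search is compared, applied to MAXSAT at a fixed temperature. It proposes single-variable flips and always accepts moves that do not increase the number of unsatisfied clauses. A move that increases the count by Δ is accepted with probability exp(−Δ/T). The paper's enlarged process and its annealing schedule are not reproduced, and all names are hypothetical.

```python
import math
import random


def unsatisfied(clauses, assignment):
    """Count clauses with no true literal.

    Literal k > 0 means variable k is true; k < 0 means variable |k| is false.
    `assignment` is indexed from 1 (index 0 is unused).
    """
    return sum(
        not any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )


def metropolis_maxsat(clauses, n_vars, temperature, steps, seed=0):
    """Fixed-temperature Metropolis search; returns the best assignment seen."""
    rng = random.Random(seed)
    assignment = [False] + [rng.random() < 0.5 for _ in range(n_vars)]
    cost = unsatisfied(clauses, assignment)
    best, best_cost = list(assignment), cost
    for _ in range(steps):
        var = rng.randint(1, n_vars)
        assignment[var] = not assignment[var]  # propose a single flip
        new_cost = unsatisfied(clauses, assignment)
        increase = new_cost - cost
        if increase <= 0 or rng.random() < math.exp(-increase / temperature):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(assignment), cost
        else:
            assignment[var] = not assignment[var]  # reject: undo the flip
    return best, best_cost
```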
In constructive induction CI learners problem representation modified normal part learning process This may necessary initial representation inadequate inappropriate However distinction constructive nonconstructive methods appears highly ambiguous Several conventional definitions process constructive induction appear include conceivable learning processes In paper I argue process constructive learning identified relational learning ie I suggest
Probabilistic models recently utilized optimization large combinatorial search problems However complex probabilistic models attempt capture interparameter dependencies prohibitive computational costs The algorithm presented paper termed COMIT provides method using probabilistic models conjunction fast search techniques We show COMIT used two different fast search algorithms hillclimbing Populationbased incremental learning PBIL The resulting algorithms maintain many benefits probabilistic modeling far less computational expense Extensive empirical results provided COMIT successfully applied jobshop scheduling traveling salesman knapsack problems This paper also presents review probabilistic modeling combinatorial optimization
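PBIL, one of the fast search algorithms named here, can be sketched in its basic form. It keeps a vector of independent bit probabilities, samples a population, and shifts the vector toward the best sample. COMIT's dependency-tree model is not included. The population size, learning rate and generation count are illustrative assumptions.

```python
import random


def pbil(fitness, n_bits, pop_size=20, learning_rate=0.1, generations=200, seed=0):
    """Basic population-based incremental learning over bit strings.

    Returns (best string seen, its fitness, final probability vector).
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits
    best, best_score = None, float("-inf")
    for _ in range(generations):
        population = [[rng.random() < p for p in probs] for _ in range(pop_size)]
        winner = max(population, key=fitness)
        score = fitness(winner)
        if score > best_score:
            best, best_score = winner, score
        # move each probability toward the winner's bit value
        probs = [
            (1 - learning_rate) * p + learning_rate * bit
            for p, bit in zip(probs, winner)
        ]
    return best, best_score, probs
```

On the toy "count the ones" objective, for example `pbil(sum, 8)`, the probability vector is driven toward all ones.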
Current systems field Inductive Logic Programming ILP use primarily sake efficiency heuristically guided search techniques Such greedy algorithms suffer local optimization problem Present paper describes system named SFOIL tries alleviate problem using stochastic search method based generalization simulated annealing called Markovian neural network Various tests performed benchmark realworld domains The results show advantages weaknesses stochastic approach
In many optimization problems structure solutions reflects complex relationships different input parameters For example experience may tell us certain parameters closely related explored independently Similarly experience may establish subset parameters must take particular values Any search cost landscape take advantage relationships We present MIMIC framework analyze global structure optimization landscape A novel efficient algorithm estimation structure derived We use knowledge structure guide randomized search solution space turn refine estimate structure Our technique obtains significant speed gains randomized optimization procedures
We analyze generalization behavior XCS classifier system environments generalizations done Experimental results presented paper evidence generalization mechanism XCS prevent learning even simple tasks environments We present new operator named Specify contributes solution problem XCS Specify operator named XCSS compared XCS terms performance generalization capabilities different types environments Experimental results show XCSS deal greater variety environments robust XCS respect population size
In paper present computationally efficient method inducing selective Bayesian network classifiers Our approach use informationtheoretic metrics efficiently select subset attributes learn classifier We explore three conditional informationtheoretic metrics extensions metrics used extensively decision tree learning namely Quinlans gain gain ratio metrics Mantarass distance metric We experimentally show algorithms based gain ratio distance metric learn selective Bayesian networks predictive accuracies good better learned existing selective Bayesian network induction approaches KAS significantly lower computational cost We prove subsetselection phase informationbased algorithms polynomial complexity compared worstcase exponential time complexity corresponding phase KAS
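Two of the metrics named here come from decision tree learning: Quinlan's information gain and gain ratio. The sketch computes both for a single discrete attribute. How the paper uses these scores to select the attribute subset for the Bayesian network is not shown. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_and_gain_ratio(values, labels):
    """Information gain and gain ratio of one discrete attribute w.r.t. the class."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition the attribute induces
    return gain, (gain / split_info if split_info > 0 else 0.0)
```

An attribute with a distinct value for every instance has the same gain as a clean two-way split, but only half the gain ratio. This is the bias that gain ratio was designed to correct.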
The effectiveness casebased reasoning system known depend critically similarity measure However clear whether elusive esoteric similarity measures might improve performance casebased reasoner substituted commonly used measures This paper therefore deals problem choosing best similarity measure limited context instancebased learning classifications discrete example space We consider fixed similarity measures learnt ones In former case give definition similarity measure believe optimal wrt current prior distribution target concepts prove optimality within restricted class similarity measures We show optimal similarity measure instantiated specific prior distributions conclude simple similarity measure good cases In section show definition leads naturally conjecture
Multiarmed bandits may viewed decompositionallystructured Markov decision processes MDPs potentially large state sets A particularly elegant methodology computing optimal policies developed twenty years ago Gittins Gittins Jones Gittins approach reduces problem finding optimal policies original MDP sequence lowdimensional stopping problems whose solutions determine optimal policy socalled Gittins indices Katehakis Veinott Katehakis Veinott shown Gittins index task state may interpreted particular component maximumvalue function associated restartini process simple MDP standard solution methods computing optimal policies successive approximation apply This paper explores problem learning Gittins indices online without aid process model suggests utilizing taskstatespecific Qlearning agents solve respective restartinstatei subproblems includes example online reinforcement learning approach applied simple problem stochastic scheduling one instance drawn wide class problems may formulated bandit problems
We analyze performance topdown algorithms decision tree learning employed widely used C CART software packages Our main result proof algorithms boosting algorithms By mean functions label internal nodes decision tree weakly approximate unknown target function topdown algorithms study amplify weak advantage build tree achieving desired level accuracy The bounds obtain amplification show interesting dependence splitting criterion used topdown algorithm More precisely functions used label internal nodes error fl approximations target function splitting criteria used CART C trees size Ofl Ologfl respectively suffice drive error Thus example small constant advantage random guessing amplified larger constant advantage trees constant size For new splitting criterion suggested analysis much stronger fl A preliminary version paper appears Proceedings TwentyEighth Annual ACM Symposium Theory Computing pages ACM Press Authors addresses M Kearns ATT Research Mountain Avenue Room A Murray Hill New Jersey electronic mail mkearnsresearchattcom Y Mansour Department Computer Science Tel Aviv University Tel Aviv Israel electronic mail mansourmathtauacil Y Mansour supported part Israel Science Foundation administered Israel Academy Science Humanities grant Israeli Ministry Science Technology
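The three splitting criteria compared in the analysis can be written as concave functions of the fraction q of positive examples at a node. The sketch rests on two assumptions:

- We scale the Gini index as 4q(1−q), so that all three criteria equal 1 at q = ½.
- We take the new criterion to be 2√(q(1−q)), as it is usually cited from this work.

`split_gain` is a hypothetical helper that measures the drop in the criterion produced by a split.

```python
import math


def gini(q):
    """CART-style Gini index, scaled (assumption) so that gini(0.5) == 1."""
    return 4.0 * q * (1.0 - q)


def binary_entropy(q):
    """C4.5-style entropy criterion, in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def sqrt_criterion(q):
    """The criterion 2*sqrt(q(1-q)) (assumed form of the new criterion)."""
    return 2.0 * math.sqrt(q * (1.0 - q))


def split_gain(criterion, q_parent, q_left, q_right, frac_left):
    """Decrease in the criterion achieved by a split, weighted by branch sizes."""
    weighted = frac_left * criterion(q_left) + (1.0 - frac_left) * criterion(q_right)
    return criterion(q_parent) - weighted
```

Away from q = ½ the criteria order themselves as Gini < entropy < 2√(q(1−q)). The more sharply concave criteria reward a given weak split more strongly, which is consistent with the smaller tree-size bounds the abstract reports for them.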
The paper describes counter example hypothesis states greedy decision tree generation algorithm constructs binary decision trees branches single attributevalue pair rather values selected attribute always lead tree fewer leaves given training set We show also RELIEFF less myopic impurity functions enables induction algorithm generates binary decision trees reconstruct optimal smallest decision trees cases
Realworld problems often difficult solved single monolithic system There many examples natural artificial systems show modular approach reduce total complexity system solving difficult problem satisfactorily The success modular artificial neural networks speech image processing typical example However designing modular system difficult task It relies heavily human experts prior knowledge problem There systematic automatic way form modular system problem This paper proposes novel evolutionary learning approach designing modular system automatically without human intervention Our starting point speciation using technique based fitness sharing While speciation genetic algorithms new effort made towards using speciated population complete modular system We harness specialized expertise species entire population rather single individual introducing gating algorithm We demonstrate approach automatic modularization improving coevolutionary game learning Following earlier researchers learn play iterated prisoners dilemma We review problems earlier coevolutionary learning explain poor generalization ability sudden mass extinctions The generalization ability approach significantly better past efforts Using specialized expertise entire speciated population through gating algorithm instead best individual main contributor improvement
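The sketch below shows the fitness sharing that speciation is built on, in its textbook form. Each raw fitness is divided by a niche count, sh(d) = 1 - (d/sigma)^alpha for d < sigma and 0 otherwise. This is the standard scheme, not necessarily the paper's exact variant. The Hamming distance, `sigma_share` and `alpha` defaults are illustrative assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(fitness, genomes, sigma_share=2.0, alpha=1.0, distance=hamming):
    """Divide each raw fitness by its niche count so crowded niches are penalized."""
    shared = []
    for i, gi in enumerate(genomes):
        niche_count = 0.0
        for gj in genomes:
            d = distance(gi, gj)
            if d < sigma_share:
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # niche_count >= 1 because every genome is at distance 0 from itself
        shared.append(fitness[i] / niche_count)
    return shared
```

In this scheme three identical genomes split their fitness three ways, while an isolated genome keeps all of its own. That pressure is what keeps distinct species, the candidate modules, alive in the population.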
In paper describe approach representing using improving sensory skills physical domains We present Icarus architecture represents control knowledge terms durative states sequences states The system operates cycles activating state matches environmental situation letting state control behavior conditions fail finding another matching state higher priority Information probability conditions remain satisfied minimizes demands sensing knowledge durations states likely successors Three statistical learning methods let system gradually reduce sensory load gains experience domain We report experimental evaluations ability three simulated physical tasks flying aircraft steering truck balancing pole Our experiments include lesion studies identify reduction sensing due learning mechanisms others examine effect domain characteristics
We follow Axelrod using genetic algorithm play Iterated Prisoners Dilemma Each member population ie strategy evaluated performs members current population This creates dynamic environment algorithm optimising moving target instead usual evaluation fixed set strategies causing arms race innovation We conduct two sets experiments The first set investigates conditions evolve best strategies The second set studies robustness strategies thus evolved strategies useful round robin population effective wide variety opponents Our results indicate population nearly always converged generations time bias population almost always stabilised Our results confirm cooperation almost always becomes dominant strategy We also confirm seeding population expert strategies best done small amounts leave initial population plenty genetic diversity The lack robustness strategies produced round robin evaluation demonstrated examples population naive cooperators exploited defectfirst strategy This causes sudden ephemeral decline populations average score recovers less naive cooperators emerge well exploiting strategies This example runaway evolution brought back reality suitable mutation reminiscent punctuated equilibria We find way reduce naivety make GA population play extra
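Round-robin evaluation of this kind can be sketched as follows. Each strategy's fitness is its total score against every other current member, so the fitness landscape moves as the population changes. Assumptions: the conventional payoffs (T, R, P, S) = (5, 3, 1, 0) and a fixed number of rounds. The strategy and function names are hypothetical, and the paper's exact game length is not given here.

```python
# Row player's and column player's payoffs, assuming (T, R, P, S) = (5, 3, 1, 0)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, opp_hist):
    return "C" if not opp_hist else opp_hist[-1]

def always_defect(my_hist, opp_hist):
    return "D"

def play(strat_a, strat_b, rounds=10):
    """Play one iterated game and return both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

def round_robin(population, rounds=10):
    """Fitness of each member = total score against every other member."""
    scores = [0] * len(population)
    for i in range(len(population)):
        for j in range(i + 1, len(population)):
            si, sj = play(population[i], population[j], rounds)
            scores[i] += si
            scores[j] += sj
    return scores
```

Over 10 rounds, tit-for-tat against always-defect scores 9 to 14. In a population of two tit-for-tat players and one defector, however, the cooperators end up fitter (39, 39 versus 28). This illustrates why the outcome depends on who else is in the population.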
Unsupervised learning algorithms based convex conic encoders proposed The encoders find closest convex conic combination basis vectors input The learning algorithms produce basis vectors minimize reconstruction error encoders The convex algorithm develops locally linear models input conic algorithm discovers features Both algorithms used model handwritten digits compared vector quantization principal component analysis The neural network implementations involve feedback connections project reconstruction back input layer
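The encoding step can be sketched as a constrained least-squares problem: minimize ||x - W a||² with `a` on the probability simplex (convex) or in the nonnegative orthant (conic). The version below uses projected gradient descent with a sort-based Euclidean projection onto the simplex. That solver is an assumption chosen for brevity, not the paper's neural-network implementation. `encode` and `project_to_simplex` are hypothetical names.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) == 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def encode(x, W, convex=True, n_iter=500):
    """Closest convex (or conic) combination of the columns of W to x."""
    L = np.linalg.norm(W, 2) ** 2   # largest singular value squared: gradient Lipschitz constant
    a = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(n_iter):
        grad = W.T @ (W @ a - x)    # gradient of 0.5 * ||x - W a||^2
        a = a - grad / L
        a = project_to_simplex(a) if convex else np.maximum(a, 0.0)
    return a
```

The learning step that updates `W` to reduce reconstruction error over a dataset would sit in an outer loop around `encode`. It is omitted here.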
Although recurrent neural nets moderately successful learning emulate finitestate machines FSMs continuous internal state dynamics neural net well matched discrete behavior FSM We describe architecture called DOLCE allows discrete states evolve net learning progresses DOLCE consists standard recurrent neural net trained gradient descent adaptive clustering technique quantizes state space DOLCE based assumption finite set discrete internal states required task actual network state belongs set corrupted noise due inaccuracy weights DOLCE learns recover discrete state maximum posteriori probability noisy state Simulations show DOLCE leads significant improvement generalization performance earlier neural net approaches FSM induction
We propose statistical mechanical framework modeling discrete time series Maximum likelihood estimation done via Boltzmann learning onedimensional networks tied weights We call networks Boltzmann chains show contain hidden Markov models HMMs special case Our framework also motivates new architectures address particular shortcomings HMMs We look two architectures parallel chains model feature sets disparate time scales looped networks model longterm dependencies hidden states For networks show implement Boltzmann learning rule exactly polynomial time without resort simulated meanfield annealing The necessary computations done exact decimation procedures statistical mechanics
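Exact polynomial-time computation on a one-dimensional chain can be illustrated by summing out one hidden unit at a time. This transfer-matrix (forward) recursion plays the role of exact decimation in 1-D. Assumptions: tied transition weights `A`, tied emission weights `B`, an initial-state weight `pi`, and observations clamped to `obs`. The result is checked against brute-force enumeration. The parameter names are hypothetical, and the paper's own parameterization and decimation rules may differ.

```python
import itertools
import numpy as np

def log_partition(A, B, pi, obs):
    """log Z of a clamped Boltzmann chain, summing out hidden units site by site."""
    alpha = pi + B[:, obs[0]]                  # log-weights after the first site
    for o in obs[1:]:
        # log-sum-exp over the previous hidden unit: one decimation step
        alpha = np.logaddexp.reduce(alpha[:, None] + A, axis=0) + B[:, o]
    return np.logaddexp.reduce(alpha)

def log_partition_brute(A, B, pi, obs):
    """Same quantity by enumerating every hidden configuration (exponential cost)."""
    energies = []
    for s in itertools.product(range(A.shape[0]), repeat=len(obs)):
        e = pi[s[0]] + sum(B[s[t], obs[t]] for t in range(len(obs)))
        e += sum(A[s[t], s[t + 1]] for t in range(len(obs) - 1))
        energies.append(e)
    return np.logaddexp.reduce(np.array(energies))
```

The recursion costs O(T·n²) for T sites and n hidden states, against nⁿ-style enumeration (nᵀ configurations) for the brute-force version.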
Vector quantization lossy coding technique encoding set vectors different sources image speech The design vector quantizers yields lowest distortion one challenging problems field source coding However problem known difficult The conventional solution technique works process iterative refinements yield locally optimal results In paper design evaluate three versions genetic algorithms computing vector quantizers Our preliminary study GaussianMarkov sources showed genetic approach outperforms conventional technique cases
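The "conventional technique" of iterative refinement is commonly the generalized Lloyd (LBG) algorithm, which converges to a local optimum. Assuming that is the baseline meant here, it alternates two conditions: a nearest-neighbour assignment and a centroid update. The sketch below uses squared-error distortion and random initial codewords drawn from the training data. Both choices are assumptions, and `lloyd_vq` is a hypothetical name.

```python
import numpy as np

def lloyd_vq(data, codebook_size, n_iter=50, seed=0):
    """Generalized Lloyd iteration; returns the codebook and per-iteration distortion."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), codebook_size, replace=False)].copy()
    history = []
    for _ in range(n_iter):
        # nearest-neighbour condition: assign each vector to its closest codeword
        d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        history.append(d2[np.arange(len(data)), labels].mean())
        # centroid condition: move each codeword to the mean of its cell
        for k in range(codebook_size):
            members = data[labels == k]
            if len(members):          # empty cells keep their old codeword
                codebook[k] = members.mean(axis=0)
    return codebook, history
```

The recorded distortion never increases, but the final codebook depends on the initial codewords. That sensitivity to initialization is the weakness a genetic search over codebooks aims to address.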
We discuss approach constructing composite features induction decision trees The composite features correspond m-of-n concepts There three goals research First explore family greedy methods building m-of-n concepts one GS described paper Second show concepts formed internal nodes decision trees serving bias learner Finally evaluate method several artificially generated naturally occurring data sets determine effects bias
We present new approach called First Order Regression FOR handling numerical information Inductive Logic Programming ILP FOR combination ILP numerical regression Firstorder logic descriptions induced carve subspaces amenable numerical regression among realvalued variables The program Fors implementation idea numerical regression focused distinguished continuous argument target predicate We show viewed generalisation usual ILP problem Applications Fors several realworld data sets described prediction mutagenicity chemicals modelling liquid dynamics surge tank predicting roughness steel grinding finite element mesh design operators skill reconstruction electric discharge machining A comparison Fors performance previous results domains indicates Fors effective tool ILP applications involve numerical data
Tech Report GITCOGSCI Abstract This paper identifies goal handling processes begin account kind processes involved invention We identify new kinds goals special properties mechanisms processing goals well means integrating opportunism deliberation social interaction goalplan processes We focus invention goals address significant enterprises associated inventor Invention goals represent seed goals expert around whole knowledge expert gets reorganized grows less opportunistically Invention goals reflect idiosyncrasy thematic goals among experts They constantly increase sensitivity individuals particular events might contribute satisfaction Our exploration based welldocumented example invention telephone Alexander Graham Bell We propose mechanisms explain Bells early thematic goals gave rise new goals invent multiple telegraph telephone new goals interacted opportunistically Finally describe computational model ALEC accounts role goals invention
In order better understand life helpful look beyond envelope life know A simple model coevolution implemented addition gene mutation rate individual This allowed mutation rate evolve lineage The model shows individuals interact sort zerosum game lineages maintain relatively high mutation rates However individuals engage interactions greater consequences one individual interaction lineages tend evolve relatively low mutation rates This model suggests different genes may evolved different mutation rates adaptations varying pressures interactions genes
In many learning tasks dataquery neither free constant cost Often cost query depends distance current location state space desired query point Much gained instances keeping track length shortest path state every first action take paths With information learning agent efficiently explore environment calculating every step action move towards region greatest estimated exploration benefit balancing exploration potential reachable states encountered far currently estimated distances
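The bookkeeping described above can be sketched with breadth-first search. It records, for every known state, the shortest-path length and the first action on that path. Assumptions: deterministic, unit-cost transitions. Scoring each state by `benefit / distance` is a hypothetical trade-off chosen for illustration, not the paper's formula, and all names are hypothetical.

```python
from collections import deque

def distances_and_first_actions(transitions, start):
    """transitions: dict state -> dict action -> next state (the model learned so far)."""
    dist = {start: 0}
    first = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for action, nxt in transitions.get(s, {}).items():
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                # the first action is inherited along the shortest path
                first[nxt] = action if s == start else first[s]
                queue.append(nxt)
    return dist, first

def choose_action(transitions, start, benefit):
    """First action toward the state with the best (assumed) benefit/distance score."""
    dist, first = distances_and_first_actions(transitions, start)
    best = max((s for s in dist if s != start),
               key=lambda s: benefit.get(s, 0.0) / dist[s], default=None)
    return None if best is None else first[best]
```

With this information the agent only needs the first action at each step. It re-plans after every move as the model and the estimated exploration benefits change.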
We examine representational capabilities firstorder secondorder Single Layer Recurrent Neural Networks SLRNNs hardlimiting neurons We show secondorder SLRNN strictly powerful firstorder SLRNN However firstorder SLRNN augmented output layers feedforward neurons implement finitestate recognizer statesplitting employed When state split divided two equivalent states The judicious use statesplitting allows efficient implementation finitestate recognizers using augmented firstorder SLRNNs
Learning past tense English verbs seemingly minor aspect language acquisition generated heated debates since become landmark task testing adequacy cognitive modeling Several artificial neural networks ANNs implemented challenge better symbolic models posed In paper present generalpurpose Symbolic Pattern Associator SPA based upon decisiontree learning algorithm ID3 We conduct extensive headtohead comparisons generalization ability ANN models SPA different representations We conclude SPA generalizes past tense unseen verbs better ANN models wide margin offer insights case We also discuss new default strategy decisiontree learning algorithms
As probabilistic systems gain popularity coming wider use need mechanism explains systems findings recommendations becomes critical The system also need mechanism ordering competing explanations We examine two representative approaches explanation literature one due Gärdenfors one due Pearl show suffer significant problems We propose approach defining notion better explanation combines features together recent work Pearl others causality
A cooperative coevolutionary approach learning complex structures presented although preliminary nature appears number advantages noncoevolutionary approaches The cooperative coevolutionary approach encourages parallel evolution substructures interact useful ways form complex higher level structures The architecture designed general enough permit inclusion appropriate priori knowledge form initial biases towards particular kinds decompositions A brief summary initial results obtained testing architecture several problem domains presented shows significant speedup traditional noncoevolutionary approaches
An intelligent system must able adapt learn correct update model environment incrementally deliberately In complex environments many parameters interactions cost sampling possible range states test results action executions practical approach We present practical approach based continuous selective interaction environment pinpoints type fault domain knowledge causes unexpected behavior environment resorts experimentation additional information needed correct systems knowledge
Determining architecture neural network important issue learning task For recurrent neural networks general methods exist permit estimation number layers hidden neurons size layers number weights We present simple pruning heuristic significantly improves generalization performance trained recurrent networks We illustrate heuristic training fully recurrent neural network positive negative strings regular grammar We also show rules extracted networks trained recognize strings rules extracted pruning consistent rules learned This performance improvement obtained pruning retraining networks Simulations shown training pruning recurrent neural net strings generated two regular grammars randomlygenerated state grammar state triple parity grammar Further simulations indicate pruning method gives generalization performance superior obtained training weight decay
We investigate structure model selection problems via biasvariance decomposition In particular characterize essential structure model selection task bias variance profiles generates sequence hypothesis classes This leads new understanding complexitypenalization methods First penalty terms effect postulate particular profile variances function model complexity postulated true profiles match systematic underfitting overfitting results depending whether penalty terms large small Second usually best penalize according true variances task therefore fixed penalization strategy optimal across problems We use biasvariance characterization identify notion easy hard model selection problems In particular show variance profile grows rapidly relation biases standard model selection techniques become prone significant errors This happen example regression independent variables drawn widetailed distributions Finally discuss new model selection strategy dramatically outperforms standard complexitypenalization holdout methods hard tasks
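The decomposition underlying these bias and variance profiles can be checked numerically. The setup is an assumption: fits from many independently drawn training sets, evaluated against a noise-free target. The mean squared error then splits exactly into squared bias plus variance. Noise variance would add a third term for noisy test targets. `bias_variance` is a hypothetical name.

```python
import numpy as np

def bias_variance(predictions, target):
    """predictions: (n_datasets, n_points) fits from independent training sets.
    target: (n_points,) noise-free values of the true function."""
    mean_pred = predictions.mean(axis=0)
    bias2 = ((mean_pred - target) ** 2).mean()
    variance = predictions.var(axis=0).mean()   # ddof=0 makes the identity exact
    mse = ((predictions - target) ** 2).mean()
    return bias2, variance, mse
```

Running this for each hypothesis class in a nested sequence gives the bias and variance profiles that the analysis above studies. A penalty term implicitly assumes one particular shape for the variance profile.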
We consider problem combine collection general regression fit vectors order obtain better predictive model The individual fits may subset linear regression ridge regression something complex like neural network We develop general framework problem examine recent crossvalidationbased proposal called stacking context Combination methods based bootstrap analytic methods also derived compared number examples including best subsets regression regression trees Finally apply ideas classification problems estimated combination weights yield insight structure problem
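The combination step of stacking can be sketched as follows. Assume the cross-validated predictions of each candidate fit are already stacked as the columns of `Z`. The combination weights are then found by least squares under a nonnegativity constraint, which is one common variant of stacking. Projected gradient is used below only to stay dependency-free, and `stacking_weights` is a hypothetical name.

```python
import numpy as np

def stacking_weights(Z, y, n_iter=5000):
    """Nonnegative least-squares weights for combining fits (columns of Z)."""
    L = np.linalg.norm(Z, 2) ** 2          # step size 1/L guarantees descent
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        w = np.maximum(w - Z.T @ (Z @ w - y) / L, 0.0)
    return w
```

Computing `Z` from cross-validated rather than in-sample predictions matters here. In-sample fits would give the weights a bias toward whichever model overfits most.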
We compare performance explanation abilities several machine learning algorithms problem predicting femoral neck fracture recovery Among different algorithms semi naive Bayesian classifier AssistantR seem appropriate We analyze combination decisions several classifiers solving prediction problem show combined classifier improves performance explanation ability
Parallel Genetic Algorithms often reported yield better performance Genetic Algorithms use single large panmictic population In case Island Model Genetic Algorithm informally argued multiple subpopulations helps preserve genetic diversity since island potentially follow different search trajectory search space On hand linearly separable functions often used test Island Model Genetic Algorithms possible Island Models particular well suited separable problems We look Island Models track multiple search trajectories using infinite population models simple genetic algorithm We also introduce simple model better understanding Island Model Genetic Algorithms may advantage processing linearly separable problems
Bootstrap samples noise shown effective smoothness capacity control technique training feedforward networks statistical methods generalized additive models It shown noisy bootstrap performs best conjunction weight decay regularization ensemble averaging The twospiral problem highly nonlinear noisefree data used demonstrate findings The combination noisy bootstrap ensemble averaging also shown useful generalized additive modeling also demonstrated well known Cleveland Heart Data
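A noisy bootstrap replicate can be sketched as follows. Resample the training pairs with replacement, then jitter the inputs with Gaussian noise. Each replicate would train one ensemble member, and the members' predictions would be averaged. Assumptions: noise on the inputs only, with a single `noise_std`. The paper's exact noise model may differ, and `noisy_bootstrap` is a hypothetical name.

```python
import numpy as np

def noisy_bootstrap(X, y, noise_std, rng):
    """One bootstrap replicate with Gaussian input jitter (assumed noise model)."""
    idx = rng.integers(0, len(X), size=len(X))      # resample with replacement
    X_boot = X[idx] + rng.normal(0.0, noise_std, size=X[idx].shape)
    return X_boot, y[idx]
```

Here `noise_std` acts as the smoothness or capacity control knob. With `noise_std=0` the function reduces to an ordinary bootstrap.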
New approaches prior specification structuring autoregressive time series models introduced developed We focus defining classes prior distributions parameters latent variables related latent components autoregressive model observed time series These new priors naturally permit incorporation qualitative quantitative prior information number relative importance physically meaningful components represent low frequency trends quasiperiodic subprocesses high frequency residual noise components observed series The class priors also naturally incorporates uncertainty model order hence leads posterior analysis model order assessment resulting posterior predictive inferences incorporate full uncertainties model order well model parameters Analysis also formally incorporates uncertainty leads inferences unknown initial values time series predictions future values Posterior analysis involves easily implemented iterative simulation methods developed described One motivating applied field climatology evaluation latent structure especially quasiperiodic structure critical importance connection issues global climatic variability We explore analysis data Southern Oscillation Index SOI one several series central recent highprofile debates atmospheric sciences recent apparent trends climatic indicators
Summary We detail illustrate time series analysis spectral inference autoregressive models focus underlying latent structure time series decompositions A novel class priors parameters latent components leads new class smoothness priors autoregressive coefficients provides formal inference model order including high order models leads incorporation uncertainty model order summary inferences The class prior models also allows subsets unit roots hence leads inference sustained though stochastically timevarying periodicities time series Applications analysis frequency composition time series time spectral domains illustrated study time series astronomy This analysis demonstrates impact utility new class priors addressing model order uncertainty allowing unit root structure Time domain decomposition time series estimated latent components provides important alternative view component spectral characteristics series In addition data analysis illustrates utility smoothness prior allowance unit root structure inference spectral densities In particular framework overcomes supposed problems spectral estimation autoregressive models using traditional model fitting methods
This paper presents new method training multilayer perceptron networks called DMP Dynamic Multilayer Perceptron The method based upon divide conquer approach builds networks form binary trees dynamically allocating nodes layers needed The individual nodes network trained using genetic algorithm The method capable handling realvalued inputs proof given concerning convergence properties basic model Simulation results show DMP performs favorably comparison learning algorithms
NeuroDraughts draughts playing program similar approach NeuroGammon NeuroChess Tesauro Thrun It uses artificial neural network trained method temporal difference learning learn selfplay play game draughts This paper discusses relative contribution board representation search depth training regime architecture run time parameters strength TDplayer produced system Keywords Temporal Difference Learning Input representation Search Draughts
In last decade research Machine Learning developed variety powerful tools inductive learning data analysis On hand research International Relations developed variety different conflict databases mostly analyzed classical statistical methods As databases general symbolic nature provide interesting domain application Machine Learning algorithms This paper gives short overview available conflict databases subsequently concentrates application machine learning methods analysis interpretation databases
In survey review work machine learning methods handling data sets containing large amounts irrelevant information We focus two key issues problem selecting relevant features problem selecting relevant examples We describe advances made topics empirical theoretical work machine learning present general framework use compare different methods We close challenges future work area
We describe illustrate Bayesian approaches modelling analysis multiple nonstationary time series This begins univariate models collections related time series assumedly driven underlying unobservable processes referred dynamic latent factor processes We focus models factor processes hence observed time series modelled timevarying autoregressions capable flexibly representing ranges observed nonstationary characteristics We highlight concepts new methods time series decomposition infer characteristics latent components time series relate univariate decomposition analyses underlying multivariate dynamic factor structure Our motivating application analysis multiple EEG traces ongoing EEG study Duke In study individuals undergoing ECT therapy generate multiple EEG traces various scalp locations physiological interest lies identifying dependencies dissimilarities across series In addition multivariate nonstationary aspects series area provides illustration new results decomposition time series latent physically interpretable components illustrated data analysis one EEG data set The paper also discusses current future research directions This research supported part National Science Foundation grant DMS The EEG data context arose discussions Dr Andrew Krystal Duke University Medical Center continued interactions valuable Address correspondence Institute Statistics Decision Sciences Duke University Durham NC USA httpwwwstatdukeedu
The subsumption problem crucial efficiency ILP learning systems We discuss two subsumption algorithms based strategies preselecting suitable matching literals The class clauses subsumption becomes polynomial superset deterministic clauses We map general problem subsumption certain problem finding clique fixed size graph return show specialization pruning strategy Carraghan Pardalos clique algorithm provides dramatic reduction subsumption search space We also present empirical results mesh design data set
Casebased planning involves storing individual instances problemsolving episodes using tackle new planning problems This paper concerned derivation replay main component form casebased planning called derivational analogy DA Prior study implementations derivation replay based within statespace planning We motivated acknowledged superiority partialorder PO planners plan generation Here demonstrate planspace planning also advantage replay We argue decoupling planning derivation order execution order plan steps provided partialorder planners enables exploit guidance previous cases efficient straightforward fashion We validate hypothesis focused empirical comparison
It wellknown fact propositional learning algorithms require good features perform well practice So major step data engineering inductive learning construction good features domain experts These features often represent properties structured objects property typically occurrence certain substructure certain properties To partly automate process feature engineering devised algorithm searches features defined substructures The algorithm stochastically conducts topdown search firstorder clauses clause represents binary feature It differs existing algorithms search classblind capable considering clauses context almost arbitrary length size Preliminary experiments favorable support view approach promising
A neural network trend predictor gold bullion market presented A simple recurrent neural network trained recognize turning points gold market based todate history ten market indices The network tested data held back training significant amount predictive power observed The turning point predictions used time transactions gold bullion gold mining company stock index markets obtain significant paper profit test period The training data consisted daily closing prices ten input markets period five years The turning point targets labeled training phase without help financial expert Thus experiment shows useful predictions made without use extensive market data knowledge
Automating learning causal models sample data key step toward incorporating machine learning automation decisionmaking reasoning uncertainty This paper presents Bayesian approach discovery causal models using Minimum Message Length MML method We developed encoding search methods discovering linear causal models The initial experimental results presented paper show MML induction approach recover causal models generated data quite accurate reflections original models results compare favorably TETRAD II program Spirtes et al even algorithm supplied prior temporal information MML
We present new algorithm associative reinforcement learning The algorithm based upon idea matching networks output probability probability distribution derived environments reward signal This Probability Matching algorithm shown perform faster less susceptible local minima previously existing algorithms We use Probability Matching train mixture experts networks architecture reinforcement learning rules fail converge reliably even simple problems This architecture particularly well suited algorithm compute arbitrarily complex functions yet calculation output probability simple
Many casebased reasoning algorithms retrieve cases using derivative knearest neighbor kNN classifier whose similarity function sensitive irrelevant interacting noisy features Many proposed methods reducing sensitivity parameterize kNNs similarity function feature weights We focus methods automatically assign weight settings using little domainspecific knowledge Our goal predict relative capabilities methods specific dataset characteristics We introduce fivedimensional framework categorizes automated weightsetting methods empirically compare methods along one dimensions summarize results four hypotheses describe additional evidence supports Our investigation revealed methods correctly assign low weights completely irrelevant features methods use performance feedback demonstrate three advantages methods ie require less preprocessing better tolerate interacting features increase learning rate
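Parameterizing the kNN similarity function with feature weights can be sketched with a weighted Euclidean distance. A weight of zero removes a feature entirely. The weights here are fixed by hand. The automated weight-setting methods the paper compares would produce them instead. `weighted_knn` is a hypothetical name.

```python
import math
from collections import Counter

def weighted_knn(train_X, train_y, query, weights, k=3):
    """Majority label among the k nearest cases under a feature-weighted distance."""
    def dist(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

In the test case, a large irrelevant second feature makes the unweighted classifier pick the wrong neighbour. Setting that feature's weight to zero restores the correct answer.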
This paper deals problem learning characteristic concept descriptions examples describes new generalization approach implemented system Cola The approach tries take advantage information induced descriptions unclassified objects using conceptual clustering algorithm Experimental results various realworld domains strongly support hypothesis new approach delivers correct possibly comprehensible concept descriptions existing methods induced concept descriptions also used classify objects belong concepts present training data set This paper describes generalization approach implemented Cola presents experimental results obtained relational propositional real world data set
Local selection LS simple selection scheme evolutionary algorithms Individual fitnesses compared fixed threshold rather decide gets reproduce LS coupled fitness functions stemming consumption shared environmental resources maintains diversity way similar fitness sharing however generally efficient fitness sharing lends parallel implementations distributed tasks While LS prone premature convergence applies minimal selection pressure upon population LS therefore appropriate stronger selection schemes certain problem classes This paper characterizes one broad class problems LS consistently performs tournament selection
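The contrast between the two schemes can be sketched in their simplest form. Local selection compares each individual only against a fixed threshold, while tournament selection makes individuals compete directly. Assumptions: raw fitness values and a binary tournament by default. The energy and resource bookkeeping of the full LS scheme is omitted, and both function names are hypothetical.

```python
import random

def local_selection(fitnesses, threshold):
    """Indices allowed to reproduce: each is compared only with a fixed threshold."""
    return [i for i, f in enumerate(fitnesses) if f > threshold]

def tournament_selection(fitnesses, n_select, size=2, rng=random):
    """Indices chosen by repeated tournaments in which individuals compete directly."""
    chosen = []
    for _ in range(n_select):
        contestants = rng.sample(range(len(fitnesses)), size)
        chosen.append(max(contestants, key=lambda i: fitnesses[i]))
    return chosen
```

Because local selection never ranks individuals against each other, it needs no global comparison. This is why it parallelizes easily, and also why its selection pressure is weak.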
The potential combined ERSJERS SAR images land cover classification demonstrated Raco test site Michigan recent papers articles Our goal develop classification algorithm stable terms applicability different geographical regions Unlike optical remote sensing techniques radar remote sensing provide calibrated data image signal solely determined physical structural electrical properties targets Earths surface near subsurface Hence classifier based radar signatures object classes applicable new calibrated images without need train classifier This article discusses design applicability classification algorithm based calibrated radar signatures measured ERS Cband vv polarized JERS Lband hh polarized SAR image data The applicability compared two different test sites Raco Michigan Cedar Creek LTER site Minnesota It found classes separate well certain boundary conditions like comparable seasonality soil moisture conditions observed
Presenting Analyzing Results AI Experiments Data Averaging Data Snooping Proceedings Fourteenth National Conference Artificial Intelligence AAAI AAAI Press Menlo Park California pp Copyright AAAI Abstract Experimental results reported machine learning AI literature misleading This paper investigates common processes data averaging reporting results terms mean standard deviation results multiple trials data snooping context neural networks one popular AI machine learning models Both processes result misleading results inaccurate conclusions We demonstrate easily happen propose techniques avoiding important problems For data averaging common presentation assumes distribution individual results Gaussian However investigate distribution common problems find often approximate Gaussian distribution may symmetric may multimodal We show assuming Gaussian distributions significantly affect interpretation results especially comparison studies For controlled task find distribution performance skewed towards better performance smoother target functions skewed towards worse performance complex target functions We propose new guidelines reporting performance provide information actual distribution eg boxwhiskers plots For data snooping demonstrate optimization performance via experimentation multiple parameters lead significance assigned results due chance We suggest precise descriptions experimental techniques important evaluation results need aware potential data snooping biases formulating experimental techniques eg selecting test procedure Additionally important rely appropriate statistical tests ensure assumptions made tests valid eg normality distribution
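Reporting the whole distribution instead of mean ± standard deviation can start from a five-number summary, which is the information a box-and-whiskers plot displays. The sketch uses the standard-library `statistics.quantiles` with its default method. Quartile conventions differ between tools, so the exact Q1 and Q3 values are convention-dependent. `five_number_summary` is a hypothetical name.

```python
import statistics

def five_number_summary(results):
    """(min, Q1, median, Q3, max) of per-trial results."""
    q1, median, q3 = statistics.quantiles(results, n=4)
    return min(results), q1, median, q3, max(results)
```

For a bimodal set of trial errors such as five runs at 0.1 and five at 0.9, the mean is 0.5, a value no run achieved. The summary, with its minimum near 0.1 and maximum near 0.9, exposes the two modes.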
A brief survey biological research noncoding DNA presented There growing interest effects noncoding segments evolutionary algorithms EAs To better understand conduct research noncoding segments EAs important understand biological background work This paper begins review basic genetics terminology describes different types noncoding DNA surveys recent intron research
We use simulated soccer study multiagent learning Each teams players agents share action set policy may behave differently due positiondependent inputs All agents making team rewarded punished collectively case goals We conduct simulations varying team sizes compare two learning algorithms TDQ learning linear neural networks TDQ Probabilistic Incremental Program Evolution PIPE TDQ based evaluation functions EFs mapping inputaction pairs expected reward PIPE searches policy space directly PIPE uses adaptive probabilistic prototype trees synthesize programs calculate action probabilities current inputs Our results show TDQ encounters several difficulties learning appropriate shared EFs PIPE however depend EFs find good policies faster reliably This suggests multiagent learning scenarios direct search policy space offer advantages EFbased approaches
A novel supervised learning method presented combining linear discriminant functions neural networks The proposed method results treestructured hybrid architecture Due constructive learning binary tree hierarchical architecture automatically generated controlled growing process specific supervised learning task Unlike classic decision tree linear discriminant functions merely employed intermediate level tree heuristically partitioning large complicated task several smaller simpler subtasks proposed method These subtasks dealt component neural networks leaves tree accordingly For constructive learning growing creditassignment algorithms developed serve hybrid architecture The proposed architecture provides efficient way apply existing neural networks eg multilayered perceptron solving large scale problem We already applied proposed method universal approximation problem several benchmark classification problems order evaluate performance Simulation results shown proposed method yields better results faster training comparison multilayered perceptron
Machine learning knowledge engineering always strongly related introduction new representations knowledge engineering created gap This paper describes research aimed applying machine learning techniques current knowledge engineering representations We propose system redesigns part knowledge based system called control knowledge We claim strong similarity redesign knowledge based systems incremental machine learning Finally relate work existing research
Often learning data one attaches penalty term standard error term attempt prefer simple models prevent overfitting Current penalty terms neural networks however often take account weight interaction This critical drawback since effective number parameters network usually differs dramatically total number possible parameters In paper present penalty term uses Principal Component Analysis help detect functional redundancy neural network Results show new algorithm gives much accurate estimate network complexity standard approaches As result new term able improve techniques make use penalty term weight decay weight pruning feature selection Bayesian predictionrisk techniques
The DMP Dynamic Multilayer Perceptron network training method based upon divide conquer approach builds networks form binary trees dynamically allocating nodes layers needed This paper introduces DMP method compares performance DMP using standard delta rule training method training individual nodes performance DMP using genetic algorithm training While basic model require use genetic algorithm training individual nodes results show convergence properties DMP enhanced use genetic algorithm appropriate fitness function
In late developed one early casebased design systems called Kritik Kritik autonomously generated preliminary conceptual qualitative designs physical devices retrieving adapting past designs stored case memory Each case system associated structurebehaviorfunction SBF device model explained structure device accomplished functions These casespecific device models guided process modifying past design meet functional specification new design problem The device models also enabled verification design modifications Kritik new complete implementation Kritik In paper take retrospective view Kritik In early papers described Kritik integrating casebased modelbased reasoning In integration Kritik also grounds computational process casebased reasoning SBF content theory device comprehension The SBF models provide methods many specific tasks casebased design design adaptation verification also provide vocabulary whole process casebased design retrieval old cases storage new ones This grounding believe essential building wellconstrained theories casebased design
Much current research learning Bayesian Networks fails effectively deal missing data Most methods assume data complete make data complete using fairly adhoc methods methods deal missing data learn conditional probabilities assuming structure known We present principled approach learn Bayesian network structure well conditional probabilities incomplete data The proposed algorithm iterative method uses combination ExpectationMaximization EM Imputation techniques Results presented synthetic data sets show performance new algorithm much better adhoc methods handling missing data
Researchers field Distributed Artificial Intelligence DAI developing efficient mechanisms coordinate activities multiple autonomous agents The need coordination arises agents share resources expertise required achieve goals Previous work area includes using sophisticated information exchange protocols investigating heuristics negotiation developing formal models possibilities conflict cooperation among agent interests In order handle changing requirements continuous dynamic environments propose learning means provide additional possibilities effective coordination We use reinforcement learning techniques block pushing problem show agents learn complementary policies follow desired path without knowledge We theoretically analyze experimentally verify effects learning rate system convergence demonstrate benefits using learned coordination knowledge similar problems Reinforcement learning based coordination achieved cooperative noncooperative domains domains noisy communication channels stochastic characteristics present formidable challenge using coordination schemes
The performance error backpropagation BP ID learning algorithms compared task mapping English text phonemes stresses Under distributed output code developed Sejnowski Rosenberg shown BP consistently outperforms ID task several percentage points Three hypotheses explaining difference explored ID overfitting training data b BP able share hidden units across several output units hence learn output units better c BP captures statistical information ID We conclude hypothesis c correct By augmenting ID simple statistical learning procedure performance BP approached matched More complex statistical procedures improve performance BP ID substantially A study residual errors suggests still substantial room improvement learning methods texttospeech mapping
We thank Steen Ladegaard Knudsen assistance programming analysis running simulations Scott Baden assistance vectorizing code Cray YMP Division Engineering Block Grant time Cray San Diego Supercomputer Center members PDPNLP GURU Research Groups UCSD helpful comments earlier versions work
There lot recent interest socalled steady state genetic algorithms GAs among things replace individuals typically generation fixed size population size N Understanding advantages andor disadvantages replacing fraction population generation rather entire population goal earliest GA research In spite considerable progress understanding GAs since proscons overlapping generations remains somewhat cloudy issue However recent theoretical empirical results provide background much clearer understanding issue In paper review combine extend results way significantly sharpens insight
Daily experience shows real world meaning many concepts heavily depends implicit context changes context cause less radical changes concepts Incremental concept learning domains requires ability recognize adapt changes This paper presents solution incremental learning tasks domain provides explicit clues current context eg attributes characteristic values We present general twolevel learning model realization system named MetaLB learn detect certain types contextual clues react accordingly context change suspected The model consists base level learner performs regular online learning classification task metalearner identifies potential contextual clues Context learning detection occur regular online learning without separate training phases context recognition Experiments synthetic domains well realworld problem show MetaLB robust variety dimensions produces substantial improvement simple objectlevel learning situations changing contexts
This paper investigates memory issues influence long term creative problem solving design activity taking casebased reasoning perspective Our exploration based welldocumented example invention telephone Alexander Graham Bell We abstract Bells reasoning understanding mechanisms appear time longterm creative design We identify understanding mechanism responsible analogical anticipation design constraints analogical evaluation beside casebased design But already understood design satisfy opportunistically suspended design problems still active background The new mechanisms integrated computational model ALEC accounts creative
Intelligent human agents exist cooperative social environment facilitates learning They learn trialanderror also cooperation sharing instantaneous information episodic experience learned knowledge The key investigations paper Given number reinforcement learning agents cooperative agents outperform independent agents communicate learning What price cooperation Using independent agents benchmark cooperative agents studied following ways sharing sensation sharing episodes sharing learned policies This paper shows additional sensation another agent beneficial used efficiently b sharing learned policies episodes among agents speeds learning cost communication c joint tasks agents engaging partnership significantly outperform independent agents although may learn slowly beginning These tradeoffs limited multiagent reinforcement learning
We describe SFOIL descendant FOIL uses advanced stochastic search heuristic application learning compose twovoice counterpoint The application required learning ary relation training instances SFOIL able efficiently deal learning task knowledge one complex learning task solved ILP system This demonstrates ILP systems scale real databases topdown ILP systems use covering approach advanced search strategies appropriate knowledge discovery databases promising investigation
The human visual system sensitive relative motion objects absolute motion An understanding motion perception requires understanding neural circuits group moving visual elements relative one another based upon hierarchical reference frames We modeled visual relative motion perception using neural network architecture groups visual elements according Gestalt commonfate principles exploits information behavior group predict behavior individual elements A simple competitive neural circuit binds visual elements together representation visual object Information spiking pattern neurons allows transfer bindings object representation location location neural circuit object moves The model exhibits characteristics human object grouping solves key neural circuit design problems visual relative motion perception
A knowledgelevel analysis complex tasks like diagnosis design give us better understanding tasks terms goals aim achieve different ways achieve goals In paper present knowledgelevel analysis redesign Redesign viewed family methods based common principles number dimensions along redesign problem solving methods vary distinguished By examining problemsolving behavior number existing redesign systems approaches came collection problemsolving methods redesign developed taskmethod structure redesign In constructing system redesign large number knowledgerelated choices decisions made In order describe relevant choices redesign problem solving extend current notion possible relations tasks methods PSM architecture The realization task problemsolving method decomposition problemsolving method subtasks common relations PSM architecture However suggest extend relations notions task refinement method refinement These notions represent intermediate decisions taskmethod structure competence task method refined without immediately paying attention operationalization terms subtasks Explicit representation kind intermediate decisions helps make represent decisions piecemeal fashion
In Bayesian density estimation prediction using Dirichlet process mixtures standard exponential family distributions precision total mass parameter mixing Dirichlet process critical hyperparameter strongly influences resulting inferences numbers mixture components This note shows respect flexible class prior distributions parameter posterior may represented simple conditional form easily simulated As result inference key quantity may developed tandem existing routine Gibbs sampling algorithms fitting mixture models The concept data augmentation important ever developing extension existing algorithm A final section notes simple asymptotic approximation posterior
In paper describe study applying knowledgebased neural networks problem diagnosing faults local telephone loops Currently NYNEX uses expert system called MAX aid human experts diagnosing faults however effective learning algorithm place MAX would allow easy portability different maintenance centers easy updating phone equipment changes We find machine learning algorithms better accuracy MAX ii neural networks perform better decision trees iii neural network ensembles perform better standard neural networks iv knowledgebased neural networks perform better standard neural networks v ensemble knowledgebased neural networks performs best
Human vision systems integrate information nonlocally across long spatial ranges For example moving stimulus appears smeared viewed briefly ms yet sharp viewed longer exposure ms Burr This suggests visual systems combine information along trajectory matches motion stimulus Our selforganizing neural network model shows developmental exposure moving stimuli direct formation horizontal trajectoryspecific motion integration pathways unsmear representations moving stimuli These results account Burrs data potentially also model phenomena visual inertia
Explaining away common pattern reasoning confirmation one cause observed believed event reduces need invoke alternative causes The opposite explaining away also occur confirmation one cause increases belief another We provide general qualitative probabilistic analysis intercausal reasoning identify property interaction among causes product synergy determines form reasoning appropriate Product synergy extends qualitative probabilistic network QPN formalism support qualitative intercausal inference directions change probabilistic belief The intercausal relation also justifies Occams razor facilitating pruning search likely diagnoses Portions paper originally appeared Proceedings Second International Conference Principles Knowledge Representation Reasoning Supported National Science Foundation grant IRI Carnegie Mellon Rockwell International Science Center
We introduce new technique enables learner without access hidden information learn nearly well learner access hidden information We apply technique solve open problem Maass Turan showing concept class F least number queries sufficient learning F algorithm access arbitrary equivalence queries factor log least number queries sufficient learning F algorithm access arbitrary equivalence queries membership queries Previously known results imply log bound best possible We describe analogous results two generalizations model function learning apply results bound difficulty learning harder models terms difficulty learning easier model We bound difficulty learning unions k concepts class F terms difficulty learning F We bound difficulty learning noisy environment deterministic algorithms terms difficulty learning noisefree environment We apply variant technique develop algorithm transformation allows probabilistic learning algorithms nearly optimally cope noise A second variant enables us improve general lower bound Turan PAClearning model queries Finally show logarithmically many membership queries never help obtain computationally efficient learning algorithms Supported Air Force Office Scientific Research grant FJ Most work done author TU Graz supported Lise Meitner Fellowship Fonds zur Förderung der wissenschaftlichen Forschung Austria
In paper describe evolution discretetime recurrent neural network control real mobile robot In experiments evolutionary procedure carried entirely physical robot without human intervention We show autonomous development set behaviors locating battery charger periodically returning achieved lifting constraints design robotenvironment interactions employed preliminary experiment The emergent homing behavior based autonomous development internal neural topographic map predesigned allows robot choose appropriate trajectory function location remaining energy
Genetic programming methodology program development consisting special form genetic algorithm capable handling parse trees representing programs successfully applied variety problems In paper new approach construction neural networks based genetic programming presented A linear chromosome combined graph representation network new operators introduced allow evolution architecture weights simultaneously without need local weight optimization This paper describes approach operators reports results application model several binary classification problems
Conceptual analogy CA general approach applies conceptual clustering concept representations facilitate efficient use past experiences cases analogical reasoning Börner The approach developed implemented SYN see also Börner Börner Faßauer support design supply nets building engineering This paper sketches task outlines nearestneighborbased agglomerative conceptual clustering applied organizing large amounts structured cases case classes provides concept representation used characterize case classes shows analogous solution new problems based concepts available However main purpose paper evaluate CA terms reasoning efficiency capability derive solutions go beyond cases case base still preserve quality cases
Accurate fast estimation probability density functions crucial satisfactory computational performance many scientific problems When type density known priori problem becomes statistical estimation parameters observed values In nonparametric case usual estimators make use kernel functions If $X_j$ $(j = 1, \dots, n)$ sequence iid random variables estimated probability density function $f_n$ kernel method computation values $f_n(X_1), f_n(X_2), \dots, f_n(X_n)$ requires $O(n^2)$ operations since kernel needs evaluated every $X_j$ We propose sequence special weight functions nonparametric estimation $f$ requires almost linear time $m$ slowly growing function increases without bound $n$ method requires $O(mn)$ arithmetic operations We derive conditions convergence number metrics turn similar required convergence kernel based methods We also discuss experiments different distributions compare efficiency accuracy computations kernel based estimators various values $n$
We propose active learning method hiddenunit reduction devised specially multilayer perceptrons MLP First review active learning method point many Fisherinformationbased methods applied MLP critical problem information matrix may singular To solve problem derive singularity condition information matrix propose active learning technique applicable
Stable neural network control estimation may viewed formally merging concepts nonlinear dynamic systems theory tools multivariate approximation theory This paper extends earlier results adaptive control estimation nonlinear systems using gaussian radial basis functions online generation irregularly sampled networks using tools multiresolution analysis wavelet theory This yields much compact efficient system representations preserving global closedloop stability Approximation models employing basis functions localized space spatial frequency admit measure approximated functions spatial frequency content directly dependent reconstruction error As result models afford means adaptively selecting basis functions according local spatial frequency content approximated function An algorithm stable online adaptation output weights simultaneously node configuration class nonparametric models wavelet basis functions presented An asymptotic bound error networks reconstruction derived shown dependent solely minimum approximation error associated steady state node configuration In addition prior bounds temporal bandwidth system identified controlled used develop criterion online selection radial ridge wavelet basis functions thus reducing rate increase networks size dimension state vector Experimental results obtained using network predict path unknown light bluff object thrown air activevision based robotic catching system given illustrate networks performance simple realtime application
Research bias machine learning algorithms generally concerned impact bias predictive accuracy We believe factors also play role evaluation bias One factor stability algorithm words repeatability results If obtain two sets data phenomenon underlying probability distribution would like learning algorithm induce approximately concepts sets data This paper introduces method quantifying stability based measure agreement concepts We also discuss relationships among stability predictive accuracy bias
In many Genetic Algorithms applications objective find nearoptimal solution using limited amount computation Given requirements difficult find good balance exploration exploitation Usually balance found tuning various parameters like selective pressure population size mutation crossover rate Genetic Algorithm As alternative propose simultaneous tuning selective pressure disruptiveness recombination operators Our experiments show combination proper selective pressure highly disruptive recombination operator yields superior performance The reduction mechanism used SteadyState GA strong influence optimal crossover disruptiveness Using worst fitness deletion strategy building blocks present current best individuals always preserved This releases crossover operator burden maintain good building blocks allows us tune crossover disruptiveness improve search better individuals
Crossvalidation frequently used intuitively pleasing technique estimating accuracy theories learned machine learning algorithms During testing machine learning algorithm FOIL new databases prokaryotic RNA transcription promoters developed crossvalidation displayed interesting phenomenon One theory found repeatedly responsible little crossvalidation error whereas theories found infrequently tend responsible majority crossvalidation error It tempting believe frequently found theory modal theory may accurate classifier unseen data theories However experiments showed modal theories accurate unseen data theories found less frequently crossvalidation Modal theories may useful predicting crossvalidation poor estimate true accuracy We offer explanations For correspondence Department Computer Science Engineering University California San
One significant cost factors robotics applications design development realtime robot control software Control theory helps linear controllers developed doesnt sufficiently support generation nonlinear controllers although many cases compliance control nonlinear control essential achieving high performance This paper discusses Machine Learning applied design nonlinear controllers Several alternative function approximators including Multilayer Perceptrons MLP Radial Basis Function Networks RBFNs Fuzzy Controllers analyzed compared leading definition two major families Open Field Function Approximators Locally Receptive Field Function Approximators It shown RBFNs Fuzzy Controllers bear strong similarities symbolic interpretation This characteristic allows applying symbolic statistical learning algorithms synthesize network layout set examples possibly background knowledge Three integrated learning algorithms two original described evaluated experimental test cases The first test case provided robot KUKA IR engaged pegintohole task whereas second represented classical prediction task MackeyGlass time series From experimental comparison appears Fuzzy Controllers RBFNs synthesised examples excellent approximators practice even accurate MLPs
This paper discusses use evolutionary computation evolve behaviors exhibit emergent intelligent behavior Genetic algorithms used learn navigation collision avoidance behaviors robots The learning performed simulation resulting behaviors used control actual robot Some emergent behavior described detail
Judgments similarity soundness important aspects human analogical processing This paper explores judgments modeled using SME simulation Gentners structuremapping theory We focus structural evaluation explicating several principles psychologically plausible algorithms follow We introduce Specificity Conjecture claims naturalistic representations include preponderance appearance loworder information We demonstrate via computational experiments conjecture affects structural evaluation performed including choice normalization technique systematicity preference implemented
Genetic algorithms used solve hard optimization problems ranging Travelling Salesman problem Quadratic Assignment problem We show Simple Genetic Algorithm used solve optimization problem derived Conjunctive Normal Form problem By separating populations small subpopulations parallel genetic algorithms exploit inherent parallelism genetic algorithms prevent premature convergence Genetic algorithms using hillclimbing conduct genetic search space local optima hillclimbing less computationally expensive genetic search We examine effectiveness techniques improving quality solutions CNF problems
ICSIM simulator structured connectionism development ICSI Structured connectionism characterized need flexibility efficiency support design reuse modular substructure We take position fast objectoriented language like Sather appropriate implementation medium achieve goals The core ICSIM consists hierarchy classes correspond simulation entities New connectionist models realized combining specializing preexisting classes Whenever possible auxiliary functionality separated functional modules order keep basic hierarchy clean simple possible
In recent years researchers made considerable progress worstcase analysis inductive learning tasks theoretical results impact practice must deal average case In paper present averagecase analysis simple algorithm induces onelevel decision trees concepts defined single relevant attribute Given knowledge number training instances number irrelevant attributes amount class attribute noise class attribute distributions derive expected classification accuracy entire instance space We examine predictions analysis different settings domain parameters comparing experimental results check reasoning
We compare performance several machine learning algorithms problem prognostics femoral neck fracture recovery Knearest neighbours algorithm seminaive Bayesian classifier backpropagation weight elimination learning multilayered neural networks LFC lookahead feature construction algorithm AssistantI AssistantR algorithms top induction decision trees using information gain RELIEFF search heuristics respectively We compare prognostic accuracy explanation ability different classifiers Among different algorithms seminaive Bayesian classifier AssistantR seem appropriate We analyze combination decisions several classifiers solving prediction problems show combined classifier improves performance explanation ability
The StructureMapping Engine SME successfully modeled several aspects human analogical processing However two significant drawbacks SME constructs structurally consistent interpretations analogy While useful theoretical explorations aspect algorithm psychologically implausible computationally inefficient SME contains mechanism focusing interpretations relevant analogizers goals This paper describes modifications SME overcome flaws We describe greedy merge algorithm efficiently computes approximate best interpretation generate alternate interpretations necessary We describe pragmatic marking technique focuses mapping produce relevant yet novel inferences We illustrate techniques via example evaluate performance using empirical data theoretical analysis
Donoho Johnstones WaveShrink procedure proven valuable signal denoising nonparametric regression WaveShrink based principle shrinking wavelet coefficients towards zero remove noise WaveShrink broad asymptotic nearoptimality properties In paper introduce new shrinkage scheme semisoft generalizes hard soft shrinkage We study properties shrinkage functions demonstrate semisoft shrinkage offers advantages hard shrinkage uniformly smaller risk less sensitivity small perturbations data soft shrinkage smaller bias overall L risk We also construct approximate pointwise confidence intervals WaveShrink address problem threshold selection
One key issues discrete continuous class prediction machine learning general seems problem estimating quality attributes Heuristic measures mostly assume independence attributes therefore successfully used domains strong dependencies attributes Relief extension ReliefF statistical methods capable correctly estimating quality attributes classification problems strong dependencies attributes Following analysis ReliefF extended continuous class problems Regressional ReliefF RReliefF ReliefF provide unified view estimation quality attributes The experiments show RReliefF successfully estimates quality attributes used nonmyopic learning regression trees
We investigate abduction induction integrated common learning framework notion Abductive Concept Learning ACL ACL extension Inductive Logic Programming ILP case background target theory abductive logic programs abductive notion entailment used coverage relation In framework possible learn incomplete information examples exploiting hypothetical reasoning abduction The paper presents basic framework ACL main characteristics illustrates potential addressing several problems ILP learning incomplete information multiple predicate learning An algorithm ACL developed suitably extending topdown ILP method concept learning integrating abductive proof procedure Abductive Logic Programming ALP A prototype system developed applied learning problems incomplete information The particular role integrity constraints ACL investigated showing ACL hybrid learning framework integrates explanatory discriminant descriptive characteristic settings ILP
In Markov decision process MDP formalization reinforcement learning single adaptive agent interacts environment defined probabilistic transition function In solipsistic view secondary agents part environment therefore fixed behavior The framework Markov games allows us widen view include multiple adaptive agents interacting competing goals This paper considers step direction exactly two agents diametrically opposed goals share environment It describes Qlearninglike algorithm finding optimal policies demonstrates application simple twoplayer game optimal policy probabilistic
Genetic Programming promising new method automatically generating functions algorithms natural selection In contrast learning methods Genetic Programmings automatic programming makes natural approach developing algorithmic robot behaviors In paper present overview apply Genetic Programming behaviorbased team coordination RoboCup Soccer Server domain The result handcoded soccer algorithm team softbots learned play reasonable game soccer
We evolved artificial neural networks control wandering behavior small robots The task touch many squares grid possible fixed period time A number simulated robots embodied small Lego robot controlled Motorola processor performance compared simulations We observed evolution effective means program control b progress characterized sharply stepped periods improvement separated periods stasis corresponded levels behavioralcomputational complexity c simulated realized robots behaved quite similarly realized robots cases outperforming simulated ones Introducing random noise simulations improved fit somewhat Hybrid simulatedembodied selection regimes evolutionary robots discussed
The predatorprey domain utilized conduct research Distributed Artificial Intelligence Genetic Programming used evolve behavioral strategies predator agents To utility predator strategies prey population allowed evolve time The expected competitive learning cycle surface This failing investigated simple prey algorithm surfaces consistently able evade capture predator algorithms
Genetic Algorithms Evolution Strategies main representatives class algorithms based model natural evolution discussed wrt basic working mechanisms differences application possibilities The mechanism selfadaptation strategy parameters within Evolution Strategies emphasized turns major difference Genetic Algorithms since allows online adaptation strategy parameters without exogenous control
This paper explores two boosting techniques costsensitive tree classification situation misclassification costs change often Ideally one would like one induction use induced model different misclassification costs Thus demands robustness induced model cost changes Combining multiple trees gives robust predictions change We demonstrate ordinary boosting combined minimum expected cost criterion select prediction class good solution situation We also introduce variant ordinary boosting procedure utilizes cost information training We show proposed technique performs better ordinary boosting terms misclassification cost However technique requires induce set new trees every time cost changes Our empirical investigation also reveals interesting behavior boosting decision trees costsensitive classification
Previous approaches multiagent reinforcement learning either limited heuristic nature The main reason agents animats environment continually changes learning animats keep changing Traditional reinforcement learning algorithms properly deal Their convergence theorems require repeatable trials strong typically Markovian assumptions environment In paper however use novel general sound method multiple reinforcement learning animats living single life limited computational resources unrestricted changing environment The method called incremental selfimprovement IS Schmidhuber IS properly takes account whatever animat learns point may affect learning conditions animats later point The learning algorithm ISbased animat embedded policy animat improve performance principle also improve way improves etc At certain times animats life IS uses reinforcementtime ratios estimate single training example namely entire life far previously learned things still useful selectively keeps gets rid start appearing harmful IS based efficient stackbased backtracking procedure guaranteed make animats learning history history longterm reinforcement accelerations Experiments demonstrate IS effectiveness In one experiment IS learns sequence complex function approximation problems In another multiagent system consisting three coevolving ISbased animats chasing learns interesting stochastic predator prey strategies
The breeder genetic algorithm BGA depends set control parameters genetic operators In paper shown strategy adaptation competing subpopulations makes BGA robust efficient Each subpopulation uses different strategy competes subpopulations Numerical results presented number test functions
We present computational approach acquisition problem schemes learning application analogical problem solving Our work background automatic program construction relies concept recursive program schemes In contrast usual approach cognitive modelling computational models designed fit specific data propose framework describe certain empirically established characteristics human problem solving learning uniform formally sound way
Genetic algorithms GAs play major role many artificiallife systems often little detailed understanding GA performs little theoretical basis characterize types fitness landscapes lead successful GA performance In paper propose strategy addressing issues Our strategy consists defining set features fitness landscapes particularly relevant GA experimentally studying various configurations features affect GAs performance along number dimensions In paper informally describe initial set proposed feature classes describe detail one class Royal Road functions present initial experimental results concerning role crossover building blocks landscapes constructed features class
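Below is a Python sketch of one commonly cited member of the Royal Road class, usually called R1. The parameters are assumptions, since the text gives none: bit strings are split into contiguous blocks of 8, and each block that is all ones adds its order (8) to the fitness. Incomplete blocks contribute nothing. This is why the landscape rewards assembling whole building blocks.

```python
def royal_road_r1(bits, block=8):
    """Royal Road R1 style fitness (assumed parameters: block size 8).

    Each contiguous block that is entirely ones contributes its order
    (the block length); partially completed blocks contribute nothing.
    """
    if len(bits) % block:
        raise ValueError("length must be a multiple of the block size")
    score = 0
    for start in range(0, len(bits), block):
        if all(bits[start:start + block]):
            score += block  # coefficient c_s = order of schema s
    return score
```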
We consider question How one act goal learn much possible Building theoretical results Fedorov MacKay apply techniques Optimal Experiment Design OED guide queryaction selection neural network learner We demonstrate techniques allow learner minimize generalization error exploring domain efficiently completely We conclude panacea OEDbased queryaction much offer especially domains high computational costs tolerated This report describes research done Center Biological Computational Learning Artificial Intelligence Laboratory Massachusetts Institute Technology Support Center provided part grant National Science Foundation contract ASC The author also funded ATR Human Information Processing Laboratories Siemens Corporate Research NSF grant CDA
CBET software tool interactive exploration case base CBET integrated environment provides range browsing display functions make possible knowledge extraction set cases CBET motivated application training firemen Here cases describe past forest fire fighting interventions CBET used detect dependencies data acquire practical planning competences visualize complex data clustering similar cases In CBET well rooted Machine Learning techniques selecting relevant features clustering cases forecasting unknown values adapted reused case base exploration
Recent studies planning comparing plan reuse plan generation shown tasks may degree computational complexity even deal similar problems The aim paper show kind results apply also diagnosis We propose theoretical complexity analysis coupled experimental tests intended evaluate adequacy adaptation strategies reuse solutions past diagnostic problems order build solution problem solved Results analysis show even diagnosis reuse falls complexity class diagnosis generation NPcomplete problems practical advantages obtained exploiting hybrid architecture combining casebased modelbased diagnostic problem solving unifying framework
The representation hidden variable models attractor neural networks studied Memories stored dynamical attractor continuous manifold fixed points illustrated linear nonlinear networks hidden neurons Pattern analysis synthesis forms pattern completion recall stored memory Analysis synthesis linear network performed bottomup topdown connections In nonlinear network analysis computation additionally requires rectification nonlinearity inner product inhibition hidden neurons One popular approach sensory processing based generative models assume sensory input patterns synthesized underlying hidden variables For example sounds speech synthesized sequence phonemes images face synthesized pose lighting variables Hidden variables useful constitute simpler representation variables visible sensory input Using generative model sensory processing requires method pattern analysis Given sensory input pattern analysis recovery hidden variables synthesized In words analysis synthesis inverses There number approaches pattern analysis In analysisbysynthesis synthetic model embedded inside negative feedback loop Another approach construct separate analysis model This paper explores third approach visiblehidden pairs embedded attractive fixed points attractors state space recurrent neural network The attractors regarded memories stored network analysis synthesis forms pattern completion recall memory The approach illustrated linear nonlinear network architectures In networks synthetic model linear principal
Technical Report No C Jonathan Oliver Shortened appeared AI Statistics Abstract In paper examine Decision Graphs generalization decision trees We present inference scheme construct decision graphs using Minimum Message Length Principle Empirical tests demonstrate scheme compares favourably decision tree inference schemes This work provides metric comparing relative merit decision tree decision graph formalisms particular domain
For agent living nondeterministic Markov environment NME theory fastest way acquiring information statistical properties The answer design optimal sequences experiments performing action sequences maximize expected information gain This notion implemented combining concepts information theory reinforcement learning Experiments show resulting method reinforcement driven information acquisition explore certain NMEs much faster conventional random exploration
We consider learnability membership queries presence incomplete information In incomplete boundary query model introduced Blum et al assumed membership queries instances near boundary target concept may receive dont know answer We show zeroone threshold functions efficiently learnable model The learning algorithm uses split graphs boundary region radius generalization split hypergraphs give splitfinding algorithm boundary region constant radius greater We use notion indistinguishability concepts appropriate model
Most techniques verification validation directed functional properties programs However properties programs also essential This paper describes model average computing time KADS knowledgebased system based structure An example taken existing knowledgebased system used demonstrate use costmodel designing system
Realistic complex planning situations require mixedinitiative planning framework human automated planners interact mutually construct desired plan Ideally joint cooperation potential achieving better plans either human machine create alone Human planners often take casebased approach planning relying past experience planning retrieving adapting past planning cases Planning analogical reasoning generative casebased planning combined ProdigyAnalogy provides suitable framework study mixedinitiative integration However human user engaged planning loop creates variety new research questions The challenges found creating mixedinitiative planning system fall three categories planning paradigms differ human machine planning visualization plan planning process complex necessary task human users range across spectrum experience respect planning domain underlying planning technology This paper presents approach three problems designing interface incorporate human process planning analogical reasoning ProdigyAnalogy The interface allows user follow generative casebased planning supports visualization plan planning rationale addresses variance experience user allowing user control presentation information This research sponsored part DARPARL Knowledge Based Planning Scheduling Initiative grant number F A short version document appeared Cox M T Veloso M M Supporting combined human machine planning An interface planning analogical reasoning In D B Leake E Plaza Eds CaseBased Reasoning Research Development Second International Conference CaseBased Reasoning pp Berlin SpringerVerlag
One major goals early concept learners find hypotheses perfectly consistent training data It believed goal would indirectly achieve high degree predictive accuracy set test data Later research partially disproved belief However issue consistency yet resolved completely We examine issue consistency new perspective To avoid overfitting training data considerable number current systems sacrificed goal learning hypotheses perfectly consistent training instances setting goal hypothesis simplicity Occams razor Instead using simplicity goal developed novel approach addresses consistency directly In words concept learner explicit goal selecting appropriate degree consistency training data We begin paper exploring concept learning less perfect consistency Next describe system adapt degree consistency response feedback predictive accuracy test data Finally present results initial experiments begin address question tightly hypotheses fit training data different problems
In contribution propose new solution problem blind separation sources one dimensional signals images case waveform sources unknown also number For purpose multilayer neural networks associated adaptive learning algorithms developed The primary source signals nonGaussian distribution ie subGaussian andor superGaussian Computer experiments presented demonstrate validity high performance proposed approach
A new learning algorithm derived performs online stochastic gradient ascent mutual information outputs inputs network In absence priori knowledge signal noise components input propagation information depends calibrating network nonlinearities detailed higherorder moments input density functions By incidentally minimising mutual information outputs well maximising individual entropies network factorises input independent components As example application achieved nearperfect separation ten digitally mixed speech signals Our simulations lead us believe network performs better blind separation HeraultJutten network reflecting fact derived rigorously mutual information objective
We present method maintaining mixtures prunings prediction decision tree extends nodebased prunings Bun WST HS larger class edgebased prunings The method includes efficient online weight allocation algorithm used prediction compression classification Although set edgebased prunings given tree much larger nodebased prunings algorithm similar space time complexity previous mixture algorithms trees Using general online framework Freund Schapire FS prove algorithm maintains correctly mixture weights edgebased prunings bounded loss function We also give similar algorithm logarithmic loss function corresponding weight allocation algorithm Finally describe experiments comparing nodebased edgebased mixture models estimating probability next word English text show advantages edgebased models
Markov chain Monte Carlo MCMC methods including Gibbs sampler MetropolisHastings algorithm commonly used Bayesian statistics sampling complicated highdimensional posterior distributions A continuing source uncertainty long sampler must run order converge approximately target stationary distribution Rosenthal b presents method compute rigorous theoretical upper bounds number iterations required achieve specified degree convergence total variation distance verifying drift minorization conditions We propose use auxiliary simulations estimate numerical values needed Rosenthals theorem Our simulation method makes possible compute quantitative convergence bounds models requisite analytical computations would prohibitively difficult impossible On hand although method appears perform well example problems provide guarantees offered analytical proof Acknowledgements We thank Brad Carlin assistance encouragement
Uncertainty may taken characterize inferences conclusions premises three Under treatments uncertainty inference never characterized uncertainty We explore significance uncertainty premises conclusion argument involves uncertainty We argue uncertainty characterize conclusion inference natural interplay uncertainty premises uncertainty procedure argument We show possible principle incorporate uncertainty premises rendering uncertainty arguments deductively valid But argue reflect human argument computationally costly gain simplicity obtained allowing uncertainty inference sometimes outweigh loss flexibility entails keywords uncertainty inference logic argument decision premises
In field Operation Research Artificial Intelligence several stochastic search algorithms designed based theory global random search Zhigljavsky Basically techniques iteratively sample search space respect probability distribution updated according result previous samples predefined strategy Genetic Algorithms GAs Goldberg Greedy Randomized Adaptive Search Procedures GRASP Feo Resende two particular instances paradigm In paper present SAGE search algorithm based fundamental mechanisms techniques However addresses class problems difficult design transformation operators perform local search intrinsic constraints definition problem For problems procedural approach natural way construct solutions resulting state space represented tree DAG The aim paper describe underlying heuristics used SAGE address problems belonging class The performance SAGE analyzed problem grammar induction successful application problems recent Abbadingo DFA learning competition presented
Summary We analyze hierarchical Bayes model related usual empirical Bayes formulation JamesStein estimators We consider running Gibbs sampler model Using previous results convergence rates Markov chains provide rigorous numerical reasonable bounds running time Gibbs sampler suitable range prior distributions We apply results baseball data Efron Morris For different range prior distributions prove Gibbs sampler fail converge use information prove case associated posterior distribution nonnormalizable Acknowledgements I grateful Jun Liu suggesting project Neal Madras suggesting use Submartingale Convergence Theorem herein I thank Kate Cowles Richard Tweedie helpful conversations thank referees useful comments
Evolutionary Algorithms often presented general purpose search methods Yet also know search method better another possible problems fact often good deal problem specific information involved choice problem representation search operators In paper explore general properties representations relate neighborhood search methods In particular looked expected number local optima neighborhood search operator averaged over all possible representations The number local optima neighborhood search operator standard binary standard binary reflected Gray codes developed explored one measure problem complexity We also relate number local optima another metric designed provide one measure complexity respect simple genetic algorithm Choosing good representation vital component solving search problem However choosing good representation problem difficult choosing good search algorithm problem Wolpert Macreadys No Free Lunch NFL theorem proves search algorithm better possible discrete functions Radcliffe Surry extend notions also cover idea representations equivalent behavior considered average possible functions To understand results first outline simple assumptions behind theorem First assume optimization problem discrete describes combinatorial optimization problems and really optimization problems solved computers since computers finite precision Second ignore fact resample points space The No Free Lunch result stated follows
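To make the local optima measure concrete, the following Python sketch counts the minima of a function of an integer under the single-bit-flip (Hamming distance 1) neighborhood. It runs the count once with plain binary decoding and once with binary reflected Gray decoding. It is an illustration only. The test function, the 4-bit string length, and the convention that a point is a local minimum when no neighbor is strictly better are all assumptions, not the paper's setup.

```python
def gray_encode(n):
    """Standard binary reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Inverse of gray_encode: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def count_local_optima(f, bits, decode):
    """Count minima of f under the single-bit-flip neighborhood.

    Each bit string s (an int in [0, 2**bits)) represents the point decode(s).
    Assumed convention: s is a local minimum if no single bit flip gives a
    strictly lower value of f.
    """
    count = 0
    for s in range(2 ** bits):
        v = f(decode(s))
        if all(f(decode(s ^ (1 << b))) >= v for b in range(bits)):
            count += 1
    return count
```

Take f(x) = (x - 5)^2 on 4 bits. Under Gray decoding only the global minimum survives, because consecutive integers are always one bit flip apart. Plain binary creates extra minima. One example is x = 8, whose one-bit neighbors 0, 9, 10 and 12 are all worse.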
We investigate effectiveness connectionist networks predicting future continuation temporal sequences The problem overfitting particularly serious short records noisy data addressed method weightelimination term penalizing network complexity added usual cost function backpropagation The ultimate goal prediction accuracy We analyze two time series On benchmark sunspot series networks outperform traditional statistical approaches We show network performance deteriorate input units needed Weightelimination also manages extract part dynamics notoriously noisy currency exchange rates makes network solution interpretable
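The text gives no formula for weight elimination. Its usual form, assumed here, adds λ Σ_i (w_i/w_0)² / (1 + (w_i/w_0)²) to the squared-error cost, where the scale w_0 sets what counts as a large weight. Below is a Python sketch of that term and its gradient. The function name and parameter values are hypothetical.

```python
def weight_elimination(weights, lam, w0):
    """Weight-elimination complexity term and its gradient (assumed standard form).

    penalty = lam * sum_i u_i / (1 + u_i), with u_i = (w_i / w0) ** 2
    d penalty / d w_i = lam * (2 * w_i / w0**2) / (1 + u_i) ** 2
    """
    penalty = 0.0
    grad = []
    for w in weights:
        u = (w / w0) ** 2
        penalty += u / (1.0 + u)
        grad.append(lam * (2.0 * w / w0 ** 2) / (1.0 + u) ** 2)
    return lam * penalty, grad
```

Weights much smaller than w_0 are penalized almost quadratically and pushed toward zero. Large weights cost close to λ each, whatever their size. The term therefore behaves roughly like a count of significant weights, which fits its role as a penalty on network complexity.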
We present detailed analysis evolution genetic programming GP populations using problem finding program returns maximum possible value given terminal function set depth limit program tree known MAX problem We confirm basic message Gathercole Ross crossover together program size restrictions responsible premature convergence suboptimal solution We show happen even population retains high level variety show many cases evolution suboptimal solution solution possible sufficient time allowed In cases theoretical models presented compared actual runs
The main operations Inductive Logic Programming ILP generalization specialization make sense generality order In ILP three important generality orders subsumption implication implication relative background knowledge The two languages used often languages clauses languages Horn clauses This gives total six different ordered languages In paper give systematic treatment existence nonexistence least generalizations greatest specializations finite sets clauses six ordered sets We survey results already obtained others also contribute answers Our main new results firstly existence computable least generalization implication every finite set clauses containing least one nontautologous functionfree clause among necessarily functionfree clauses Secondly show least generalization need exist relative implication even set generalized background knowledge functionfree Thirdly give complete discussion existence nonexistence greatest specializations six ordered languages
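The text treats least generalizations of clauses under subsumption and implication. For the subsumption case, the classical building block is Plotkin's anti-unification of terms, sketched below in Python. The sketch covers only ground terms and atoms, not clauses or implication. The tuple-based term representation and the variable names X0, X1, ... are assumptions for illustration.

```python
def lgg(s, t, table=None):
    """Plotkin-style least general generalization (anti-unification) of terms.

    Assumed representation: constants are str, compound terms are tuples
    (functor, arg1, ..., argn). Inputs are assumed ground; fresh variables
    are named "X0", "X1", ... in order of first use.
    """
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # Same functor and arity: generalize argument by argument.
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    # Mismatch: the same pair of subterms always maps to the same variable,
    # which is what makes the result least general rather than just general.
    if (s, t) not in table:
        table[(s, t)] = "X%d" % len(table)
    return table[(s, t)]
```

Because the variable table is shared, generalizing f(a, a) with f(b, b) gives f(X0, X0) rather than f(X0, X1), since the same pair of subterms reuses the same variable.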
This article discusses developments Bayesian time series modelling analysis relevant studies time series physical engineering sciences With illustrations references discuss Bayesian inference computation various statespace models examples analysing quasiperiodic series isolation modelling various components error time series decompositions time series significant latent subseries nonlinear time series models based mixtures autoregressions problems errors uncertainties timing observations development nonlinear models based stochastic deformations time scales
Some areas recent development current interest time series noted discussion Bayesian modelling efforts motivated substantial practical problems The areas include nonlinear autoregressive time series modelling measurement error structures statespace modelling time series issues timing uncertainties time deformations Some discussion needs opportunities work nonparametric semiparametric models robustness issues given context
We present method unsupervised segmentation data streams originating different unknown sources alternate time We use architecture consisting competing neural networks Memory included order resolve ambiguities inputoutput relations In order obtain maximal specialization competition adiabatically increased training Our method achieves almost perfect identification segmentation case switching chaotic dynamics input manifolds overlap inputoutput relations ambiguous Only small dataset needed training procedure Applications time series complex systems demonstrate potential relevance approach time series analysis shortterm prediction
We outline differential theory learning statistical pattern classification When applied neural networks theory leads efficient differential learning strategy based classification figureofmerit CFM objective functions Differential learning guarantees highest probability generalization classifier limited functional complexity trained limited number examples The theory significant two reasons We demonstrate importance differential learnings efficiency simple pattern recognition task lends closedform analysis We conclude practical application theory differentially trained perceptron diagnoses crippling joint disorder magnetic resonance images better probabilistically trained counterpart complex probabilistically trained multilayer perceptrons The recent renaissance connectionism led considerable amount research regarding generalization neural network pattern classifiers trained supervised fashion Most research done computational learning theorists statisticians intent matching functional complexity classifier size training sample order avoid wellknown curse dimensionality see example work Barron Baum Haussler Vapnik much summarized Yet relatively little attention paid effect objective function used drive supervised learning procedure discrimination generalization Copyright J B Hampshire II B V K V Kumar rights reserved Copyright automatically extended IEEE submission accepted presentationpublication This research funded Air Force Office Scientific Research grant AFOSR supported supercomputing grant National Science Foundations Pittsburgh Supercomputing Center grant CCRP The views conclusions contained submission authors interpreted representing official policies either expressed implied US Air Force National Science Foundation US Government
Anaplastic thyroid carcinoma rare aggressive tumor Many factors might influence survival patients suggested The aim study determine factors known time admission hospital might predict survival patients anaplastic thyroid carcinoma Our aim also assess relative importance factors identify potentially useful decision regression trees generated machine learning algorithms Our study included patients females males mean age years anaplastic thyroid carcinoma treated Institute Oncology Ljubljana Patients classified categories according attributes sex age history physical findings extent disease admission tumor morphology In paper compare machine learning approach previous statistical evaluations problem univariate multivariate analysis show provide thorough analysis improve understanding data
We study behavior family learning algorithms based Suttons method temporal differences In online learning framework learning takes place sequence trials goal learning algorithm estimate discounted sum reinforcements received future In setting able prove general upper bounds performance slightly modified version Suttons socalled TD algorithm These bounds stated terms performance best linear predictor given training sequence proved without making statistical assumptions kind process producing learners observed training sequence We also prove lower bounds performance algorithm learning problem give similar analysis closely related problem learning predict model learner must produce predictions whole batch observations receiving reinforcement
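For reference, here is a Python sketch of the standard linear TD(λ) update that the modified algorithm in the text builds on. It is plain TD(λ) with accumulating eligibility traces, not the modified version analyzed in the paper. The step size, trace parameter, episode format and function name are all assumptions.

```python
def td_lambda_episode(w, features, rewards, alpha=0.1, gamma=1.0, lam=0.5):
    """One episode of linear TD(lambda) with accumulating eligibility traces.

    features[t] : feature vector of the state at time t (list of floats)
    rewards[t]  : reinforcement received on leaving state t
    The episode ends after the last state, whose successor value is taken as 0.
    Returns the updated weight vector; predictions are dot(w, x).
    """
    e = [0.0] * len(w)
    for t, x in enumerate(features):
        v = sum(wi * xi for wi, xi in zip(w, x))
        if t + 1 < len(features):
            v_next = sum(wi * xi for wi, xi in zip(w, features[t + 1]))
        else:
            v_next = 0.0
        delta = rewards[t] + gamma * v_next - v          # TD error
        e = [gamma * lam * ei + xi for ei, xi in zip(e, x)]  # decay, then add x
        w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w
```

Consider a three-state chain with one-hot features, γ = 1 and a single reward of 1 on the final transition. Repeated episodes should drive the weights toward the true returns [1, 1, 1].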
The common use static binary placevalue codes realvalued parameters phenotype Hollands genetic algorithm GA forces either sacrifice representational precision efficiency search vice versa Dynamic Parameter Encoding DPE mechanism avoids dilemma using convergence statistics derived GA population adaptively control mapping fixedlength binary genes real values DPE shown empirically effective amenable analysis explore problem premature convergence GAs two convergence models
Traditionally genetic algorithms relied upon point crossover operators Many recent empirical studies however shown benefits higher numbers crossover points Some intriguing recent work focused uniform crossover involves average L crossover points strings length L Despite theoretical analysis however appears difficult predict particular crossover form optimal given problem This paper describes adaptive genetic algorithm decides runs form optimal
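The two crossover families contrasted above can each be written in a few lines of Python. Both functions are illustrative sketches; their names and the swap probability p = 0.5 are assumptions. The adaptive mechanism the paper uses to choose between forms during a run is not shown.

```python
import random


def two_point_crossover(a, b, rng):
    """Swap the segment between two distinct random cut points (needs len >= 3)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def uniform_crossover(a, b, rng, p=0.5):
    """Swap each position independently with probability p."""
    c1, c2 = list(a), list(b)
    for k in range(len(a)):
        if rng.random() < p:
            c1[k], c2[k] = c2[k], c1[k]
    return c1, c2
```

With p = 0.5, uniform crossover on strings of length L switches parents between adjacent positions about L/2 times on average. Two-point crossover never switches more than twice.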
This paper reviews application Gibbs sampling cointegrated VAR system Aggregate imports import prices Belgium modelled using two cointegrating relations Gibbs sampling techniques used estimate Bayesian perspective cointegrating relations weights VAR system Extensive use spectral analysis made get insight convergence issues
We provide generic Monte Carlo method find alternative maximum expected utility decision analysis We define artificial distribution product space alternatives states show optimal alternative mode implied marginal distribution alternatives After drawing sample artificial distribution may use exploratory data analysis tools approximately identify optimal alternative We illustrate method important types influence diagrams Decision Analysis Influence Diagrams Markov chain Monte Carlo Simulation
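A toy version of the idea in Python, assuming finitely many alternatives and states. Draw (d, θ) pairs with probability proportional to u(d, θ) p(θ), then take the most frequent d. The text uses Markov chain Monte Carlo. Because this toy space can be enumerated, the sketch swaps in direct sampling with `random.choices`. The utilities and prior in any usage are invented for illustration. Utilities must be positive so they can serve as unnormalized weights. The function name is hypothetical.

```python
import random
from collections import Counter


def mode_of_artificial_marginal(alternatives, states, prior, utility,
                                n_samples=20000, seed=0):
    """Sample (d, theta) with probability proportional to u(d, theta) * p(theta)
    and return the most frequent alternative d.

    Assumes utility(d, theta) > 0. Direct sampling replaces MCMC here only
    because the toy (alternative, state) space can be enumerated.
    """
    rng = random.Random(seed)
    pairs = [(d, th) for d in alternatives for th in states]
    weights = [utility(d, th) * prior[th] for d, th in pairs]
    draws = rng.choices(pairs, weights=weights, k=n_samples)
    counts = Counter(d for d, _ in draws)
    return counts.most_common(1)[0][0]
```

Summing the artificial density over θ gives Σ_θ u(d, θ) p(θ), which is the expected utility of d. The mode of the sampled alternatives therefore estimates the optimal decision.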
This paper describes new samplingbased heuristic tree search named SAGE presents analysis performance problem grammar induction This last work inspired Abbadingo DFA learning competition took place March November SAGE ended one two winners competition The second winning algorithm first proposed Rodney Price implements new evidencedriven heuristic state merging Our version heuristic also described paper compared SAGE
Conversational casebased reasoning CCBR successfully used assist case retrieval tasks However behavioral limitations CCBR motivate search integrations reasoning approaches This paper briefly describes groups ongoing efforts towards enhancing inferencing behaviors conversational casebased reasoning development tool named NaCoDAE In particular focus integrating NaCoDAE machine learning modelbased reasoning generative planning modules This paper defines CCBR briefly summarizes integrations explains enhance overall system Our research focuses enhancing performance conversational casebased reasoning CCBR systems Aha Breslow CCBR form casebased reasoning users initiate problem solving conversations entering initial problem description natural language text This text assumed partial rather complete problem description The CCBR system assists eliciting refinements description suggesting solutions Its primary purpose provide focus attention user quickly provide solutions problem Figure summarizes CCBR problem solving cycle Cases CCBR library three components
The identification design implementation strategies cooperation central research issue field Distributed Artificial Intelligence DAI We propose novel approach construction cooperation strategies group problem solvers based Genetic Programming GP paradigm GPs class adaptive algorithms used evolve solution structures optimize given evaluation criterion Our approach based designing representation cooperation strategies manipulated GPs We present results experiments predatorprey domain extensively studied easytodescribe difficulttosolve cooperation problem domain The key aspect approach minimal reliance domain knowledge human intervention construction good cooperation strategies Promising comparison results prior systems lend credence viability approach
Recently new approach involves form simulated evolution proposed building autonomous robots However still clear approach may adequate face real life problems In paper show control systems perform nontrivial sequence behaviors obtained methodology carefully designing conditions evolutionary process operates In experiment described paper mobile robot trained locate recognize grasp target object The controller robot evolved simulation downloaded tested real robot
It open chromosomal dimension performs best Although higherdimensional encodings whether real imaginary preserve geographical gene linkages suspect high dimension would perform desirably We studying question dimension encoding best given instance It likely optimal dimension somehow dependent chromosome size input graph topology interactions flexibility crossover yet unknown The interaction considerations number cuts used crossover also open issue In relocating genes onto multidimensional chromosome simplest way via sequential assignment rowmajor order Section showed performance improves DFSrowmajor reembedding used two threedimensional encodings We suspect phenomenon consistent higherdimensional cases hope perform detailed investigations future Although DFS reordering proved helpful linear encodings multidimensional encodings believe DFSrowmajor reembedding good approach multidimensional cases since rowmajor embedding simplistic We considering alternative dimensional dimensional reembeddings hopefully provide improvement
Increasing attention paid reinforcement learning algorithms recent years partly due successes theoretical analysis behavior Markov environments If Markov assumption removed however neither generally algorithms analyses continue usable We propose analyze new learning algorithm solve certain class nonMarkov decision problems Our algorithm applies problems environment Markov learner restricted access state information The algorithm involves MonteCarlo policy evaluation combined policy improvement method similar Markov decision problems guaranteed converge local maximum The algorithm operates space stochastic policies space yield policy performs considerably better deterministic policy Although space stochastic policies continuous even discrete action space our algorithm computationally tractable
Markov chain Monte Carlo MCMC algorithms revolutionized Bayesian practice In simplest form ie parameters updated one time however often slow converge applied highdimensional statistical models A remedy problem block parameters groups updated simultaneously using either Gibbs MetropolisHastings step In paper construct several partially fully blocked MCMC algorithms minimizing autocorrelation MCMC samples arising important classes longitudinal data models We exploit identity used Chib context Bayes factor computation show parameters general linear mixed model may updated single block improving convergence producing essentially independent draws posterior parameters interest We also investigate value blocking nonGaussian mixed models well class binary response data longitudinal models We illustrate approaches detail three realdata examples
This paper presents comparison two feature selection methods Importance Score IS based greedylike search genetic algorithmbased GA method order better understand strengths limitations area application The results experiments show strong relation nature data behavior systems The Importance Score method efficient dealing little noise small number interacting features genetic algorithms provide robust solution expense increased computational effort Keywords feature selection machine learning genetic algorithms search
This paper describes training algorithm Simple Synchrony Networks SSNs reports experiments language learning using recursive grammar The SSN new connectionist architecture combining technique learning patterns across time Simple Recurrent Networks SRNs Temporal Synchrony Variable Binding TSVB The use TSVB means SSN learn entities training set generalise information entities test set In experiments network trained sentences one embedded clause words restricted certain classes constituent During testing network generalises information learned sentences three embedded clauses words appearing constituent These results demonstrate SSNs learn generalisations across syntactic constituents
We use directed search techniques space computer programs learn recursive sequences positive integers Specifically integer sequences squares x cubes x factorial x Fibonacci numbers studied Given small finite prefix sequence show three directed searches machinelanguage genetic programming crossover exhaustive iterative hill climbing hybrid crossover hill climbing can automatically discover programs exactly reproduce finite target prefix moreover correctly produce remaining sequence underlying machines precision Our machinelanguage representation generic it contains instructions arithmetic register manipulation comparison control flow We also introduce output instruction allows variablelength sequences result values Importantly representation contain recursive operators recursion needed automatically synthesized primitive instructions For fixed set search parameters eg instruction set program size fitness criteria compare efficiencies three directed search techniques four sequence problems For parameter set evolutionarybased search always outperforms exhaustive hill climbing well undirected random search Since prefix target sequence variable experiments posit approach sequence induction potentially quite general
This paper studies problem establishes desired implication analytic systems several cases compact state space ii Poisson stability condition iii generic sense In addition paper studies accessibility properties control sets recently introduced context dynamical systems studies Finally various examples counterexamples provided relating various Lie algebras introduced past work
This paper demonstrates use graphs mathematical tool expressing independencies formal language communicating processing causal information decision analysis We show complex information external interventions organized represented graphically conversely graphical representation used facilitate quantitative predictions effects interventions We first review theory Bayesian networks show directed acyclic graphs DAGs offer economical scheme representing conditional independence assumptions deducing displaying logical consequences assumptions We introduce manipulative account causation show DAG defines simple transformation tells us probability distribution change result external interventions system Using transformation possible quantify nonexperimental data effects external interventions specify conditions randomized experiments necessary As example show effect smoking lung cancer quantified nonexperimental data using minimal set qualitative assumptions Finally paper offers graphical interpretation Rubins model causal effects demonstrates equivalence manipulative account causation We exemplify tradeoffs two approaches deriving nonparametric bounds treatment effects conditions imperfect compliance Portions paper presented th Session International Statistical Institute Florence Italy August September
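The "simple transformation" that a DAG defines for an intervention can be sketched on a hypothetical three-variable model Z -> X -> Y with Z -> Y. All three variables are binary and the conditional probabilities are made up. Intervening on X deletes the factor P(x | z) from the factorization and clamps X. Here that reduces to the familiar adjustment formula P(y | do(x)) = sum_z P(y | x, z) P(z). The sketch compares this with ordinary conditioning. It is not the paper's smoking example.

```python
def p_z(z):
    """P(Z = z), hypothetical."""
    return 0.5

def p_x_given_z(x, z):
    """P(X = x | Z = z), hypothetical."""
    p1 = 0.8 if z else 0.2
    return p1 if x else 1 - p1

def p_y_given_xz(y, x, z):
    """P(Y = y | X = x, Z = z), hypothetical."""
    p1 = 0.1 + 0.3 * x + 0.5 * z
    return p1 if y else 1 - p1

def joint(z, x, y):
    """Pre-intervention factorization P(z) P(x | z) P(y | x, z)."""
    return p_z(z) * p_x_given_z(x, z) * p_y_given_xz(y, x, z)

def p_y_given_x(y, x):
    """Observational conditional P(Y = y | X = x)."""
    num = sum(joint(z, x, y) for z in (0, 1))
    den = sum(joint(z, x, yy) for z in (0, 1) for yy in (0, 1))
    return num / den

def p_y_do_x(y, x):
    """Truncated factorization: drop P(x | z), clamp X = x, sum out Z."""
    return sum(p_z(z) * p_y_given_xz(y, x, z) for z in (0, 1))

if __name__ == "__main__":
    print("P(Y=1 | X=1)     =", p_y_given_x(1, 1))
    print("P(Y=1 | do(X=1)) =", p_y_do_x(1, 1))
```

With these numbers P(Y=1 | X=1) is 0.80 while P(Y=1 | do(X=1)) is 0.65. Conditioning also picks up the confounding path through Z.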
An interesting classical result due Jackson allows polynomialtime learning function class DNF using membership queries Since practical learning situations access membership oracle unrealistic paper explores possibility quantum computation might allow learning algorithm DNF relies example queries A natural extension Fourierbased learning quantum domain presented The algorithm requires example oracle runs O n time result appears classically impossible The algorithm unique among quantum algorithms assume priori knowledge function operate superposition includes possible basis states
A number exact algorithms developed perform probabilistic inference Bayesian belief networks recent years These algorithms use graphtheoretic techniques analyze exploit network topology In paper examine problem efficient probabilistic inference belief network combinatorial optimization problem finding optimal factoring given algebraic expression set probability distributions We define combinatorial optimization problem optimal factoring problem discuss application problem belief networks We show optimal factoring provides insight key elements efficient probabilistic inference present simple easily implemented algorithms excellent performance We also show use algebraic perspective permits significant extension belief net representation
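The choice of factoring matters even on a chain A1 -> A2 -> ... -> An of binary variables. The sketch below is a hypothetical example with random conditional probabilities, not the paper's optimal-factoring algorithm. It computes P(An = 1) in two ways. The first sums the full joint over all 2^n assignments. The second pushes each sum inside the product so variables are eliminated one at a time, which takes work linear in n.

```python
import random
from itertools import product

def make_chain(n, seed=0):
    """Random CPTs for a binary chain A1 -> ... -> An (hypothetical example)."""
    rng = random.Random(seed)
    prior = rng.random()                                         # P(A1 = 1)
    # Each pair is (P(A_k+1 = 1 | A_k = 0), P(A_k+1 = 1 | A_k = 1))
    cpts = [(rng.random(), rng.random()) for _ in range(n - 1)]
    return prior, cpts

def bern(p1, v):
    """Probability of value v for a binary variable with P(1) = p1."""
    return p1 if v else 1 - p1

def brute_force(prior, cpts):
    """Sum the full joint over every assignment with An = 1 (O(n 2^n))."""
    n = len(cpts) + 1
    total = 0.0
    for a in product((0, 1), repeat=n):
        if a[-1] != 1:
            continue
        p = bern(prior, a[0])
        for k, (p0, p1) in enumerate(cpts):
            p *= bern(p1 if a[k] else p0, a[k + 1])
        total += p
    return total

def factored(prior, cpts):
    """Eliminate A1, A2, ... in order by pushing each sum inward (O(n))."""
    msg = (1 - prior, prior)                 # current marginal of A_k
    for p0, p1 in cpts:
        msg = (msg[0] * (1 - p0) + msg[1] * (1 - p1),
               msg[0] * p0 + msg[1] * p1)
    return msg[1]

if __name__ == "__main__":
    prior, cpts = make_chain(12, seed=3)
    print(brute_force(prior, cpts), factored(prior, cpts))
```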
A global classification currently known protein sequences performed Every protein sequence partitioned segments amino acids dynamicprogramming distance calculated pair segments This space segments first embedded Euclidean space small metric distortion A novel selforganized crossvalidated clustering algorithm applied embedded space Euclidean distances The resulting hierarchical tree clusters offers new representation protein sequences families compares favorably updated classifications based functional structural protein data Motifs domains Zinc Finger EF hand Homeobox EGFlike others automatically correctly identified A novel representation protein families introduced functional biological kinship protein families deduced demonstrated transporters family
Explanation important issue building computerbased interactive design environments human designer knowledge system may cooperatively solve design problem We consider two related problems explaining systems reasoning design generated system In particular analyze content explanations design reasoning design solutions domain physical devices We describe two complementary languages taskmethodknowledge models explaining design reasoning structurebehaviorfunction models explaining device designs INTERACTIVE KRITIK computer program uses representations visually illustrate systems reasoning result design episode The explanation design reasoning INTERACTIVE KRITIK context evolving design solution similarly explanation design solution context design reasoning
In paper describe implementation backpropagation algorithm means object oriented library ARCH The use library relieve user details specific parallel programming paradigm time allows greater portability generated code To provide comparison existing solutions survey relevant implementations algorithm proposed far literature dedicated general purpose computers Extensive experimental results show use library hurt performance simulator contrary implementation Connection Machine CM comparable fastest category
Typical home comfort systems utilize rudimentary forms energy management conservation The sophisticated technology common use today automatic setback thermostat Tremendous potential remains improving efficiency electric gas usage However home residents ignorant physics energy utilization design environmental control strategies neither energy management experts ignorant behavior patterns inhabitants Adaptive control seems alternative We begun building adaptive control system infer appropriate rules operation home comfort systems based lifestyle inhabitants energy conservation goals Recent research demonstrated potential neural networks intelligent control We constructing prototype control system actual residence using neural network reinforcement learning prediction techniques The residence equipped sensors provide information environmental conditions eg temperatures ambient lighting level sound motion room actuators control gas furnace electric space heaters gas hot water heater lighting motorized blinds ceiling fans dampers heating ducts This paper presents overview project stands
Today great interest discovering methods allow faster design development realtime control software Control theory helps linear controllers developed support generation In paper discussed Machine Learning applied Function Locally Receptive Field Function Approximators Three integrated learning algorithms two original described tried two experimental test cases The first test case provided industrial robot KUKA IR engaged pegintohole task second classical prediction task MackeyGlass chaotic series From experimental comparison appears Fuzzy Controllers RBFNs synthesised examples excellent approximators even accurate MLPs nonlinear controllers many cases compliant motion control
The term Soft Computing SC represents combination emerging problemsolving technologies Fuzzy Logic FL Probabilistic Reasoning PR Neural Networks NNs Genetic Algorithms GAs Each technologies provide us complementary reasoning searching methods solve complex realworld problems After brief description technologies analyze useful combinations use FL control GAs NNs parameters application GAs evolve NNs topologies weights tune FL controllers implementation FL controllers NNs tuned backpropagationtype algorithms
Two issues intelligent navigation robot addressed work First robots ability learn representation local environment use representation identify local environment This done first extracting features sensors informative distances obstacles various directions Using features reduced ring representation RRR local environment derived As robot navigates learns RRR signatures new environment types encounters For purpose identification ring matching criteria proposed robot tries match RRR sensory input one RRRs library The second issue addressed learning hill climbing control laws local environments Unlike conventional neurocontrollers reinforcement learning framework robot first learns model environment learns control law terms neural network proposed The reinforcement function generated sensory inputs robot control action taken Three key results shown work The robot able build library RRR signatures perfectly even significant sensor noise eight different local environments It able identify local environment accuracy library build robot able learn adequate hill climbing control laws take distinctive state local environment five different environment types
As artificial neural networks ANNs gain popularity variety application domains critical models run fast generate results real time Although number implementations neural networks available sequential machines implementations require inordinate amount time train run ANNs especially ANN models large One approach speeding implementation ANNs implement parallel machines This paper surveys area parallel environments implementations ANNs prescribes desired characteristics look implementations
We derive distributionfree uniform test error bounds improve VCtype bounds validation We show use knowledge test inputs improve bounds The bounds sharp require intense computation We introduce method trade sharpness speed computation Also compute bounds several test cases
Connectionist research firmly established within scientific community especially within multidisciplinary field cognitive science This diversity however created environment makes difficult connectionist researchers remain aware recent advances field let alone understand field developed This paper attempts address problem providing brief guide connectionist research The paper begins defining basic tenets connectionism Next development connectionist research traced commencing connectionisms philosophical predecessors moving early psychological neuropsychological influences followed mathematical computing contributions connectionist research Current research reviewed focusing specifically different types network architectures learning rules use The paper concludes suggesting neural network research at least cognitive science should move towards models incorporate relevant functional principles inherent neurobiological systems
Scheiblechner proposes probabilistic axiomatization measurement called ISOP isotonic ordinal probabilistic models replaces Raschs specific objectivity assumptions two interesting ordinal assumptions Special cases Scheiblechners model include standard unidimensional factor analysis models loadings held constant Rasch model binary item responses Closely related doublymonotone item response models Mokken see also Mokken Lewis Sijtsma Molenaar Sijtsma Junker Sijtsma Hemker More generally strictly unidimensional latent variable models considered detail Holland Rosenbaum Ellis van den Wollenberg Junker The purpose note provide connections current research foundations nonparametric latent variable item response modeling missing Scheiblechners paper point important related work Hemker et al ab Ellis Junker Junker Ellis We also discuss counterexamples three major theorems paper By carrying three tasks hope provide researchers interested foundations measurement item response modeling opportunity give ISOP approach careful attention deserves
The sensorimotor integration system viewed observer attempting estimate state state environment integrating multiple sources information We describe computational framework capturing notion specific models integration adaptation result Psychophysical results two sensorimotor systems subserving integration adaptation visuoauditory maps estimation state hand arm movements presented analyzed within framework These results suggest Spatial information visual auditory systems integrated reduce variance localization The effects remapping relation visual auditory space predicted simple learning rule The temporal propagation errors estimating hands state captured linear dynamic observer providing evidence existence internal model simulates dynamic behavior arm
Several researchers demonstrated complex action sequences learned neuroevolution ie evolving neural networks genetic algorithms However complex general behavior evading predators avoiding obstacles tied specific environments turns difficult evolve Often system discovers mechanical strategies moving back forth help agent cope effective appear believable would generalize new environments The problem general strategy difficult evolution system discover directly This paper proposes approach complex general behavior learned incrementally starting simpler behavior gradually making task challenging general The task transitions implemented successive stages deltacoding ie evolving modifications allows even converged populations adapt new task The method tested stochastic dynamic task prey capture compared direct evolution The incremental approach evolves effective general behavior also scale harder tasks
Go difficult game computers master best go programs still weaker average human player Since traditional game playing techniques proven inadequate new approaches computer go need studied This paper presents new approach learning play go The SANE Symbiotic Adaptive NeuroEvolution method used evolve networks capable playing go small boards preprogrammed go knowledge On fi go board networks able defeat simple computer opponent evolved within hundred generations Most significantly networks exhibited several aspects general go playing suggests approach could scale well
Recent studies floating building block representation genetic algorithm GA suggest many advantages using floating representation This paper investigates behavior GA floating representation problems response three different types pressures reduction amount genetic material available GA problem solving process functions negativevalued building blocks randomizing noncoding segments Results indicate GAs performance floating representation problems robust Significant reductions genetic material genome length may made relatively small decrease performance The GA effectively solve problems negative building blocks Randomizing noncoding segments appears improve rather harm GA performance
This work initiated Junker visiting University Utrecht support Carnegie Mellon University Faculty Development Grant generous hospitality Social Sciences Faculty University Utrecht Additional support provided Office Naval Research Cognitive Sciences Division Grant NK National Institute Mental Health Training Grant MH
We analyze simple hillclimbing algorithm RMHC previously shown outperform genetic algorithm GA simple Royal Road function We analyze idealized genetic algorithm IGA significantly faster RMHC gives lower bound GA speed We identify features IGA give rise speedup discuss features incorporated real GA
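RMHC itself is short enough to sketch. The version below follows the usual description: flip one random bit and keep the mutant if its fitness is not worse, so neutral moves are accepted. It runs on the Royal Road R1 function, where each block of eight bits contributes eight points once the block is all ones. The idealized GA is not shown. Parameter names and defaults are our own choices.

```python
import random

def royal_road(bits, block=8):
    """Royal Road R1: each all-ones block of length `block` contributes `block`."""
    return sum(block for i in range(0, len(bits), block) if all(bits[i:i + block]))

def rmhc(n_bits=64, block=8, max_evals=100_000, seed=0):
    """Random-mutation hill climbing; returns (best fitness, evaluations used).

    Assumes n_bits is a multiple of block, so the optimum equals n_bits.
    """
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(n_bits)]
    best_fit = royal_road(best, block)
    for evals in range(1, max_evals + 1):
        cand = best.copy()
        cand[rng.randrange(n_bits)] ^= 1     # flip one random bit
        fit = royal_road(cand, block)
        if fit >= best_fit:                  # accept equal fitness (neutral drift)
            best, best_fit = cand, fit
        if best_fit == n_bits:
            return best_fit, evals
    return best_fit, max_evals

if __name__ == "__main__":
    print(rmhc())
```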
We consider recently proposed parallel variable distribution PVD algorithm Ferris Mangasarian solving optimization problems variables distributed among p processors Each processor primary responsibility updating block variables allowing remaining secondary variables change restricted fashion along easily computable directions We propose useful generalizations consist general unconstrained case replacing exact global solution subproblems certain natural sufficient descent condition convex case inexact subproblem solution PVD algorithm These modifications key features algorithm analyzed The proposed modified algorithms practical make easier achieve good load balancing among parallel processors We present general framework analysis class algorithms derive new improved linear convergence results problems weak sharp minima order strongly convex problems We also show nonmonotone synchronization schemes admissible improves flexibility PVD approach
A paradigm statistical mechanics financial markets SMFM fit multivariate financial markets using Adaptive Simulated Annealing ASA global optimization algorithm perform maximum likelihood fits Lagrangians defined path integrals multivariate conditional probabilities Canonical momenta thereby derived used technical indicators recursive ASA optimization process tune trading rules These trading rules used outofsample data demonstrate profit SMFM model illustrate markets likely efficient This methodology extended systems eg electroencephalography This approach complex systems emphasizes utility blending intuitive powerful mathematicalphysics formalism generate indicators used AItype rulebased models management
The computational power formal models networks spiking neurons compared neural network models based McCulloch Pitts neurons ie threshold gates respectively sigmoidal gates In particular shown networks spiking neurons computationally powerful neural network models A concrete biologically relevant function exhibited computed single spiking neuron biologically reasonable values parameters requires hundreds hidden units sigmoidal neural net This article assume prior knowledge spiking neurons contains extensive list references currently available literature computations networks spiking neurons relevant results neurobiology
We compare Genetic Algorithms GA functional search method Very Fast Simulated Reannealing VFSR efficient search strategy also statistically guaranteed find function optima GA previously demonstrated competitive standard Boltzmanntype simulated annealing techniques Presenting suite six standard test functions GA VFSR codes previous studies without additional fine tuning strongly suggests VFSR expected orders magnitude efficient GA
In recent years machine learning research started addressing problem known theory refinement The goal theory refinement learner modify incomplete incorrect rule base representing domain theory make consistent set input training examples This paper presents major revision Either propositional theory refinement system Two issues discussed First show run time efficiency greatly improved changing exhaustive scheme computing repairs iterative greedy method Second show extend Either refine MofN rules The resulting algorithm Neither New Either order magnitude faster produces significantly accurate results theories fit MofN format To demonstrate advantages Neither present experimental results two realworld domains
We consider problem belief aggregation given group individual agents probabilistic beliefs set uncertain events formulate sensible consensus aggregate probability distribution events Researchers proposed many aggregation methods although question best general consensus consensus We develop marketbased approach problem agents bet uncertain events buying selling securities contingent outcomes Each agent acts market maximize expected utility given securities prices limited activity risk aversion The equilibrium prices goods market represent aggregate beliefs For agents constant risk aversion demonstrate aggregate probability exhibits several desirable properties related independently motivated techniques We argue marketbased approach provides plausible mechanism belief aggregation multiagent systems directly addresses selfmotivated agent incentives participation truthfulness provide decisiontheoretic foundation expert weights often employed centralized pooling techniques
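For a single binary event the constant-risk-aversion case can be worked out directly. The derivation here is ours and covers only this special case, not the paper's general setting. An agent has belief p and utility u(w) = -exp(-a w), and buys q units of a security paying 1 if the event occurs, at price pi. The first-order condition gives q = (logit p - logit pi) / a. Setting total demand to zero then yields logit pi = sum_i (1/a_i) logit p_i / sum_i (1/a_i). This is a logarithmic opinion pool with weights proportional to risk tolerance. The sketch clears the market numerically and compares the result with that closed form.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def demand(belief, risk_aversion, price):
    """Units bought (negative = sold) by an agent with u(w) = -exp(-a * w)."""
    return (logit(belief) - logit(price)) / risk_aversion

def clearing_price(beliefs, risk_aversions, tol=1e-12):
    """Bisection on price until aggregate demand is zero (zero net supply)."""
    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        excess = sum(demand(p, a, mid) for p, a in zip(beliefs, risk_aversions))
        if excess > 0:      # net buying: the price must rise
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def log_pool(beliefs, risk_aversions):
    """Closed form: risk-tolerance-weighted mean of the beliefs' logits."""
    weights = [1 / a for a in risk_aversions]
    z = sum(w * logit(p) for p, w in zip(beliefs, weights)) / sum(weights)
    return 1 / (1 + math.exp(-z))

if __name__ == "__main__":
    beliefs, aversions = [0.2, 0.6, 0.9], [1.0, 2.0, 0.5]
    print(clearing_price(beliefs, aversions), log_pool(beliefs, aversions))
```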
Predictability minimization PM Schmidhuber exhibits various intuitive theoretical advantages many methods unsupervised redundancy reduction So far however toy applications PM In paper apply semilinear PM static real world images find without teacher without significant preprocessing system automatically learns generate distributed representations based wellknown feature detectors orientation sensitive edge detectors offcenteronsurroundlike structures thus extracting simple features related considered useful image preprocessing compression
A learners modifiable components called policy An algorithm modifies policy learning algorithm If learning algorithm modifiable components represented part policy speak selfmodifying policy SMP SMPs modify way modify etc They interest situations initial learning algorithm improved experience call learning learn How force stochastic SMP trigger better better selfmodifications The successstory algorithm SSA addresses question lifelong reinforcement learning context During learners lifetime SSA occasionally called times computed according SMP SSA uses backtracking undo SMPgenerated SMPmodifications empirically observed trigger lifelong reward accelerations measured current SSA call evaluates longterm effects SMPmodifications setting stage later SMPmodifications SMPmodifications survive SSA represent lifelong success history Until next SSA call build basis additional SMPmodifications Solely selfmodifications SMPSSAbased learners solve complex task partially observable environment POE whose state space far bigger reported POE literature
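The backtracking step of SSA can be sketched independently of any particular learner. The sketch rests on our reading of the success-story criterion: the reward per unit time since each surviving checkpoint must increase strictly from the lifetime average through the oldest checkpoint to the newest. While that fails, the newest modification is undone. The policy is a plain dictionary and the modifications are supplied by hand, so the self-modifying part of an SMP is omitted entirely. All function names are hypothetical.

```python
def rates(stack, t, R):
    """Reward per unit time since lifetime start and since each checkpoint.

    Assumes every checkpoint time is positive and smaller than t.
    """
    points = [(0, 0.0)] + [(c[0], c[1]) for c in stack]
    return [(R - r0) / (t - t0) for t0, r0 in points]

def ssc_holds(stack, t, R):
    """Success-story criterion: rates strictly increase toward the newest checkpoint."""
    rs = rates(stack, t, R)
    return all(a < b for a, b in zip(rs, rs[1:]))

def ssa(stack, policy, t, R):
    """Undo the most recent modifications until the criterion holds."""
    while stack and not ssc_holds(stack, t, R):
        _, _, undo = stack.pop()
        policy.update(undo)          # restore values saved at that checkpoint

def modify(stack, policy, t, R, changes):
    """Push a checkpoint remembering the old values, then apply `changes`."""
    stack.append((t, R, {k: policy[k] for k in changes}))
    policy.update(changes)

if __name__ == "__main__":
    policy, stack = {"p": 0.5}, []
    modify(stack, policy, 10, 10.0, {"p": 0.9})
    ssa(stack, policy, 20, 12.0)     # reward rate dropped after the change
    print(policy, stack)
```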
We study task sequences allow speeding learners average reward intake appropriate shifts inductive bias changes learners policy To evaluate longterm effects bias shifts setting stage later bias shifts use successstory algorithm SSA SSA occasionally called times may depend policy It uses backtracking undo bias shifts empirically observed trigger longterm reward accelerations measured current SSA call Bias shifts survive SSA represent lifelong success history Until next SSA call considered useful build basis additional bias shifts SSA allows plugging wide variety learning algorithms We plug novel adaptive extension Levin search method embedding learners policy modification strategy within policy incremental selfimprovement Our inductive transfer case studies involve complex partially observable environments traditional reinforcement learning fails
The inductive logic programming system LOPSTER created demonstrate advantage basing induction logical implication rather subsumption LOPSTERs subunification procedures allow induce recursive relations using minimum number examples whereas inductive logic programming algorithms based subsumption require many examples solve induction tasks However LOPSTERs input examples must carefully chosen must along inverse resolution path We hypothesize extension LOPSTER efficiently induce recursive relations without requirement We introduce generalization LOPSTER named CRUSTACEAN capability empirically evaluate ability induce recursive relations
Submitted NIPS TD popular family algorithms approximate policy evaluation large MDPs TD works incrementally updating value function observed transition It two major drawbacks makes inefficient use data requires user manually tune stepsize schedule good performance For case linear value function approximations LeastSquares TD LSTD algorithm Bradtke Barto eliminates stepsize parameters improves data efficiency This paper extends Bradtke Bartos work three significant ways First presents simpler derivation LSTD algorithm Second generalizes arbitrary values extreme resulting algorithm shown practical formulation supervised linear regression Third presents novel intuitive interpretation LSTD modelbased reinforcement learning technique
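The LSTD(lambda) estimate can be sketched for linear features. Here lambda is the trace parameter that the stripped abstract's "TD" and "arbitrary values" refer to. The sketch accumulates A = sum_t z_t (phi_t - gamma phi_t+1)^T and b = sum_t z_t r_t, with trace z_t = lambda gamma z_t-1 + phi_t, and then solves A theta = b. It is not the authors' code. It uses a naive Gaussian elimination that assumes A is nonsingular, and the episode format is our own. With one-hot features and a deterministic chain, every lambda should recover the true values. At lambda = 1 the estimate reduces to regression on Monte Carlo returns.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lstd_lambda(episodes, k, gamma, lam):
    """episodes: lists of (phi_t, r_t, phi_next); phi_next is all zeros at termination."""
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for episode in episodes:
        z = [0.0] * k                                   # eligibility trace, reset per episode
        for phi, r, phi_next in episode:
            z = [lam * gamma * zi + fi for zi, fi in zip(z, phi)]
            d = [f - gamma * fn for f, fn in zip(phi, phi_next)]
            for i in range(k):
                b[i] += z[i] * r
                for j in range(k):
                    A[i][j] += z[i] * d[j]
    return solve(A, b)

if __name__ == "__main__":
    e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    episode = [(e[0], 1.0, e[1]), (e[1], 1.0, e[2]), (e[2], 1.0, [0.0] * 3)]
    print(lstd_lambda([episode], 3, 0.9, 0.5))   # true values 2.71, 1.9, 1.0
```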
Technical Report No Department Statistics University Toronto Abstract Simulated annealing moving tractable distribution distribution interest via sequence intermediate distributions traditionally used inexact method handling isolated modes Markov chain samplers Here shown one use Markov chain transitions annealing sequence define importance sampler The Markov chain aspect allows method perform acceptably even highdimensional problems finding good importance sampling distributions would otherwise difficult use importance weights ensures estimates found converge correct values number annealing runs increases This annealed importance sampling procedure resembles second half previouslystudied tempered transitions seen generalization recentlyproposed variant sequential importance sampling It also related thermodynamic integration methods estimating ratios normalizing constants Annealed importance sampling attractive isolated modes present estimates normalizing constants required may also generally useful since independent sampling allows one bypass problems assessing convergence autocorrelation Markov chain samplers
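A one-dimensional sketch shows the annealed importance sampling procedure on a hypothetical pair of densities whose normalizing constants are known. The start is an unnormalized N(0, 1) and the target is an unnormalized N(1, 0.5^2), so Z1/Z0 = 0.5. Each run begins with an exact draw from the start distribution. At every intermediate distribution f0^(1-beta) f1^beta it multiplies the importance weight by the ratio of successive densities. It then applies one Metropolis step that leaves that intermediate distribution invariant. The linear beta schedule and the step size are arbitrary choices.

```python
import math
import random

MU, S = 1.0, 0.5   # target is an unnormalized N(MU, S^2), so Z1 / Z0 = S

def log_f0(x):
    return -0.5 * x * x                     # unnormalized N(0, 1)

def log_f1(x):
    return -0.5 * ((x - MU) / S) ** 2       # unnormalized N(MU, S^2)

def log_fb(x, beta):
    """Geometric bridge f0^(1 - beta) * f1^beta between the two densities."""
    return (1 - beta) * log_f0(x) + beta * log_f1(x)

def ais_ratio(n_runs=1000, n_temps=100, step=0.5, seed=0):
    """Annealed importance sampling estimate of Z1 / Z0."""
    rng = random.Random(seed)
    betas = [j / n_temps for j in range(n_temps + 1)]
    total = 0.0
    for _ in range(n_runs):
        x = rng.gauss(0.0, 1.0)             # exact independent draw from p0
        log_w = 0.0
        for b_prev, b in zip(betas, betas[1:]):
            log_w += log_fb(x, b) - log_fb(x, b_prev)
            # One Metropolis step that leaves the bridge density f_b invariant
            y = x + rng.gauss(0.0, step)
            if rng.random() < math.exp(min(0.0, log_fb(y, b) - log_fb(x, b))):
                x = y
        total += math.exp(log_w)
    return total / n_runs

if __name__ == "__main__":
    print(ais_ratio())   # should be close to 0.5
```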
The Genetic Programming optimization method GP elaborated John Koza Koza variant Genetic Algorithms The search space problem domain consists computer programs represented parse trees crossover operator realized exchange subtrees Empirical analyses show large parts trees never used evaluated means parts trees irrelevant solution redundant This paper concerned identification redundancy occurring GP It starts mathematical description behavior GP conclusions drawn description among others explain size problem denotes phenomenon average size trees population grows time
A year ago new metaheuristic graph coloring problems introduced Costa Hertz Dubuis They shown computer experiments clear indication benefits approach Graph coloring many applications especially areas scheduling assignments timetabling The metaheuristic classified memetic algorithm since based population search periods local optimization interspersed phases new configurations created earlier welldeveloped configurations local minima previous iterative improvement process The new population created using crossover operators genetic algorithms In paper discuss methodology inspired Competitive Analysis may relevant problem designing better crossover operators
For wellknown concept decision trees used inductive inference study natural concept equivalence two decision trees equivalent represent hypothesis We present simple efficient algorithm establish whether two decision trees equivalent The complexity algorithm bounded product sizes decision trees The hypothesis represented decision tree essentially boolean function like proposition Although every boolean function represented way show disjunctions conjunctions decision trees efficiently represented decision trees simply shaped propositions may require exponential size representation decision trees
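One straightforward equivalence test uses our own tree encoding: leaves are ("leaf", value) and internal nodes are (variable, low_subtree, high_subtree). It walks the paths of the first tree. At each leaf it checks that every leaf of the second tree reachable under the same partial assignment carries the same value. The cost is roughly the number of leaves of the first tree times the size of the second, in line with the product bound the abstract mentions. It is not necessarily the paper's algorithm.

```python
LEAF = "leaf"

def reachable_leaves(tree, assign):
    """Yield values of leaves of `tree` reachable under a partial assignment."""
    if tree[0] == LEAF:
        yield tree[1]
        return
    var, low, high = tree
    if var in assign:                        # variable already fixed on this path
        yield from reachable_leaves(high if assign[var] else low, assign)
    else:
        yield from reachable_leaves(low, {**assign, var: 0})
        yield from reachable_leaves(high, {**assign, var: 1})

def equivalent(t1, t2, assign=None):
    """True if decision trees t1 and t2 compute the same boolean function."""
    assign = {} if assign is None else assign
    if t1[0] == LEAF:
        return all(v == t1[1] for v in reachable_leaves(t2, assign))
    var, low, high = t1
    if var in assign:                        # repeated test: only one branch is reachable
        return equivalent(high if assign[var] else low, t2, assign)
    return (equivalent(low, t2, {**assign, var: 0})
            and equivalent(high, t2, {**assign, var: 1}))

L0, L1 = (LEAF, 0), (LEAF, 1)

if __name__ == "__main__":
    xor_a = ("x", ("y", L0, L1), ("y", L1, L0))
    xor_b = ("y", ("x", L0, L1), ("x", L1, L0))
    print(equivalent(xor_a, xor_b))          # same function, different tree shapes
```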
We propose assess relevance theories synaptic modification models feature extraction human vision using masks derived synaptic weight patterns occlude parts stimulus images psychophysical experiments In experiment reported found mask derived principal component analysis object images effective reducing generalization performance human subjects mask derived another method feature extraction BCM based higherorder statistics images
A two dimensional timedependent Duffing oscillator model macroscopic neocortex exhibits chaos ranges parameters We embed model moderate noise typical context presented real neocortex using PATHINT nonMonteCarlo pathintegral algorithm particularly adept handling nonlinear FokkerPlanck systems This approach shows promise investigate whether chaos neocortex predicted models survive noisy contexts
The purpose architecture optimization schemes improve generalization In presentation suggest estimate weight saliency associated change generalization error weight pruned We detail implementation ON storage scheme extending OBD well ON scheme extending OBS We illustrate viability approach prediction chaotic time series
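The two classic training-error saliencies that the abstract extends can be checked on a quadratic error, where both are exact. OBD's 0.5 * H_qq * w_q^2 is the error increase from zeroing w_q alone. OBS's w_q^2 / (2 [H^-1]_qq) is the increase when the remaining weights are re-optimized. The Hessian and weights below are made-up numbers, and the paper's generalization-error saliency estimate is not reproduced here.

```python
def inv2(H):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def error_increase(H, delta):
    """Exact change 0.5 * delta^T H delta of a quadratic error at its minimum."""
    return 0.5 * sum(delta[i] * H[i][j] * delta[j] for i in range(2) for j in range(2))

def obd_saliency(H, w, q):
    return 0.5 * H[q][q] * w[q] ** 2

def obs_saliency(H, w, q):
    return w[q] ** 2 / (2 * inv2(H)[q][q])

def obs_update(H, w, q):
    """OBS weight change: zero w[q] and re-optimize the other weights."""
    Hinv = inv2(H)
    scale = w[q] / Hinv[q][q]
    return [-scale * Hinv[i][q] for i in range(2)]

H = [[2.0, 1.0], [1.0, 2.0]]   # Hessian of the training error (assumed)
w = [1.0, 1.0]                 # weights at the error minimum (assumed)

if __name__ == "__main__":
    print("OBD:", obd_saliency(H, w, 0), error_increase(H, [-w[0], 0.0]))
    delta = obs_update(H, w, 0)
    print("OBS:", obs_saliency(H, w, 0), error_increase(H, delta))
```

Re-optimizing the surviving weight lowers the cost of pruning w_0 from 1.0 (OBD) to 0.75 (OBS) in this example.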
In paper employ genetic programming paradigm enable computer learn play strategies ancient Egyptian boardgame Senet evolving board evaluation functions Formulating problem terms board evaluation functions made feasible evaluate fitness game playing strategies using tournamentstyle fitness evaluation The game elements strategy chance Our approach learns strategies enable computer play consistently reasonably skillful level
This paper introduces new algorithm Q optimizing expected output multiinput noisy continuous function Q designed need experiments avoids strong assumptions form function autonomous requires little problemspecific tweaking Four existing approaches problem response surface methods numerical optimization supervised learning evolutionary methods inadequacies requirement black box behavior combined need experiments Q uses instancebased determination convex region interest performing experiments In conventional instancebased approaches learning neighborhood defined proximity query point In contrast Q defines neighborhood new geometric procedure captures size shape zone possible optimum locations Q also optimizes weighted combinations outputs finds inputs produce target outputs We compare Q optimizers noisy functions several problems including simulated noisy process nonlinear continuous dynamics discreteevent queueing components Results encouraging terms speed autonomy
In paper propose multiclassification approach constructive induction The idea improvement classification accuracy based iterative modification input data space This process independently repeated pair n classes Finally gives n n input data subspaces attributes dedicated optimal discrimination appropriate pairs classes We use genetic algorithms constructive induction engine A final classification obtained weighted majority voting rule according n classifier approach The computational experiment performed medical data set The obtained results point advantage using multiclassification model n classifier constructive induction relation analogous singleclassifier approach
This highly interdisciplinary project extends previous work combat modeling controltheoretic descriptions decisionmaking human factors complex activities A previous paper established first theory statistical mechanics combat SMC developed using modern methods statistical mechanics baselined empirical data gleaned National Training Center NTC This previous project also established JANUSTNTC computer simulationwargame NTC providing statistical whatif capability NTC scenarios This mathematical formulation ripe controltheoretic extension include human factors methodology previously developed context teleoperated vehicles Similar NTC scenarios differing crucial decision points used data model influence decision making combat The results may used improve present human factors C algorithms computer simulationswargames Our approach subordinate SMC nonlinear stochastic equations fitted NTC scenarios establish zeroth order description combat In practice equivalent mathematicalphysics representation used suitable numerical formal work ie Lagrangian representation Theoretically equations nested within larger set nonlinear stochastic operatorequations include C human factors eg supervisory decisions In study propose perturb operator theory SMC zeroth order set equations Then subsets scenarios fit zeroth order originally considered similarly degenerate split perturbatively distinguish C decisionmaking influences New methods Very Fast Simulated ReAnnealing VFSR developed previous project used fitting models empirical data
The work progress reported Wright Liley shows great promise primarily experimental simulation paradigms However tentative conclusion macroscopic neocortex may considered approximately linear nearequilibrium system premature correspond tentative conclusions drawn studies neocortex At time exists interdisciplinary multidimensional gradation published studies neocortex one primary dimension mathematical physics represented two extremes At one extreme much scientifically unsupported talk chaos quantum physics responsible many important macroscopic neocortical processes involving many thousands millions neurons Wilczek At another extreme many nonmathematically trained neuroscientists uncritically lump neocortical mathematical theory one file consider statistical averages citations opinions quality research Nunez In context important appreciate Wright Liley WL report scientifically sound studies macroscopic neocortical function based simulation blend sound theory reproducible experiments However pioneering work given absence much knowledge neocortex time open criticism especially respect present inferences conclusions Their conclusion EEG data exhibit linear nearequilibrium dynamics may well true sense focusing one local minima possibly individualspecific physiologicalstate dependent
Traditional evolutionary optimization algorithms assume static evaluation function according solutions evolved Incremental evolution approach dynamic evaluation function scaled time order improve performance evolutionary optimization In paper present empirical results demonstrate effectiveness approach genetic programming Using two domains twoagent pursuitevasion game Tracker trailfollowing task demonstrate incremental evolution successful applied near beginning evolutionary run We also show incremental evolution successful intermediate evaluation functions difficult target evaluation function well easier target function
NKlandscapes offer ability assess performance evolutionary algorithms problems different degrees epistasis In paper study performance six algorithms NKlandscapes low high dimension keeping amount epistatic interactions constant The results show compared genetic local search algorithms performance standard genetic algorithms employing crossover mutation significantly decreases increasing problem size Furthermore increasing K crossover based algorithms cases outperformed mutation based algorithms However relative performance differences algorithms grow significantly dimension search space indicating important consider highdimensional landscapes evaluating performance evolutionary algorithms
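To make the NK model concrete, here is a minimal sketch of a random NK fitness function in Python. It assumes the common formulation in which each of the n loci has a lookup table over the 2^(K+1) configurations of itself and K other loci, and fitness is the mean of the per-locus contributions. Choosing the K neighbours at random (rather than adjacent loci) and the name `make_nk_landscape` are illustrative assumptions, not details taken from the paper.

```python
import random
from itertools import product

def make_nk_landscape(n, k, seed=0):
    """Return a hypothetical random NK fitness function over bit lists of length n.

    Each locus i depends on itself and k other loci (picked at random here;
    adjacent-neighbour variants also exist). Each locus owns a table mapping
    every configuration of those k + 1 bits to a value drawn from [0, 1).
    """
    rng = random.Random(seed)
    neighbours = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [
        {bits: rng.random() for bits in product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(x):
        # Mean of per-locus contributions; K controls the amount of epistasis.
        total = 0.0
        for i in range(n):
            key = (x[i],) + tuple(x[j] for j in neighbours[i])
            total += tables[i][key]
        return total / n

    return fitness
```

Keeping K fixed while increasing n, as the study does, holds the number of epistatic interactions per locus constant while the search space grows.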
Theories rational belief revision recently proposed Gardenfors Nebel illuminate many important issues impose unnecessarily strong standards correct revisions make strong assumptions information available guide revisions We reconstruct theories according economic standard rationality preferences used select among alternative possible revisions By permitting multiple partial specifications preferences ways closely related preferencebased nonmonotonic logics reconstructed theory employs information closer available practice offers flexible ways selecting revisions We formally compare notion rational belief revision Gardenfors Nebel adapt results universal default theories prove universal method rational belief revision examine formally different limitations rationality affect belief revision
Independent Component Analysis ICA statistical signal processing technique whose main applications blind source separation blind deconvolution feature extraction Estimation ICA usually performed optimizing contrast function based higherorder cumulants In paper shown almost error function used construct contrast function perform ICA estimation In particular means one use contrast functions robust outliers As practical method finding relevant extrema contrast functions fixedpoint iteration scheme introduced The resulting algorithms quite simple converge fast reliably These algorithms also enable estimation independent components onebyone using simple deflation scheme
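The fixed-point iteration with one-by-one deflation can be sketched in NumPy as below. This follows the widely used one-unit update w ← E{z g(wᵀz)} − E{g′(wᵀz)} w on whitened data z, with g = tanh assumed as the nonlinearity and Gram-Schmidt deflation against components already found. It is an illustrative sketch, not the paper's exact algorithm; the function names are hypothetical.

```python
import numpy as np

def whiten(x):
    """Centre and whiten x of shape (n_signals, n_samples) so its covariance is I."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    return (e @ np.diag(d ** -0.5) @ e.T) @ x

def fastica_deflation(z, n_components, max_iter=200, tol=1e-6, seed=0):
    """Estimate independent components one by one from whitened data z."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    w_all = np.zeros((n_components, n))
    for p in range(n_components):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            u = w @ z                                   # projections onto w
            g, g_prime = np.tanh(u), 1.0 - np.tanh(u) ** 2
            # Fixed-point step: E{z g(w'z)} - E{g'(w'z)} w
            w_new = (z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: remove projections on previously found components.
            w_new -= w_all[:p].T @ (w_all[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol  # sign flips allowed
            w = w_new
            if converged:
                break
        w_all[p] = w
    return w_all
```

Rows of the returned matrix are orthonormal, and `w_all @ z` gives the estimated sources up to sign and order.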
We present methodology representing probabilistic relationships generalequilibrium economic model Specifically define precise mapping Bayesian network binary nodes market price system consumers producers trade uncertain propositions We demonstrate correspondence equilibrium prices goods economy probabilities represented Bayesian network A computational market model may provide useful framework investigations belief aggregation distributed probabilistic inference resource allocation uncertainty problems decentralized uncertainty
This article appear Volume Journal Applied Statistical Science Adrian E Raftery Professor Statistics Sociology Department Statistics GN University Washington Seattle WA This research supported ONR contract NJ NIH Grant RHD Ministere de la Recherche et de lEspace Paris Universite de Paris VI INRIA Rocquencourt France Raftery thanks latter two institutions Paul Deheuvels Gilles Celeux hearty hospitality Paris sabbatical article written This article prepared presentation Conference Applied Change Point Analysis University MarylandBaltimore March Parts article review collaborative research others I would like express appreciation namely Volkan Akman Jeff Banfield Nhu Le Steven Lewis Doug Martin Fionn Murtagh Ross Taplin Simon Tavare
This paper describe functional architecture CHARADE software platform devoted development new generation intelligent environmental decision support systems The CHARADE platform based taskoriented approach system design exploitation new architecture problem solving integrates casebased reasoning constraint reasoning The platform developed objectoriented environment upon demonstrator developed managing first intervention attack forest fires
Dimensions complexity raised definition system aimed supporting planning initial attack forest fires presented discussed The complexity deriving highly dynamic unpredictable domain forest fire one related individuation integration planning techniques suitable domain complexity addressing problem taking account role user supported system finally complexity architecture able integrate different subsystems In particular focus severe constraints definition planning approach posed fire fighting domain constraints satisfied completely current planning paradigms We propose approach based integration skeletal planning case based reasoning techniques constraint reasoning More specifically temporal constraints used two steps planning process plan fitting adaptation resource scheduling Work development system software architecture OOD methodology progress
We examine efficient implementation back prop type algorithms T vector processor fixed point engine designed neural network simulation A matrix formulation back prop Matrix Back Prop shown efficient RISCs Using Matrix Back Prop achieve asymptotically optimal performance T GOPS forward backward phases possible standard online method Since high efficiency futile convergence poor due use fixed point arithmetic use mixture fixed floating point operations The key observation precision fixed point sufficient good convergence range appropriately chosen Though expensive computations implemented fixed point achieve rate convergence comparable floating point version The time taken conversion fixed floating point also shown reasonable
We describe preliminary version investigative software GGE Generative Genetic Explorer genetic operations interact AutoCAD generate novel D forms architect GGE allows us assess evolutionary algorithms tailored suit Architecture CAD tasks
In paper discuss approach learning classification rules data We sketch two modules architecture namely LINNEO GAR LINNEO knowledge acquisition tool illstructured domains automatically generating classes examples incrementally works unsupervised strategy LINNEO output representation conceptual structure domain terms classes input GAR used generate set classification rules original training set GAR generate conjunctive disjunctive rules Herein present application techniques data obtained real wastewater treatment plant order help construction rule base This rule used knowledgebased system aims supervise whole process
When reading sentence The diplomat threw ball ballpark princess interpretation changes dance event baseball back dance Such online disambiguation happens automatically appears based dynamically combining strengths association keywords two senses Subsymbolic neural networks good modeling behavior They learn word meanings soft constraints interpretation dynamically combine constraints form likely interpretation On hand difficult show systematic language structures relative clauses could processed system The network would learn associate specific contexts would able process new combinations A closer look understanding embedded clauses shows humans systematic processing grammatical structures either For example The girl boy girl lived next door blamed hit cried difficult understand whereas The car man dog rabies bit drives garage This difference emerges semantic constraints work disambiguation task In chapter show subsymbolic parser combined highlevel control allows system process novel combinations relative clauses systematically still sensitive semantic constraints
We investigated generalization capabilities backpropagation learning feedforward recurrent feedforward connectionist networks assignment syllable boundaries orthographic representations Dutch hyphenation This difficult task phonological morphological constraints interact leading ambiguity input patterns We compared results different symbolic pattern matching approaches exemplarbased generalization scheme related knearest neighbour approach using similarity metric weighed relative information entropy positions training patterns Our results indicate generalization performance backpropagation learning task better best symbolic pattern matching approaches exemplarbased generalization
We present framework incorporating pruning strategies MTiling constructive neural network learning algorithm Pruning involves elimination redundant elements connection weights neurons network considerable practical interest We describe three elementary sensitivity based strategies pruning neurons Experimental results demonstrate moderate significant reduction network size without compromising networks generalization performance
A number neural learning rules recently proposed Independent Component Analysis ICA The rules usually derived informationtheoretic criteria maximum entropy minimum mutual information In paper show fact ICA performed simple Hebbian antiHebbian learning rules may weak relations informationtheoretical quantities Rather surprisingly practically nonlinear function used learning rule provided sign HebbianantiHebbian term chosen correctly In addition Hebbianlike mechanism weight vector constrained unit norm data preprocessed prewhitening sphering These results imply one choose nonlinearity optimize desired statistical numerical criteria
There currently several types constructive growth algorithms available training feedforward neural network This paper describes explains main ones using fundamental approach multilayer perceptron problemsolving mechanisms The claimed convergence properties algorithms verified using two mapping theorems consequently enables algorithms unified basic mechanism The algorithms compared contrasted deficiencies highlighted The fundamental reasons actual success algorithms extracted used suggest might fruitfully applied A suspicion panacea current neural network difficulties one must somewhere along line pay learning efficiency promise developed argument generalization abilities lie average backpropagation
Prioritized sweeping modelbased reinforcement learning method attempts focus agents limited computational resources achieve good estimate value environment states To choose effectively spend costly planning step classic prioritized sweeping uses simple heuristic focus computation states likely largest errors In paper introduce generalized prioritized sweeping principled method generating estimates representationspecific manner This allows us extend prioritized sweeping beyond explicit statebased representation deal compact representations necessary dealing large state spaces We apply method generalized model approximators Bayesian networks describe preliminary experiments compare approach classical prioritized sweeping
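For reference, the classic (tabular) prioritized sweeping that the paper generalizes can be sketched as follows. This is a simplified sketch assuming a known deterministic model given as a dict from each state to a list of `(reward, next_state)` pairs, one per action; real prioritized sweeping learns the model from experience and handles stochastic transitions. The priority of a state is the magnitude of the change its Bellman backup would cause, and only predecessors of an updated state are re-prioritized.

```python
import heapq

def prioritized_sweeping(model, gamma=0.9, theta=1e-9, max_updates=10_000):
    """Classic prioritized sweeping on a known deterministic model (sketch).

    model[s] is a list of (reward, next_state) pairs, one per action;
    a state with an empty list is terminal and keeps value 0.
    """
    values = {s: 0.0 for s in model}
    predecessors = {s: set() for s in model}
    for s, outcomes in model.items():
        for _, s_next in outcomes:
            predecessors[s_next].add(s)

    def backup(s):
        # One-step Bellman optimality backup.
        return max((r + gamma * values[s2] for r, s2 in model[s]), default=0.0)

    # Seed the queue with states whose backup would change their value.
    queue = [(-abs(backup(s) - values[s]), s) for s in model]
    queue = [item for item in queue if -item[0] > theta]
    heapq.heapify(queue)  # min-heap on negated priority = max-priority first
    updates = 0
    while queue and updates < max_updates:
        _, s = heapq.heappop(queue)
        values[s] = backup(s)
        updates += 1
        # Only predecessors of s can have their backups changed by this update.
        for p in predecessors[s]:
            priority = abs(backup(p) - values[p])
            if priority > theta:
                heapq.heappush(queue, (-priority, p))
    return values
```

On a five-state chain where only the last transition pays reward 1, the updates propagate backwards from the goal one state at a time instead of sweeping the whole state space.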
Constructive learning algorithms offer approach incremental construction potentially nearminimal neural network architectures pattern classification tasks Such algorithms help overcome need adhoc often inappropriate choice network topology use algorithms search suitable weight setting otherwise apriori fixed network architecture Several algorithms proposed literature shown converge zero classification errors certain assumptions finite noncontradictory training set category classification problem This paper presents MTiling multicategory extension Tiling algorithm Mezard Nadal We establish convergence MTiling zero classification error multicategory pattern classification task Results experiments non linearly separable multicategory data sets demonstrate feasibility approach multicategory pattern classification also suggest several interesting directions future research
As real logic programmers normally use cut effective learning procedure logic programs able deal Because cut predicate procedural meaning clauses containing cut learned using extensional evaluation method done learning systems On hand searching space possible programs instead space independent clauses unfeasible An alternative solution generate first candidate base program covers positive examples make consistent inserting cut appropriate The problem learning programs cut investigated seems natural reasonable approach We generalize scheme investigate difficulties arise Some major shortcomings actually caused general need intensional evaluation As conclusion analysis paper suggests precise technical grounds learning cut difficult current induction techniques probably restricted purely declarative logic languages
We define Gamma multilayer perceptron MLP MLP usual synaptic weights replaced gamma filters proposed de Vries Principe de Vries Principe associated gain terms throughout layers We derive gradient descent update equations apply model recognition speech phonemes We find inclusion gamma filters layers inclusion synaptic gains improves performance Gamma MLP We compare Gamma MLP TDNN BackTsoi FIR MLP BackTsoi IIR MLP architectures local approximation scheme We find Gamma MLP results substantial reduction error rates
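The gamma memory underlying the gamma filters can be sketched with the standard de Vries and Principe recursion: tap 0 is the input and tap k follows x_k(t) = (1 − μ) x_k(t − 1) + μ x_{k−1}(t − 1). A gamma filter output is then a weighted sum of these taps; in the Gamma MLP such filters replace scalar synaptic weights. The function name below is hypothetical and the sketch omits learning of μ and the weights.

```python
def gamma_memory(signal, order, mu):
    """Run a gamma memory of the given order over a 1-D signal (sketch).

    Tap 0 is the current input; tap k follows
        x_k(t) = (1 - mu) * x_k(t - 1) + mu * x_{k-1}(t - 1).
    Returns one list of taps [x_0(t), ..., x_order(t)] per time step.
    """
    taps = [0.0] * (order + 1)
    history = []
    for u in signal:
        prev = taps[:]          # tap values at time t - 1
        taps[0] = u
        for k in range(1, order + 1):
            taps[k] = (1.0 - mu) * prev[k] + mu * prev[k - 1]
        history.append(taps[:])
    return history
```

With μ = 1 the recursion reduces to an ordinary tapped delay line, as used in a TDNN; smaller μ trades temporal resolution for a longer memory depth.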
Neural oneunit learning rules problem Independent Component Analysis ICA blind source separation introduced In new algorithms every ICA neuron develops separator finds one independent components The learning rules use simple constrained HebbianantiHebbian learning decorrelating feedback may added To speed convergence stochastic gradient descent rules novel computationally efficient fixedpoint algorithm introduced
Connectionist Models collection forty papers representing wide variety research topics connectionism The book distinguished single feature papers almost exclusively contributions graduate students active field The students selected rigorous review process participated two week long summer school devoted connectionism As ambitious editors state foreword These bold claims true reader presented exciting opportunity sample frontiers connectionism Their words imply two ways approach book The book must read random collection scientific papers also challenge evaluate controversial field This summer school actually third series previous ones held The proceedings summer school I privilege participating reviewed Nigel Goddard Continuing pattern fourth school scheduled held Boulder CO
A knowledgebased system uses database aka theory produce answers queries receives Unfortunately answers may incorrect underlying theory faulty Standard theory revision systems use given set labeled queries query paired correct answer transform given theory adding andor deleting either rules andor antecedents related theory accurate possible After formally defining theory revision task paper provides sample computational complexity bounds process It first specifies number labeled queries necessary identify revised theory whose error close minimal high probability It considers computational complexity finding best theory proves unless P N P polynomial time algorithm identify nearoptimal revision even given exact distribution queries except trivial situations It also shows except trivial situations polynomialtime algorithm produce theory whose error even close ie within particular polynomial factor optimal These results suggest reasons theory revision effective learning scratch also justify many aspects standard theory revision systems including practice hillclimbing locallyoptimal theory based given set labeled queries This paper extends short article appeared Proceedings Fourteenth International Joint Conference Artificial Intelligence IJCAI Montreal August I gratefully acknowledge receiving helpful comments Edoardo Amaldi Mukesh Dalal George Drastal Adam Grove Tom Hancock Sheila McIlraith Roni Khardon Dan Roth especially thorough comments anonymous referees
This paper investigates dynamic pathbased method constructing conjunctions new attributes decision tree learning It searches conditions attributevalue pairs paths form new attributes Compared hypothesisdriven new attribute construction methods new idea method carries systematic search pruning path tree select conditions generating conjunction Therefore conditions constructing new attributes dynamically decided search Empirically evaluation set artificial realworld domains shows dynamic pathbased method improve performance selective decision tree learning terms higher prediction accuracy lower theory complexity In addition shows performance advantages fixed pathbased method fixed rulebased method learning decision trees
Numerous recent papers focus standard recurrent nets problems long time lags relevant signals Some propose rather sophisticated alternative methods We show many problems used test previous methods solved quickly random weight guessing
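Random weight guessing is simple enough to show in full: draw every weight independently from a uniform interval, test the resulting net on the whole training set, and stop at the first guess that classifies everything correctly. The sketch below uses a hypothetical one-unit tanh recurrent net and a toy latching task (remember the sign of the first input across up to 100 zero inputs); both the net and the task are illustrative assumptions, not the paper's benchmarks.

```python
import math
import random

def run_rnn(weights, sequence):
    """Hypothetical one-unit RNN: h_t = tanh(w_in*x_t + w_rec*h_{t-1}); output w_out*h_T."""
    w_in, w_rec, w_out = weights
    h = 0.0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
    return w_out * h

def random_weight_guessing(data, n_weights, max_trials=10_000, scale=1.0, seed=0):
    """Guess all weights uniformly in [-scale, scale] until every sequence is
    classified correctly. Returns (weights, trials) or (None, max_trials)."""
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        weights = [rng.uniform(-scale, scale) for _ in range(n_weights)]
        if all((run_rnn(weights, seq) > 0) == label for seq, label in data):
            return weights, trial
    return None, max_trials
```

No gradients are computed at all, which is the point: if a benchmark falls to this baseline quickly, it says little about a method's ability to bridge long time lags.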
This paper contributes study nonlinear dynamical systems computational perspective These systems inherently powerful linear counterparts Markov chains wide impact Computer Science seem likely play increasing role future However yet general techniques available handling computational aspects discrete nonlinear systems even simplest examples seem hard analyze We focus paper class quadratic systems widely used model population genetics also genetic algorithms These systems describe process random matings occur parental chromosomes via mechanism known crossover ie children inherit pieces genetic material different parents according random rule Our results concern two fundamental quantitative properties crossover systems We develop general technique computing
Evolution stochastic process operates DNA species The evolutionary process leaves telltale signs DNA used construct phylogenies evolutionary trees set species Maximum Likelihood Estimations MLE methods seek evolutionary tree likely produced DNA consideration While methods widely accepted intellectually satisfying computationally intractable In paper address intractability MLE methods follows We introduce metric evolutionary stochastic process show metric meaningful giving lowerbound learnability true phylogeny terms metric measure We complement result simple efficient algorithm inverting stochastic process evolution building tree observations DNA species Put another way show PAClearn phylogenies Though many heuristics suggested problem algorithm first algorithm guaranteed convergence rate rate within polynomial lowerbound rate establish Our algorithm also first polynomial time algorithm guaranteed converge correct tree
Much done develop learning techniques delayed reward problems worlds actions andor states approximated discrete representations Although acceptable applications many situations approximation difficult unnatural For instance applications robotics real machines interact real world learning techniques use real valued continuous quantities required Presented paper extension Qlearning uses real valued states actions This achieved introducing activation strengths actuator system robot This allow actuators active continuous amount simultaneously Learning occurs incrementally adapting expected future reward goal evaluation function gradients function respect actuator system
Connections stochastic smoothingfiltering estimation incomplete data investigated It shown right censoring scheme KaplanMeier estimator characterized moment estimate based stochastic filtersmoother pseudofiltersmoother Motivated result potentially useful martingale approach estimation convergence incomplete data proposed estimators characterized pseudostochastic smoothers sometimes reduce filters described system stochastic integral equations recent results convergence stochastic integrals stochastic differential equations applied address convergence issues As illustration double censoring problem revisited framework closed form estimator proposed convergence properties studied Martingale theory plays vital role entire analysis This approach essence selfconsistency method
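For orientation, the Kaplan-Meier (product-limit) estimator that the abstract recharacterizes can be computed directly as S(t) = ∏_{tᵢ ≤ t} (1 − dᵢ/nᵢ), where dᵢ is the number of events at time tᵢ and nᵢ the number still at risk. The sketch below shows only this standard estimator under right censoring, not the paper's filter/smoother construction; the function name is illustrative.

```python
def kaplan_meier(times, observed):
    """Product-limit estimate of the survival function under right censoring.

    observed[i] is True if subject i had the event at times[i], False if it
    was censored then. Returns [(t, S(t))] at each distinct event time.
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Subjects censored at t are counted at risk at t, then leave the risk set.
        at_risk -= deaths + censored
    return curve
```

For event/censoring times 1, 2, 2+, 3 (the "+" marking censoring) the estimate steps to 0.75, 0.5 and 0 at times 1, 2 and 3.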
Over past years telecommunications paradigm shifting rapidly hardware middleware In particular traditional issues service characteristics network control replaced modern customerdriven issues network service management eg electronic commerce onestop shops An area service management extremely high visibility negative impact managed badly problem handling Problem handling knowledge intensive activity particularly nowadays increase number complexity services becoming available Trials several BT support centres already demonstrated potential casebased reasoning technology improving current practice problem detection diagnosis A major cost involved implementing casebased system manual building initial case base subsequent maintenance case base time This paper shows inductive machine learning combined casebased reasoning produce intelligent system capable extracting knowledge raw data automatically reasoning knowledge In addition discovering knowledge existing data repositories integrated system may used acquire revise knowledge continually Experiments suggested integrated approach demonstrate promise justify next step
When using Genetic Programming GP Algorithm difficult problem large set training cases large population size needed large number functiontree evaluations must carried This paper describes reduce number evaluations selecting small subset training data set actually carry GP algorithm Three subset selection methods described paper Dynamic Subset Selection DSS using current GP run select difficult andor disused cases Historical Subset Selection HSS using previous GP runs Random Subset Selection RSS GP GPDSS GPHSS GPRSS compared large classification problem GPDSS produce better results less time taken GP GPHSS nearly match results GP perhaps surprisingly GPRSS occasionally approach results GP GP GPDSS compared smaller problem hybrid Dynamic Fitness Function DFF based DSS proposed
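A minimal sketch of the selection step in Dynamic Subset Selection follows. It assumes the formulation in which each training case has a difficulty D (how often it was misclassified) and an age A (generations since it was last selected), a weight W = D^d + A^a, and a selection probability proportional to W scaled to a target subset size S. The exponents here are placeholder values, the names are hypothetical, and updating difficulty after each evaluation is left out.

```python
import random

def dss_probabilities(difficulty, age, subset_size, d_exp=1.0, a_exp=1.0):
    """Selection probability per case: P_i = min(1, S * W_i / sum_j W_j),
    with W_i = D_i**d_exp + A_i**a_exp (assumed weighting)."""
    weights = [dv ** d_exp + av ** a_exp for dv, av in zip(difficulty, age)]
    total = sum(weights)
    if total == 0:  # e.g. the first generation: fall back to uniform selection
        return [min(1.0, subset_size / len(weights))] * len(weights)
    return [min(1.0, subset_size * w / total) for w in weights]

def dss_select(difficulty, age, subset_size, rng, d_exp=1.0, a_exp=1.0):
    """Draw one generation's subset and return (chosen indices, updated ages)."""
    probs = dss_probabilities(difficulty, age, subset_size, d_exp, a_exp)
    chosen = [i for i, p in enumerate(probs) if rng.random() < p]
    picked = set(chosen)
    # Age counts generations since a case was last selected.
    new_age = [0 if i in picked else a + 1 for i, a in enumerate(age)]
    return chosen, new_age
```

Difficult cases and cases that have not been seen for a while both become more likely to be selected, so the GP run is evaluated against a small, shifting subset instead of the full training set.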
This paper presents Limited Error Fitness LEF modification standard supervised learning approach Genetic Programming GP individuals fitness score based many cases remain uncovered ordered training set individual exceeds error limit The training set order error limit altered dynamically response performance fittest individual previous generation
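The core of Limited Error Fitness can be sketched as a scoring function: walk the ordered training cases, count errors, and stop once the error limit is exceeded, scoring the individual by its errors plus the cases it never reached (lower is better). This is an interpretation of the abstract under stated assumptions; the dynamic reordering of the training set and the adjustment of the error limit are omitted, and `lef_score` is a hypothetical name.

```python
def lef_score(predict, cases, error_limit):
    """Limited Error Fitness score (sketch): lower is better.

    cases is an ordered list of (input, target) pairs. Evaluation stops as
    soon as the individual exceeds error_limit; every case not yet reached
    counts as uncovered.
    """
    errors = 0
    for i, (x, y) in enumerate(cases):
        if predict(x) != y:
            errors += 1
            if errors > error_limit:
                return errors + (len(cases) - i - 1)
    return errors
```

Poor individuals are cut off early, which saves evaluations, while the case ordering determines which cases every individual is tested on first.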
We describe experimental study pruning methods decision tree classifiers goal minimizing loss rather error In addition two common methods error minimization CARTs costcomplexity pruning Cs errorbased pruning study extension costcomplexity pruning loss one pruning variant based Laplace correction We perform empirical comparison methods evaluate respect loss We found applying Laplace correction estimate probability distributions leaves beneficial pruning methods Unlike error minimization somewhat surprisingly performing pruning led results par methods terms evaluation criteria The main advantage pruning reduction decision tree size sometimes factor ten While method dominated others datasets even domain different pruning mechanisms better different loss matrices
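The two ingredients named here can be shown in a few lines: the Laplace correction for a leaf's class distribution, p(c) = (n_c + 1)/(N + k) for k classes, and the minimum-expected-loss prediction under a loss matrix. The sketch assumes `loss[pred][true]` holds the cost of predicting `pred` when the true class is `true`; the function names are illustrative.

```python
def laplace_estimate(counts):
    """Laplace-corrected class distribution at a leaf: (n_c + 1) / (N + k)."""
    n, k = sum(counts), len(counts)
    return [(c + 1) / (n + k) for c in counts]

def min_loss_class(probs, loss):
    """Return the prediction with minimum expected loss; loss[pred][true] is the cost."""
    expected = [sum(p * loss[pred][true] for true, p in enumerate(probs))
                for pred in range(len(loss))]
    return min(range(len(expected)), key=expected.__getitem__)
```

With leaf counts [3, 1] and a loss matrix that makes missing class 1 five times as costly, the minimum-loss prediction is class 1 even though class 0 is the majority; and a pure leaf such as [3, 0] still assigns class 1 a nonzero probability, which is why the correction matters under unequal losses.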
The Fourier transform boolean functions come play important role proving many important learnability results We aim demonstrate Fourier transform techniques also useful practical algorithm addition powerful theoretical tool We describe prominent changes introduced algorithm ones crucial without performance algorithm would severely deteriorate One benefits present confidence level prediction measures likelihood prediction correct
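As a reminder of what is being computed, the Fourier coefficient of a Boolean function f: {0,1}ⁿ → {−1, +1} on a subset S is f̂(S) = E_x[f(x) χ_S(x)] with χ_S(x) = (−1)^{Σ_{i∈S} xᵢ} and x uniform. The sketch below gives the exact value by enumeration (feasible only for small n) and the sample-mean estimate that learning algorithms rely on in practice; it does not reproduce the paper's algorithmic changes.

```python
from itertools import product

def chi(x, subset):
    """Parity character chi_S(x) = (-1)^(sum of x_i over i in S)."""
    return -1 if sum(x[i] for i in subset) % 2 else 1

def fourier_coefficient(f, n, subset):
    """Exact f_hat(S) = E_x[f(x) chi_S(x)] over uniform x in {0,1}^n."""
    total = sum(f(x) * chi(x, subset) for x in product((0, 1), repeat=n))
    return total / 2 ** n

def estimate_coefficient(samples, subset):
    """Estimate f_hat(S) from labelled samples (x, f(x)) drawn uniformly."""
    return sum(y * chi(x, subset) for x, y in samples) / len(samples)
```

For the parity of the first two bits of a 3-bit input, all the weight sits on the coefficient of {0, 1}, and every other coefficient is zero.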
This paper looks use small populations Genetic Programming GP trend literature appears towards using large population possible requires memory resources CPUusage less efficient Dynamic Subset Selection DSS Limited Error Fitness LEF two different adaptive variations standard supervised learning method used GP This paper compares performance GP GPDSS GPLEF case classification problem using small population size A similar comparison GP GPDSS done larger messier case classification problem For problems GPDSS small population size consistently produces better answer using fewer tree evaluations runs using much larger populations Even standard GP seen perform well much smaller population size indicating certainly worth exploratory run three small population size assuming large population size necessary It interesting notion smaller mean faster better
One method detecting fraud check suspicious changes user behavior This paper describes automatic design user profiling methods purpose fraud detection using series data mining techniques Specifically use rulelearning program uncover indicators fraudulent behavior large database customer transactions Then indicators used create set monitors profile legitimate customer behavior indicate anomalies Finally outputs monitors used features system learns combine evidence generate highconfidence alarms The system applied problem detecting cellular cloning fraud based database call records Experiments indicate automatic approach performs better handcrafted methods detecting fraud Furthermore approach adapt changing conditions typical fraud detection environments
Given set samples probability distribution set discrete random variables study problem constructing good approximative neural network model underlying probability distribution Our approach based unsupervised learning scheme samples first divided separate clusters cluster coded single vector These Bayesian prototype vectors consist conditional probabilities representing attributevalue distribution inside corresponding cluster Using prototype vectors possible model underlying joint probability distribution simple Bayesian network tree realized feedforward neural network capable probabilistic reasoning In framework learning means choosing size prototype set partitioning samples corresponding clusters constructing cluster prototypes We describe prototypes determined given partition samples present method evaluating likelihood corresponding Bayesian tree We also present greedy heuristic searching space different partition schemes different numbers clusters aiming optimal approximation probability distribution
This paper introduces two new crossover operators Genetic Programming GP Contrary regular GP crossover operators presented attempt preserve context subtrees appeared parent trees A simple coordinate scheme nodes Sexpression tree proposed crossovers allowed nodes exactly partially matching coordinates
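A sketch of the exact-match variant follows. It assumes S-expressions stored as nested tuples `(op, child1, child2, ...)` with string leaves, and takes a node's coordinate to be its path of child positions from the root; the paper's own coordinate scheme may differ, and all function names are hypothetical. Crossover is only allowed at coordinates that exist in both parents, so a swapped subtree lands in the same structural context it came from.

```python
import random

def coordinates(tree, path=()):
    """Yield (path, subtree) for every node; path lists child positions from the root."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from coordinates(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def common_coordinates(a, b):
    """Non-root coordinates present in both parents (the root swap is trivial)."""
    paths = set(dict(coordinates(a))) & set(dict(coordinates(b)))
    return sorted(p for p in paths if p)

def crossover_at(a, b, path):
    """Swap the subtrees found at the same coordinate in both parents."""
    sub_a, sub_b = dict(coordinates(a))[path], dict(coordinates(b))[path]
    return replace_at(a, path, sub_b), replace_at(b, path, sub_a)

def matched_crossover(a, b, rng=random):
    """Exact-coordinate crossover: pick a shared coordinate at random and swap."""
    common = common_coordinates(a, b)
    if not common:
        return a, b
    return crossover_at(a, b, rng.choice(common))
```

For example, crossing `('+', 'x', ('*', 'y', 'z'))` with `('-', ('+', 'a', 'b'), 'c')` can only swap at coordinates (1,) or (2,), the positions both trees share.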
Hierarchical genetic programming HGP approaches rely discovery modification use new functions accelerate evolution This paper provides qualitative explanation improved behavior HGP based analysis evolution process dual perspective diversity causality From static point view use HGP approach enables manipulation population higher diversity programs Higher diversity increases exploratory ability genetic search process demonstrated theoretical experimental fitness distributions expanded structural complexity individuals From dynamic point view report analyzes causality crossover operator Causality relates changes structure object effect changes ie changes properties behavior object The analyses crossover causality suggests HGP discovers exploits useful structures bottomup hierarchical manner Diversity causality complementary affecting exploration exploitation genetic search Unlike machine learning techniques need extra machinery control tradeoff HGP automatically trades exploration exploitation
Reinforcement learning RL algorithms provide sound theoretical basis building learning control architectures embedded agents Unfortunately theory much practice see Barto et al exception RL limited Markovian decision processes MDPs Many realworld decision tasks however inherently nonMarkovian ie state environment incompletely known learning agent In paper consider partially observable MDPs POMDPs useful class nonMarkovian decision processes Most previous approaches problems combined computationally expensive stateestimation techniques learning control This paper investigates learning POMDPs without resorting form state estimation We present results TD Qlearning applied POMDPs It shown conventional discounted RL framework inadequate deal POMDPs Finally develop new framework learning without stateestimation POMDPs including stochastic policies search space defining value utility distribution states
The task monitor walking patterns give early warning falls using foot switch mercury trigger sensors We describe dynamic belief network model fall diagnosis given evidence sensor observations outputs beliefs current walking status makes predictions regarding future falls The model represents possible sensor error parametrised allow customisation individual monitored
I would like thank dissertation advisor Roger Schank valuable guidance research thank Cognitive Science reviewers helpful comments draft paper The research described conducted primarily Yale University supported part Defense Advanced Research Projects Agency monitored Office Naval Research contract NK Air Force Office Scientific Research contract FC
This paper describes two methods hierarchically organizing temporal behaviors The first intuitive grouping together common sequences events single units may treated individual behaviors This system immediately encounters problems however units binary meaning behaviors must execute completely hinders construction good training algorithms The system also runs difficulty one unit active time The second system hierarchy transition values This hierarchy dynamically modifies values specify degree one unit follow another These values continuous allowing use gradient descent learning Furthermore many units active time part systems normal functionings
This paper introduces incremental selfimprovement paradigm Unlike previous methods incremental selfimprovement encourages reinforcement learning system improve way learns improve way improves way learns without significant theoretical limitations system able shift inductive bias universal way Its major features There explicit difference learning metalearning kinds information processing Using Turing machine equivalent programming language system occasionally executes selfdelimiting initially highly random selfmodification programs modify contextdependent probabilities future action sequences including future selfmodification programs The system keeps probability modifications computed useful selfmodification programs bring payoff reward reinforcement per time previous selfmodification programs The computation payoff per time takes account computation time required learning entire system life considered boundaries learning trials ignored A particular implementation based novel paradigm presented It designed exploit conventional digital machines good fast storage addressing arithmetic operations etc Experiments illustrate systems mode operation Keywords Selfimprovement selfreference introspection machinelearning reinforcement learning Note This revised extended version earlier report November
Artificial neural networks ANN due inherent parallelism potential fault tolerance offer attractive paradigm robust efficient implementations large modern database knowledge base systems This paper explores neural network model efficient implementation database query system The application proposed model highspeed library query system retrieval multiple items based partial match specified query criteria stored records The performance ANN realization database query module analyzed compared techniques commonly current computer systems The results analysis suggest proposed ANN design offers attractive approach realization query modules large database knowledge base systems especially retrieval based partial matches
In Angluin showed no class exhibiting combinatorial property called approximate fingerprints identified exactly using polynomially many Equivalence queries polynomial size Here show necessary condition every class without approximate fingerprints identification strategy makes polynomial number Equivalence queries Furthermore class honest technical sense computational power required strategy within polynomialtime hierarchy proving nonlearnability least hard showing P ≠ NP
Code scheduling exploit instruction level parallelism ILP critical problem compiler optimization research light increased use longinstructionword machines Unfortunately optimum scheduling computationally intractable one must resort carefully crafted heuristics practice If scope application scheduling heuristic limited basic blocks considerable performance loss may incurred block boundaries To overcome obstacle basic blocks coalesced across branches form larger regions super blocks In literature regions typically scheduled using algorithms either oblivious profile information assumption process forming region fully utilized profile information use profile information addendum classical scheduling techniques We believe even simple case linear code regions super blocks additional performance improvement gained utilizing profile information scheduling well We propose general paradigm converting profileinsensitive list scheduler profilesensitive scheduler Our technique developed via theoretical analysis simplified abstract model general problem profiledriven scheduling acyclic code region yielding scoring measure ranking branch instructions The ranking digests profile information useful property scheduling respect rank provably good minimizing expected completion time region within limits abstraction While ranking scheme computationally intractable general case practicable super blocks suggests heuristic present paper profiledriven scheduling super blocks Experiments show heuristic offers substantial performance improvement prior methods range integer benchmarks several machine models
We propose extension Genetic Programming paradigm allows users traditional Genetic Algorithms evolve computer programs To end introduce mechanisms like transcription editing repairing Genetic Programming We demonstrate feasibility approach using develop programs prediction sequences integer numbers
Generalized delta rule popularly known backpropagation BP probably one widely used procedures training multilayer feedforward networks sigmoid units Despite reports success number interesting problems BP excruciatingly slow converging set weights meet desired error criterion Several modifications improving learning speed proposed literature BP known suffer phenomenon flat spots The slowness BP direct consequence flatspots together formulation BP Learning rule This paper proposes new approach minimizing error suggested mathematical properties conventional error function effectively handles flatspots occurring output layer The robustness proposed technique demonstrated number datasets widely studied machine learning community
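The flat-spot problem can be seen directly in the standard output-layer error signal for a sigmoid unit trained on squared error, (t - y) y (1 - y): the y (1 - y) factor vanishes when the unit saturates, even if its output is completely wrong. The minimal sketch below only illustrates that phenomenon under those assumptions; it does not reproduce the paper's proposed error function.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def output_delta(net: float, target: float) -> float:
    """Standard BP output-layer error signal for squared error and a sigmoid unit."""
    y = sigmoid(net)
    return (target - y) * y * (1.0 - y)

# Unit saturated near 0 while the target is 1: the error stays large, the delta shrinks.
for net in (0.0, -5.0, -10.0):
    y = sigmoid(net)
    print(f"net={net:6.1f}  y={y:.5f}  error={1.0 - y:.5f}  delta={output_delta(net, 1.0):.6f}")
```

Because weight changes in BP are proportional to this delta, a saturated but wrong output unit learns almost nothing, which is the slowdown the abstract attributes to flat spots.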
An efficient retrieval relatively small number relevant cases huge case base crucial subtask CaseBased Reasoning In article present Case Retrieval Nets CRNs memory model recently developed task The main idea apply spreading activation process netlike case memory order retrieve cases similar posed query case We summarize basic ideas CRNs suggest useful extensions present initial experimental results suggest CRNs successfully handle case bases larger considered usually CBR community
This paper presents Objectdirected Case Retrieval Nets memory model developed application CaseBased Reasoning task technical diagnosis The key idea store cases ie observed symptoms diagnoses network enhance network object model encoding knowledge devices application domain
Alan E Gelfand Professor Department Statistics University Connecticut Storrs CT Sujit K Sahu Lecturer School Mathematics University Wales Cardiff CF YH UK The research first author supported part NSF grant DMS second author supported part EPSRC grant UK The authors thank Brad Carlin Kate Cowles Gareth Roberts anonymous referee valuable comments
Technical Report No Department Statistics University Toronto Abstract Gaussian processes natural way defining prior distributions functions one input variables In simple nonparametric regression problem function gives mean Gaussian distribution observed response Gaussian process model easily implemented using matrix computations feasible datasets thousand cases Hyperparameters define covariance function Gaussian process sampled using Markov chain methods Regression models noise distribution logistic probit models classification applications implemented sampling well latent values underlying observations Software available implements methods using covariance functions hierarchical parameterizations Models defined way discover highlevel properties data inputs relevant predicting response
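The matrix computations mentioned for the simple regression case amount to the predictive mean k_*^T (K + noise^2 I)^{-1} y. The minimal sketch below assumes fixed hyperparameters, a one-dimensional input and a squared-exponential covariance (all illustrative choices). It omits the Markov chain sampling of hyperparameters and latent values that the report describes, and it uses a hand-written Cholesky solve to stay dependency-free.

```python
import math
from typing import List

def sq_exp(a: float, b: float, amp: float = 1.0, length: float = 1.0) -> float:
    """Squared-exponential covariance (an assumed choice of covariance function)."""
    return amp ** 2 * math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = A for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L: List[List[float]], b: List[float]) -> List[float]:
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict_mean(xs: List[float], ys: List[float], x_star: float, noise: float = 0.1) -> float:
    """Predictive mean k_*^T (K + noise^2 I)^{-1} y with fixed hyperparameters."""
    n = len(xs)
    K = [[sq_exp(xs[i], xs[j]) + (noise ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve_cholesky(cholesky(K), ys)
    return sum(sq_exp(x_star, xs[i]) * alpha[i] for i in range(n))
```

The O(n^3) Cholesky factorisation is what limits direct matrix methods to datasets of roughly a thousand cases, as the abstract notes.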
Rule induction systems seek generate rule sets optimal complexity rule set This paper develops formal proof NPCompleteness problem generating simplest rule set MIN RS accurately predicts examples training set particular type generalization algorithm algorithm complexity measure The proof informally extended cover broader spectrum complexity measures learning algorithms
Many factory optimization problems inventory control scheduling reliability formulated continuoustime Markov decision processes A primary goal problems find gainoptimal policy minimizes longrun average cost This paper describes new averagereward algorithm called SMART finding gainoptimal policies continuous time semiMarkov decision processes The paper presents detailed experimental study SMART large unreliable production inventory problem SMART outperforms two wellknown reliability heuristics industrial engineering A key feature study integration reinforcement learning algorithm directly two commercial discreteevent simulation packages ARENA CSIM paving way approach applied many factory optimization problems already exist simulation models
Locally weighted polynomial regression LWPR popular instancebased algorithm learning continuous nonlinear mappings For two three inputs thousand datapoints computational expense predictions daunting We discuss drawbacks previous approaches dealing problem present new algorithm based multiresolution search quicklyconstructible augmented kdtree Without needing rebuild tree make fast predictions arbitrary local weighting functions arbitrary kernel widths arbitrary queries The paper begins new faster algorithm exact LWPR predictions Next introduce approximation achieves twoordersofmagnitude speedup negligible accuracy losses Increasing certain approximation parameter achieves greater speedups still correspondingly larger accuracy degradation This nevertheless useful operations early stages model selection locating optima fitted surface We also show approximations permit realtime queryspecific optimization kernel width We conclude brief discussion potential extensions tractable instancebased learning datasets large fit computers main memory
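For reference, the brute-force prediction that makes LWPR expensive visits every datapoint for every query. The minimal sketch below assumes a single input, a Gaussian weighting kernel and a local linear (degree 1) model. It shows the baseline cost only, not the paper's kd-tree algorithm or its approximation.

```python
import math
from typing import Sequence

def lwr_predict(xs: Sequence[float], ys: Sequence[float], query: float, width: float) -> float:
    """Locally weighted linear regression at `query` with a Gaussian kernel.

    Brute force: O(N) work per query, the cost a kd-tree method aims to avoid.
    """
    # Weighted sums for the 2x2 normal equations of y ~ a + b * (x - query).
    sw = swx = swxx = swy = swxy = 0.0
    for x, y in zip(xs, ys):
        d = x - query
        w = math.exp(-(d * d) / (2.0 * width * width))
        sw += w
        swx += w * d
        swxx += w * d * d
        swy += w * y
        swxy += w * d * y
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        # Degenerate local fit: fall back to the kernel-weighted average.
        return swy / sw
    # Centering at the query makes the intercept `a` the prediction.
    return (swxx * swy - swx * swxy) / det
```

Changing `width` requires no preprocessing here, but each call still touches all N points; supporting arbitrary kernel widths without that linear cost is the point of the tree-based method.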
Ordinal assertions evolutionary context form species s similar species x species y deduced distance matrix M interspecies dissimilarities (M[s,x] < M[s,y]) Given species x y ordinal binary character c_xy^M defined c_xy^M(s) = 1 iff M[s,x] < M[s,y] species s In paper present several results concerning inference evolutionary trees phylogenies ordinal assertions In particular present A sixpoint condition characterizes distance matrices whose ordinal binary characters pairwise compatible This characterization analogous fourpoint condition additive matrices An optimal On algorithm n number species recovering phylogeny realizes ordinal binary characters distance matrix satisfies sixpoint condition An NPcompleteness result determining phylogeny realizes k ordinal binary characters given distance matrix
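The property the six-point condition characterizes can be checked by brute force. The minimal sketch below assumes the reading c_xy^M(s) = 1 iff M[s][x] < M[s][y] and uses the standard four-gamete test for compatibility of two binary characters; whether the paper uses exactly this compatibility notion is an assumption. It is not the paper's six-point condition or its phylogeny-recovery algorithm.

```python
from itertools import combinations
from typing import List, Set

def ordinal_character(M: List[List[float]], x: int, y: int) -> Set[int]:
    """Species s with c_xy(s) = 1, read here as M[s][x] < M[s][y]."""
    return {s for s in range(len(M)) if M[s][x] < M[s][y]}

def compatible(a: Set[int], b: Set[int], n: int) -> bool:
    """Four-gamete test: two binary characters are compatible unless all four
    state combinations (1,1), (1,0), (0,1), (0,0) occur among the n species."""
    seen = {(s in a, s in b) for s in range(n)}
    return len(seen) < 4

def all_pairwise_compatible(M: List[List[float]]) -> bool:
    """Brute-force check over all ordered pairs (x, y) with x != y."""
    n = len(M)
    chars = [ordinal_character(M, x, y) for x in range(n) for y in range(n) if x != y]
    return all(compatible(a, b, n) for a, b in combinations(chars, 2))
```

The brute-force check compares O(n^4) pairs of characters over n species each, which is why a local characterization such as a six-point condition is useful.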
An XofN set containing one attributevalue pairs For given instance value corresponds number attributevalue pairs true In paper explore characteristics performance continuousvalued XofN attributes versus nominal XofN attributes constructive induction Nominal XofNs representationally powerful continuousvalued XofNs former suffer fragmentation problem although mechanisms subsetting help solve problem Two approaches constructive induction using continuousvalued XofNs described Continuousvalued XofNs perform better nominal ones domains need XofNs one cut point On domains need XofN representations one cut point nominal XofNs perform better continuousvalued ones Experimental results set artificial realworld domains support statements
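The definition of an X-of-N value translates directly into code: count how many of the listed attribute-value pairs hold for an instance. The minimal sketch below shows the two uses the abstract contrasts. A continuous-valued X-of-N is compared against a cut point, while a nominal X-of-N branches on every possible count. The attribute names are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (attribute, value)

def x_of_n(instance: Dict[str, str], pairs: List[Pair]) -> int:
    """Number of attribute-value pairs in `pairs` that are true for `instance`."""
    return sum(1 for attr, val in pairs if instance.get(attr) == val)

def continuous_test(instance: Dict[str, str], pairs: List[Pair], cut: float) -> bool:
    """Continuous-valued use: a binary test against one cut point, X-of-N >= cut."""
    return x_of_n(instance, pairs) >= cut

def nominal_branch(instance: Dict[str, str], pairs: List[Pair]) -> str:
    """Nominal use: one branch per possible count 0..N (source of fragmentation)."""
    return str(x_of_n(instance, pairs))

# Hypothetical attributes for illustration.
pairs = [("colour", "red"), ("size", "large"), ("shape", "round")]
inst = {"colour": "red", "size": "small", "shape": "round"}
print(x_of_n(inst, pairs), continuous_test(inst, pairs, 2), nominal_branch(inst, pairs))
```

The test `X-of-N >= M` with a single cut point is the same Boolean test as an M-of-N attribute over those pairs. The nominal form splits the data N + 1 ways at one node, which illustrates the fragmentation problem mentioned above.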
This paper studies effects decision tree learning constructing four types attribute conjunctive disjunctive MofN XofN representations To reduce effects factors tree learning methods new attribute search strategies search starting points evaluation functions stopping criteria single tree learning algorithm developed With different option settings construct four different types new attribute factors fixed The study reveals conjunctive disjunctive representations similar performance terms prediction accuracy theory complexity variety concepts even DNF CNF concepts usually thought suited one two kinds representation In addition study demonstrates stronger representation power MofN conjunction disjunction stronger representation power XofN three types new attribute reflected performance decision tree learning terms higher prediction accuracy lower theory complexity
If analogy casebased reasoning systems scale large case bases important analyze various methods used retrieving analogues identify features problem appropriate This paper reports one analysis comparison retrieval marker passing spreading activation semantic network KnowledgeDirected Spreading Activation method developed wellsuited retrieving semantically distant analogues large knowledge base The analysis two complementary components theoretical model retrieval time based number problem characteristics experiments showing retrieval time approaches varies knowledge base size These two components taken together suggest KDSA likely SA able scale retrieval large knowledge bases
quence identification problems forms models depend absolute locations nucleotides assume independence consecutive nucleotide locations This paper describes new class learning methods called compressionbased induction CBI geared towards sequence learning problems arise learning DNA sequences The central idea use text compression techniques DNA sequences means generalizing sample sequences The resulting methods form models based important relative locations nucleotides dependence consecutive locations They also provide suitable framework biological domain knowledge injected learning process We present initial explorations range CBI methods demonstrate potential methods DNA sequence identification tasks
Our ability remember events situations daily life demonstrates ability rapidly acquire new memories There broad consensus hippocampal system HS plays critical role formation retrieval memories A computational model described demonstrates HS may rapidly transform transient pattern activity representing event situation persistent structural encoding via longterm potentiation longterm depression
We compare two techniques lighting control actual room equipped seven banks lights photoresistors detect lighting level four sensing points Each bank lights independently set one sixteen intensity levels The task determine device intensity levels achieve particular configuration sensor readings One technique explored uses neural network approximate mapping sensor readings device intensity levels The technique examined uses conventional feedback control loop The neural network approach appears superior require experimentation fly hence fluctuating light intensity levels settling lengthy settling times deal complex interactions conventional control techniques handle well This comparison performed part Adaptive House project described briefly Further directions control
We provide sufficient condition convergence general class alternating estimationmaximization EM type continuousparameter estimation algorithms respect given norm This class includes EM penalized EM Greens OSLEM approximate EM algorithms The convergence analysis extended include alternating coordinatemaximization EM algorithms Meng Rubins ECM Fessler Heros SAGE The condition monotone convergence used establish norms distance successive iterates limit point EMtype algorithm approaches zero monotonically For illustration apply results estimation Poisson rate parameters emission tomography establish final iterations logarithm EM iterates converge monotonically weighted Euclidean norm
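As a concrete member of the class covered by these results, the classic ML-EM iteration of Shepp and Vardi for Poisson emission tomography updates each rate as lam_j <- lam_j / sum_i A_ij * sum_i A_ij y_i / (sum_k A_ik lam_k). The minimal sketch below uses a tiny made-up system matrix. It only shows the iteration and its monotone likelihood, not the paper's norm-based convergence analysis.

```python
import math
from typing import List

def mlem_step(A: List[List[float]], y: List[float], lam: List[float]) -> List[float]:
    """One ML-EM update for y_i ~ Poisson(sum_j A[i][j] * lam[j])."""
    m, n = len(A), len(lam)
    proj = [sum(A[i][j] * lam[j] for j in range(n)) for i in range(m)]  # forward projection
    new = []
    for j in range(n):
        sens = sum(A[i][j] for i in range(m))                     # sensitivity of pixel j
        back = sum(A[i][j] * y[i] / proj[i] for i in range(m))    # backprojected data/model ratio
        new.append(lam[j] * back / sens)
    return new

def poisson_loglik(A: List[List[float]], y: List[float], lam: List[float]) -> float:
    """Poisson log-likelihood up to the constant -sum_i log(y_i!)."""
    total = 0.0
    for i, row in enumerate(A):
        mu = sum(a * l for a, l in zip(row, lam))
        total += y[i] * math.log(mu) - mu
    return total

# Made-up 3-detector, 2-pixel example.
A = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.7]]
y = [4.0, 3.0, 5.0]
lam = [1.0, 1.0]
for _ in range(50):
    lam = mlem_step(A, y, lam)
print(lam, poisson_loglik(A, y, lam))
```

The multiplicative form keeps every rate positive, which makes logarithms of the iterates, as studied in the abstract, well defined.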
The Kbann approach uses neural networks refine knowledge written form simple propositional rules We extend idea presenting Manncon algorithm mathematical equations governing PID controller determine topology initial weights network trained using backpropagation We apply method task controlling outflow temperature water tank producing statisticallysignificant gains accuracy standard neural network approach nonlearning PID controller Furthermore using PID knowledge initialize weights network produces statistically less variation testset accuracy compared networks initialized small random numbers
Random field models image analysis spatial statistics usually local interactions They simulated Markov chains update single site time The updating rules typically condition neighboring sites If want approximate expectation bounded function make better use simulations empirical estimator We describe symmetrizations empirical estimator computationally feasible lead considerable variance reduction The method reminiscent idea behind generalized von Mises statistics To simplify exposition consider mainly nearest neighbor random fields Gibbs sampler
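The setting can be made concrete with a nearest-neighbour Ising field on a small torus, updated one site at a time by the Gibbs sampler. The minimal sketch below computes the plain empirical estimator that the paper improves on; the symmetrized estimators themselves are not reproduced. The Ising model, the coupling `beta`, the burn-in length and the choice of function f are all illustrative assumptions.

```python
import math
import random

def gibbs_ising(n: int, beta: float, sweeps: int, seed: int = 0):
    """Single-site Gibbs sampler for a nearest-neighbour Ising field on an n x n torus.

    Yields the (mutated in place) configuration after each full sweep.
    """
    rng = random.Random(seed)
    s = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # The update conditions only on the four neighbouring sites.
                h = (s[(i - 1) % n][j] + s[(i + 1) % n][j]
                     + s[i][(j - 1) % n] + s[i][(j + 1) % n])
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
                s[i][j] = 1 if rng.random() < p_plus else -1
        yield s

def empirical_estimate(n: int, beta: float, sweeps: int, burn_in: int = 100) -> float:
    """Plain empirical average of f(s) = s[0][0] * s[0][1] over post-burn-in sweeps."""
    total, count = 0.0, 0
    for t, s in enumerate(gibbs_ising(n, beta, sweeps)):
        if t >= burn_in:
            total += s[0][0] * s[0][1]
            count += 1
    return total / count
```

This estimator looks at f at a single pair of sites per sweep. Averaging over translated or otherwise symmetric copies of f reuses more of each simulated configuration, which is the general idea behind the variance reduction the abstract describes.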
Intrator proposed feature extraction method related recent statistical theory Huber Friedman based biologically motivated model neuronal plasticity Bienenstock et al This method recently applied feature extraction context recognizing D objects single D views Intrator Gold Here describe experiments designed analyze nature extracted features relevance theory psychophysics object recognition
The BuildingBlock Hypothesis appeals notion problem decomposition assembly solutions subsolutions Accordingly many varieties GA test problems structure based buildingblocks Many problems use deceptive fitness functions model interdependency bits within block However model interdependency buildingblocks consistent type interaction used intrablock interblock This paper discusses inadequacies various test problems literature clarifies concept buildingblock interdependency We formulate principled model hierarchical interdependency applied many levels consistent manner introduce Hierarchical Ifandonlyif HIFF canonical example We present empirical results GAs HIFF showing population diversity maintained linkage tight GA able identify manipulate buildingblocks many levels assembly BuildingBlock Hypothesis suggests
The MUSIC system MUlti Signal processor system Intelligent Communication parallel distributed memory architecture based digital signal processors DSP A system processor elements operational It peak performance GFlops electrical power consumption less W including forced air cooling fits rack Two applications backpropagation algorithm neural net learning molecular dynamics simulations run times faster CRAY YMP times faster NEC SX A sustained performance GFlops reached The selling price system would range US
In paper report Monte Carlo study dynamics large untrained feedforward neural networks randomly chosen weights feedback The analysis consists looking percent systems exhibit chaos distribution largest Lyapunov exponents distribution correlation dimensions As systems become complex increasing inputs neurons probability chaos approaches unity The correlation dimension typically much smaller system dimension
We introduce model analog computation discrete time presence analog noise flexible enough cover important concrete cases noisy analog neural nets networks spiking neurons This model subsumes classical model digital computation presence noise We show presence arbitrarily small amounts analog noise reduces power analog computational models finite automata also prove new type upper bound
In paper extend basic autologistic model include covariates indication sampling effort The model applied sampled data instead traditional use image analysis complete data available We adopt Bayesian setup develop hybrid Gibbs sampling estimation procedure Using simulated examples show autologistic model covariates sample data improves predictions compared simple logistic regression model standard autologistic model without covariates
Specifying constructing simulating structured connectionist networks requires significant programming effort System tools greatly reduce effort required providing conceptual structure within work make large complex network simulations possible The Rochester Connectionist Simulator system tool designed aid specification construction simulation connectionist networks This report describes tool detail facilities provided use well details implementation Through hope make designing verifying connectionist networks easier also encourage development refinement connectionist research tools
Many stateoftheart ILP systems require large numbers negative examples avoid overgeneralization This considerable disadvantage many ILP applications namely inductive program synthesis relatively small sparse example sets realistic scenario Integrity constraints first order clauses play role negative examples inductive process One integrity constraint replace long list ground negative examples However checking consistency program set integrity constraints usually involves heavy theorem proving We propose efficient constraint satisfaction algorithm applies wide variety useful integrity constraints uses Monte Carlo strategy It looks inconsistencies random generation queries program This method allows use integrity constraints instead together negative examples As consequence programs induce specified rapidly user ILP system tends obtain accurate definitions Average running times greatly affected use integrity constraints compared ground negative examples
This paper develops evolutionary trade network game TNG combines evolutionary game play endogenous partner selection Successive generations resourceconstrained buyers sellers choose refuse trade partners basis continually updated expected payoffs Trade partner selection takes place accordance modified GaleShapley matching mechanism trades implemented using trade strategies evolved via standardly specified genetic algorithm The trade partnerships resulting matching mechanism shown core stable Pareto optimal successive trade cycle Nevertheless computer experiments suggest static optimality properties may inadequate measures optimality evolutionary perspective
A neural network method identifying ancestor hadron jet presented The idea find efficient mapping certain observed hadronic kinematical variables quarkgluon identity This done neuronic expansion terms network sigmoidal functions using gradient descent procedure errors backpropagated network With method able separate gluon quark jets originating Monte Carlo generated e e events accuracy The result independent MC model used This approach isolating gluon jet used study socalled string effect In addition heavy quarks b c e e reactions identified level observing hadrons In particular able separate bquarks efficiency purity comparable expected vertex detectors We also speculate neural network method used disentangle different hadronization schemes compressing dimensionality state space hadrons
Selforganizing neural networks briefly reviewed compared supervised learning algorithms like backpropagation The power selforganization networks capability displaying typical features transparent manner This successfully demonstrated two applications hadronic jet physics hadronization model discrimination separation bc light quarks
The Langevin updating rule noise added weights learning presented shown improve learning problems initially illconditioned Hessians This particularly important multilayer perceptrons many hidden layers often illconditioned Hessians In addition Manhattan updating shown similar effect
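The two update rules can be stated in a few lines. Langevin updating adds Gaussian noise to every weight after the gradient step, and Manhattan updating moves each weight by a fixed amount against the sign of its gradient. The minimal sketch below applies them to an ill-conditioned quadratic with Hessian eigenvalues 100 and 0.01, standing in for a multilayer perceptron. The learning rates, noise scale and test function are assumptions, and the sketch does not reproduce the paper's claimed improvement from Langevin noise.

```python
import math
import random
from typing import List

# Ill-conditioned quadratic E(w) = 0.5 * (100 * w0^2 + 0.01 * w1^2).
CURV = (100.0, 0.01)

def grad(w: List[float]) -> List[float]:
    return [c * x for c, x in zip(CURV, w)]

def gradient_step(w: List[float], lr: float) -> List[float]:
    return [x - lr * g for x, g in zip(w, grad(w))]

def langevin_step(w: List[float], lr: float, noise_std: float, rng: random.Random) -> List[float]:
    """Gradient step with Gaussian noise added to every weight."""
    return [x - lr * g + rng.gauss(0.0, noise_std) for x, g in zip(w, grad(w))]

def manhattan_step(w: List[float], step: float) -> List[float]:
    """Move each weight a fixed distance opposite to the sign of its gradient."""
    return [x - step * math.copysign(1.0, g) if g != 0 else x for x, g in zip(w, grad(w))]

w_gd, w_man = [1.0, 1.0], [1.0, 1.0]
for _ in range(200):
    w_gd = gradient_step(w_gd, 0.015)    # lr must stay below 2/100 for stability
    w_man = manhattan_step(w_man, 0.01)
print(w_gd, w_man)
```

Plain gradient descent has to keep its learning rate below the limit set by the stiff direction, so the flat weight barely moves (about 0.97 after 200 steps). Manhattan updating ignores gradient magnitudes and reaches the neighbourhood of the minimum in both directions.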
The PAC learning rectangles studied found experimentally yield excellent hypotheses several applied learning problems Also pseudorandom sets rectangles actively studied recently subproblem common derandomization depth DNF circuits derandomizing Randomized Logspace ii approximate distribution n independent multivalued random variables We present improved upper bounds class problems approximating highdimensional rectangles arise PAC learning pseudorandomness
A distinction two forms task knowledge transfer representational functional reviewed followed discussion MTL modified version multiple task learning MTL neural network method functional transfer The MTL method employs separate learning rate k task output node k k varies function measure relatedness R k kth task primary task interest An MTL network applied diagnostic domain four levels coronary artery disease Results experiments demonstrate ability MTL develop predictive model one level disease superior diagnostic ability models produced either single task learning standard multiple task learning
This paper proposes genetic algorithms GAs path planning trajectory planning autonomous mobile robot Our GAbased approach advantage adaptivity GAs work even environment timevarying unknown Therefore suitable offline online motion planning We first present GA path planning D terrain Simulation results performance adaptivity GA randomly generated terrains shown Then discuss extensions GA solving path planning trajectory planning simultaneously
Most work VapnikChervonenkis dimension neural networks focused feedforward networks However recurrent networks also widely used learning applications particular time relevant parameter This paper provides lower upper bounds VC dimension networks Several types activation functions discussed including threshold polynomial piecewisepolynomial sigmoidal functions The bounds depend two independent parameters number w weights network length k input sequence In contrast feedforward networks VC dimension bounds expressed function w An important difference recurrent feedforward nets fixed recurrent net receive inputs arbitrary length Therefore particularly interested case k ≫ w Ignoring multiplicative constants main results say roughly following: (1) architectures activation σ fixed nonlinear polynomial, VC dimension ≈ wk; (2) architectures activation σ fixed piecewise polynomial, VC dimension between wk and w^2 k; (3) architectures activation σ = H (threshold nets), VC dimension between w log(k/w) and min{wk log wk, w^2 + w log wk}; (4) standard sigmoid σ(x) = 1/(1 + e^(-x)), VC dimension between wk and w^4 k^2. An earlier version paper appeared Proc rd European Workshop Computational Learning Theory LNCS pages Springer
The performance hillclimbing design optimization improved abstraction decomposition design space Methods automatically finding exploiting abstractions decompositions presented paper A technique called Operator Importance Analysis finds useful abstractions It determining given set operators important given class design problems Hillclimbing search runs faster performed using smaller set operators A technique called Operator Interaction Analysis finds useful decompositions It measuring pairwise interaction operators It uses measurements form ordered partition operator set This partition used hierarchic hillclimbing algorithm runs faster ordinary hillclimbing unstructured operator set We implemented techniques tested domain racing yacht hull design Our experimental results show two methods produce substantial speedups little loss quality resulting designs
This paper investigates algorithm construction decisions trees comprised linear threshold units also presents novel algorithm learning nonlinearly separable boolean functions using Madalinestyle networks isomorphic decision trees The construction networks discussed performance learning compared standard BackPropagation sample problem many irrelevant attributes introduced Littlestones Winnow algorithm also explored within architecture means learning presence many irrelevant attributes The learning ability Madalinestyle architecture nonoptimal larger necessary networks also explored
Lipid Research Clinic Program Lipid Research Clinic Program The Lipid Research Clinics Coronary Primary Prevention Trial results parts I II Journal American Medical Association January Pearl Judea Pearl Aspects graphical models connected causality Technical Report RLL Cognitive Systems Laboratory UCLA June Submitted Biometrika June Short version Proceedings th Session International Statistical Institute Invited papers Florence Italy August Tome LV Book pp
This paper investigates generation neural networks induction binary trees threshold logic units TLUs Initially describe framework tree construction algorithm show helps bridge gap pure connectionist neural network symbolic decision tree paradigms We also show trees threshold units induce transformed isomorphic neural network topology Several methods learning linear discriminant functions node tree structure examined shown produce accuracy results comparable classical information theoretic methods constructing decision trees use single feature tests node produce trees smaller thus easier understand Moreover results also show possible simultaneously learn topology weight settings neural network simply using training data set initially given
We consider problem learning DNF formulae mistakebound PAC models We develop new approach called polynomial explainability shown useful learning new subclasses DNF CNF formulae known learnable Unlike previous learnability results DNF CNF formulae subclasses limited number terms number variables per term yet contain subclasses kDNF ktermDNF corresponding classes CNF special cases We apply DNF results problem learning visual concepts obtain learning algorithms several natural subclasses visual concepts appear natural boolean counterpart On hand show learning natural subclasses visual concepts hard learning class DNF formulae We also consider robustness results various types noise
Typical approaches plan recognition start representation agents possible plans reason evidentially observations agents actions assess plausibility various candidates An expansive view task consistent prior work accounts context plan generated mental state planning process agent consequences agents actions world We present general Bayesian framework encompassing view focus context exploited plan recognition We demonstrate approach problem traffic monitoring objective induce plan driver observation vehicle movements Starting model driver generates plans show highway context appropriately influence recognizers interpretation observed driver behavior
I present parallel algorithm exact probabilistic inference Bayesian networks For polytree networks n variables worstcase time complexity Olog n CREW PRAM concurrentread exclusivewrite parallel randomaccess machine n processors constant number evidence variables For arbitrary networks time complexity Or w log n n processors Ow log n r w n processors r maximum range variable w induced width maximum clique size moralizing triangulating network
Several theories inference decision employ sets probability distributions fundamental representation subjective belief This paper investigates frequentist connection empirical data convex sets probability distributions Building earlier work Walley Fine framework advanced sequence random outcomes described drawn convex set distributions rather single distribution The extra generality detected observable characteristics outcome sequence The paper presents new asymptotic convergence results paralleling laws large numbers probability theory concludes comparison approach approaches based prior subjective constraints © Carnegie Mellon University
This reproduces report submitted Rome Laboratory October Copyright © Jon Doyle All rights reserved Freely available via http://www.medg.lcs.mit.edu/doyle Final Report Rational Distributed Reason Maintenance Abstract Efficiency dictates plans largescale distributed activities revised incrementally parts plans revised expected utility identifying revising subplans improve expected utility using original plan The problems identifying reconsidering subplans affected changed circumstances goals closely related problems revising beliefs new changed information gained But traditional techniques reason maintenance (standard method belief revision) choose revisions arbitrarily enforce global notions consistency groundedness may mean reconsidering beliefs plan elements step We develop revision methods aiming revise beliefs plans worth revising tolerate incoherence ungroundedness judged less detrimental costly revision effort We use artificial market economy planning revision tasks arrive overall judgments worth present representation qualitative preferences permits capture common forms dominance information
A feedforward neural network method developed reconstructing invariant mass hadronic jets appearing calorimeter The approach illustrated W q q W bosons produced pp reactions SPS collider energies The neural network method yields results superior conventional methods This neural network application differs classification ones sense analog number mass computed network rather binary decision made As byproduct application clearly demonstrates need using intelligent variables instances amount training instances limited
A new class highspeed selfadaptive massively parallel computing models called ASOCS Adaptive SelfOrganizing Concurrent Systems proposed Current analysis suggests may problems implementing ASOCS models VLSI using hierarchical network structures originally proposed The problems inherent models rather technology used implement This led development new ASOCS model called DNA DiscriminantNode ASOCS depend hierarchical node structure success Three areas DNA model briefly discussed paper DNAs flexible nodes DNA overcomes problems models allocating unused nodes DNA operates processing learning
This paper presents approach mobile robot path planning using casebased reasoning together mapbased path planning The mapbased path planner used seed casebase innovative solutions The casebase stores paths information traversability While planning route paths preferred according former experience least risky
To successful open multiagent environments autonomous agents must capable adapting negotiation strategies tactics prevailing circumstances To end present empirical study showing relative success different strategies different types opponent different environments In particular adopt evolutionary approach strategies tactics correspond genetic material genetic algorithm We conduct series experiments determine successful strategies see strategies evolve depending context negotiation stance agents opponent
Efficiency dictates plans largescale distributed activities revised incrementally parts plans revised expected utility identifying revising subplans improves expected utility using original plan The problems identifying reconsidering subplans affected changed circumstances goals closely related problems revising beliefs new changed information gained But traditional techniques reason maintenance standard method belief revision choose revisions arbitrarily enforce global notions consistency groundedness may mean reconsidering beliefs plan elements step To address problems developed revision methods aimed revising beliefs plans worth revising tolerating incoherence ungroundedness judged less detrimental costly revision effort artificial market economy planning revision tasks arriving overall judgments worth representation qualitative preferences permits capture common forms dominance information We view activities intelligent agents stemming interleaved simultaneous planning replanning execution observation subactivities In model plan construction process agents continually evaluate revise plans light happens world Planning necessary organization largescale activities decisions actions taken future direct impact done shorter term But even wellconstructed value plan decays changing circumstances resources information objectives render original course action inappropriate When changes occur execution plan may necessary construct new plan starting scratch revising previous plan portions plan actually affected changes Given information accrued plan execution remaining parts original plan salvaged ways parts changed Incremental replanning first involves localizing potential changes conflicts identifying subset extant beliefs plans occur It involves choosing identified beliefs plans keep change For greatest efficiency choices portion plan revise revise based coherent expectations preferences among consequences different alternatives rational sense decision theory Savage
Our work toward mechanizing rational planning replanning focussed four main issues This paper focusses latter three issues approach first see Doyle Replanning incremental local manner requires planning procedures routinely identify assumptions made planning connect plan elements assumptions replanning may seek change portions plan dependent upon assumptions brought question new information Consequently problem revising plans account changed conditions much
In paper examine previous work naive Bayesian classifier review limitations include sensitivity correlated features We respond problem embedding naive Bayesian induction scheme within algorithm carries greedy search space features We hypothesize approach improve asymptotic accuracy domains involve correlated features without reducing rate learning ones We report experimental results six natural domains including comparisons decisiontree induction support hypotheses In closing discuss approaches extending naive Bayesian classifiers outline directions future research
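A minimal sketch of embedding naive Bayesian induction inside a greedy forward search over features. Assumptions not taken from the abstract: features are categorical, probabilities use Laplace smoothing, and each candidate subset is scored by training-set accuracy (held-out or leave-one-out accuracy would be a more faithful choice). All function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels, features):
    """Count class and (feature, class, value) frequencies for the chosen features."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature, class) -> Counter of values
    values = defaultdict(set)             # feature -> set of observed values
    for row, y in zip(rows, labels):
        for f in features:
            value_counts[(f, y)][row[f]] += 1
            values[f].add(row[f])
    return class_counts, value_counts, values


def predict_nb(model, row, features):
    """Return the class with the largest Laplace-smoothed naive Bayes score."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total
        for f in features:
            score *= (value_counts[(f, c)][row[f]] + 1) / (n_c + len(values[f]))
        if score > best_score:
            best, best_score = c, score
    return best


def accuracy(rows, labels, features):
    """Training-set accuracy of naive Bayes on a feature subset (an assumption)."""
    model = train_nb(rows, labels, features)
    hits = sum(predict_nb(model, r, features) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def forward_select(rows, labels, n_features):
    """Greedily add the feature that most improves accuracy; stop when none helps."""
    selected = []
    best_acc = accuracy(rows, labels, selected)   # empty set: prior-only guess
    while True:
        scored = [(accuracy(rows, labels, selected + [f]), f)
                  for f in range(n_features) if f not in selected]
        if not scored:
            return selected, best_acc
        acc, f = max(scored)
        if acc <= best_acc:
            return selected, best_acc
        selected.append(f)
        best_acc = acc
```

On data where only feature 0 carries information about the class, the search selects feature 0 and stops, because adding the uninformative feature does not raise accuracy.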
In paper present novel induction algorithm Bayesian networks This selective Bayesian network classifier selects subset attributes maximizes predictive accuracy prior network learning phase thereby learning Bayesian networks bias small highpredictiveaccuracy networks We compare performance classifier selective nonselective naive Bayesian classifiers We show selective Bayesian network classifier performs significantly better versions naive Bayesian classifier almost databases analyzed hence enhancement naive Bayesian classifier Relative nonselective Bayesian network classifier selective Bayesian network classifier generates networks computationally simpler evaluate display predictive accuracy comparable Bayesian networks model features
We attempt recover unknown function noisy sampled data Using orthonormal bases compactly supported wavelets develop nonlinear method works wavelet domain simple nonlinear shrinkage empirical wavelet coefficients The shrinkage tuned nearly minimax member wide range Triebel Besovtype smoothness constraints asymptotically minimax Besov bodies p q Linear estimates achieve even minimax rates Triebel Besov classes p lt method significantly outperform every linear method kernel smoothing spline sieve minimax sense Variants method based simple threshold nonlinearities nearly minimax Our method possesses interpretation spatial adaptivity reconstructs using kernel may vary shape bandwidth point point depending data Least favorable distributions certain Triebel Besov scales generate objects sparse wavelet transforms Many real objects similarly sparse transforms suggests minimax results relevant practical problems Sequels paper discuss practical implementation spatial adaptation properties applications inverse problems Acknowledgements This work completed first author leave UC Berkeley research supported NSF DMS NASA Contract NCA grant ATT Foundation The second author supported part NSF grants DMS NIH PHS grant GM Supersedes earlier version titled Wavelets Optimal Function Estimation dated November issued Technical reports Departments Statistics Stanford UC Berkeley
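A minimal sketch of nonlinear shrinkage of empirical wavelet coefficients, assuming the Haar basis (the simplest compactly supported orthonormal wavelet) and soft thresholding with a caller-supplied threshold `t`. The near-minimax tuning of the threshold described in the abstract is not reproduced; here `t` is simply a parameter.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two.
    Returns the final scaling coefficient and the detail levels, coarsest first."""
    approx, levels = list(signal), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        details = [(approx[i] - approx[i + 1]) / SQRT2 for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
        levels.insert(0, details)
    return approx[0], levels


def haar_inverse(scaling, levels):
    """Invert haar_forward level by level, from coarse to fine."""
    approx = [scaling]
    for details in levels:
        nxt = []
        for a, d in zip(approx, details):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = nxt
    return approx


def soft_threshold(x, t):
    """Shrink x toward zero by t, setting it to zero when |x| <= t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def wavelet_shrink(signal, t):
    """Soft-threshold every detail coefficient and keep the scaling coefficient."""
    scaling, levels = haar_forward(signal)
    shrunk = [[soft_threshold(d, t) for d in level] for level in levels]
    return haar_inverse(scaling, shrunk)
```

Because the shrinkage acts coefficient by coefficient, large coefficients (edges, spikes) survive while small ones are removed, which is what lets the reconstruction adapt its effective smoothing from point to point.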
It established good software engineering practice ensure programs use memory via abstract data structures stacks queues lists These provide interface program memory freeing program memory management details left data structures implement The main result presented herein GP automatically generate stacks queues Typically abstract data structures support multiple operations put get We show GP simultaneously evolve operations data structure implementing operation independent program tree That chromosome consists fixed number independent program trees Moreover crossover mixes genetic material program trees implement operation Program trees interact via shared memory shared Automatically Defined Functions ADFs
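For illustration only, a hand-written (not evolved) stack of the kind GP is asked to discover: each operation is a separate routine, and the routines interact only through shared indexed memory and one shared register. The operation names, the 63-cell memory and the use of `aux` as a stack pointer are assumptions of this sketch, and no overflow or underflow checking is done.

```python
class MemoryStack:
    """Stack built from independent operations over shared indexed memory."""

    def __init__(self, size=63):
        self.memory = [0] * size   # shared indexed memory (size is an assumption)
        self.aux = 0               # shared register used as the stack pointer

    def makenull(self):
        self.aux = 0

    def empty(self):
        return self.aux == 0

    def push(self, value):
        self.memory[self.aux] = value
        self.aux += 1

    def top(self):
        return self.memory[self.aux - 1]

    def pop(self):
        self.aux -= 1
        return self.memory[self.aux]
```

In the evolved setting each of these methods would correspond to one independent program tree in the chromosome.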
One main experimental tools probing interactions neurons measurement correlations activity In general however interpretation observed correlations difficult since correlation pair neurons influenced direct interaction also dynamic state entire network belong Thus comparison observed correlations predictions specific model networks needed In paper develop theory neuronal correlation functions large networks comprising several highly connected subpopulations obeying stochastic dynamic rules When networks asynchronous states crosscorrelations relatively weak ie amplitude relative autocorrelations order 1/N N size interacting populations Using weakness crosscorrelations general equations express matrix crosscorrelations terms mean neuronal activities effective interaction matrix presented The effective interactions synaptic efficacies multiplied gain postsynaptic neurons The timedelayed crosscorrelation matrix expressed sum exponentially decaying modes correspond nonorthogonal eigenvectors effective interaction matrix The theory extended networks random connectivity randomly dilute networks This allows comparison contribution internal common input direct
Technical Report Department Statistics University Washington Sonia Petrone Assistant Professor Universita di Pavia Dipartimento di Economia Politica e Metodi Quantitativi I Pavia Italy Adrian E Raftery Professor Statistics Sociology Department Statistics University Washington Box Seattle WA This research supported ONR grant NJ grants MURST Rome
Interference neural networks occurs learning one area input space causes unlearning another area Networks less susceptible interference called spatially local networks These networks often used neurocontrol online applications real time nature task interference often problem Although heuristics makes network local theoretical framework measuring localization This paper provides formal definition interference localization allow us measure networks local properties These definitions useful developing learning algorithms make networks local This may lead faster learning entire input domain
A considerable body evidence prosopagnosia deficit face recognition dissociable nonface object recognition indicates visual system devotes specialized functional area mechanisms appropriate face processing We present modular neural network composed two expert networks one mediating gate network task learning recognize faces individuals classifying nonface objects members one three classes While learning task network tends divide labor two expert modules one expert specializing face processing specializing nonface object processing After training observe networks performance test set one experts progressively damaged The results roughly agree data reported prosopagnosic patients damage face expert increases networks face recognition performance decreases dramatically object classification performance drops slowly We conclude datadriven competitive learning two unbiased functional units give rise localized face processing selective damage system could underlie prosopagnosia
A neural network model called LISSOM cooperative selforganization afferent lateral connections cortical maps applied modeling cortical plasticity After selforganization LISSOM maps dynamic equilibrium input reorganize like cortex response simulated cortical lesions intracortical microstimulation The model predicts adapting lateral interactions fundamental cortical reorganization suggests techniques hasten recovery following sensory cortical surgery
One way represent machine learning algorithms bias hypothesis instance space pair probability distributions This approach taken within Bayesian learning schemes framework Ulearnability However obvious Inductive Logic Programming ILP system best provided probability distribution This paper extends results previous paper author introduced stochastic logic programs means providing structured definition probability distribution Stochastic logic programs generalisation stochastic grammars A stochastic logic program consists set labelled clauses p C p interval C rangerestricted definite clause A stochastic logic program P distributional semantics one assigns probability distribution atoms predicate Herbrand base clauses P These probabilities assigned atoms according SLDresolution strategy employs stochastic selection rule It shown probabilities computed directly failfree logic programs normalisation arbitrary logic programs The stochastic proof strategy used provide three distinct functions method sampling Herbrand base used provide selected targets example sets ILP experiments measure information content examples hypotheses used guide search ILP system simple method conditioning given stochastic logic program samples data Functions used measure generality hypotheses ILP system Progol This supports implementation Bayesian technique learning positive examples This paper extension paper title appeared
A novel approach learning first order logic formulae positive negative examples incorporated system named ICL Inductive Constraint Logic In ICL examples viewed interpretations true false target theory whereas present inductive logic programming systems examples true false ground facts clauses Furthermore ICL uses clausal representation corresponds conjunctive normal form conjunct forms constraint positive examples whereas classical learning techniques concentrated concept representations disjunctive normal form We present experiments new system mutagenesis problem These experiments illustrate differences systems indicate approach work least well classical approaches
In paper report result Monte Carlo study probability chaos large dynamical systems We use neural networks basis functions system dynamics choose parameter values networks randomly Our results show dimension system complexity network increase probability chaotic dynamics increases Since neural networks dense set dynamical systems conclusion large systems chaotic
Automated synthesis analog electronic circuits recognized difficult problem Genetic programming used evolve topology sizing numerical values component circuit perform source identification correctly classify incoming signal categories
Bell Sejnowski derived blind signal processing algorithm nonlinear feedforward network information maximization viewpoint This paper first shows algorithm viewed maximum likelihood algorithm optimization linear generative model Third paper gives partial proof folktheorem mixture sources highkurtosis histograms separable classic ICA algorithm
I present expectationmaximization EM algorithm principal component analysis PCA The algorithm allows eigenvectors eigenvalues extracted large collections high dimensional data It computationally efficient space time It also naturally accommodates missing information I also introduce new variant PCA called sensible principal component analysis SPCA defines proper density model data space Learning SPCA also done EM algorithm I report results synthetic real data showing EM algorithms correctly efficiently find leading eigenvectors covariance datasets iterations using hundreds thousands datapoints thousands dimensions
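A minimal sketch of the EM iteration for a single principal component, written without linear-algebra libraries. It assumes the data are already centred and starts from an arbitrary direction. The E-step computes each point's latent coordinate given the current direction `c`; the M-step refits `c` by least squares. For one component the two steps together multiply `c` by the scatter matrix up to a scalar, so the iteration behaves like a power method and converges to the leading eigenvector.

```python
import math


def em_pca_1d(data, iters=100):
    """Leading principal direction of centred data via EM (one latent dimension)."""
    dim = len(data[0])
    c = [1.0] + [0.0] * (dim - 1)   # arbitrary starting direction (assumption)
    for _ in range(iters):
        cc = sum(ci * ci for ci in c)
        # E-step: latent coordinate of each point given the current direction c
        xs = [sum(ci * yi for ci, yi in zip(c, y)) / cc for y in data]
        # M-step: least-squares re-estimate of c given those coordinates
        xx = sum(x * x for x in xs)
        c = [sum(y[d] * x for y, x in zip(data, xs)) / xx for d in range(dim)]
    norm = math.sqrt(sum(ci * ci for ci in c))
    return [ci / norm for ci in c]
```

With k components the same two steps become matrix solves, $X = (C^\top C)^{-1} C^\top Y$ and $C_{\text{new}} = Y X^\top (X X^\top)^{-1}$, and no covariance matrix ever has to be formed.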
We present new algorithms parameter estimation HMMs By adapting framework used supervised learning construct iterative algorithms maximize likelihood observations also attempting stay close current estimated parameters We use bound relative entropy two HMMs distance measure The result new iterative training algorithms similar EM BaumWelch algorithm training HMMs The proposed algorithms composed step similar expectation step BaumWelch new update parameters replaces maximization reestimation step The algorithm takes negligibly time per iteration approximated version uses expectation step BaumWelch We evaluate experimentally new algorithms synthetic natural speech pronunciation data For sparse models ie models relatively small number nonzero parameters proposed algorithms require significantly fewer iterations
We investigate distribution performance Boolean functions Boolean inputs particularly parity functions alwayson even parity functions We use enumeration uniform MonteCarlo random sampling sampling random full trees As expected XOR dramatically changes fitness distributions In cases minimum size threshold exceeded distribution performance approximately independent program length However distribution performance full trees different asymmetric trees varies tree depth We consider reject testing No Free Lunch NFL theorems functions
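As a reference point for the fitness distributions discussed above, the sketch below enumerates every n-input Boolean function as a truth table and counts how many agree with even parity on exactly k of the 2^n inputs. This is the distribution under uniform sampling of functions, which is binomial; it is not the paper's sampling of program trees, whose distributions depend on program size and shape. Full enumeration is feasible only for very small n.

```python
from collections import Counter
from itertools import product


def fitness_distribution(n):
    """Histogram of fitness (inputs agreeing with n-bit even parity) over all
    2**(2**n) truth tables of n-input Boolean functions."""
    inputs = list(product((0, 1), repeat=n))
    target = [sum(bits) % 2 == 0 for bits in inputs]   # even-parity outputs
    hist = Counter()
    for table in product((False, True), repeat=len(inputs)):
        hist[sum(a == b for a, b in zip(table, target))] += 1
    return dict(sorted(hist.items()))
```

For n = 2 the result is {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}, the binomial coefficients of 4.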
Technical Report BUM Biometrics Unit Cornell University Abstract We analyse convergence stationarity simple nonreversible Markov chain serves model several nonreversible Markov chain sampling methods used practice Our theoretical numerical results show nonreversibility indeed lead improvements diffusive behavior simple Markov chain sampling schemes The analysis uses probabilistic techniques explicit diagonalisation We thank David Aldous Martin Hildebrand Brad Mann Laurent SaloffCoste help
Principal component analysis PCA one popular techniques processing compressing visualising data although effectiveness limited global linearity While nonlinear variants PCA proposed alternative paradigm capture data complexity combination local linear PCA projections However conventional PCA correspond probability density unique way combine PCA models Previous attempts formulate mixture models PCA therefore extent ad hoc In paper PCA formulated within maximumlikelihood framework based specific form Gaussian latent variable model This leads welldefined mixture model probabilistic principal component analysers whose parameters determined using EM algorithm We discuss advantages model context clustering density modelling local dimensionality reduction demonstrate application image compression handwritten digit recognition
programs independently evolved using fixed randomlygenerated fitness cases These programs subsequently tested large representative fixed population pursuers determine relative effectiveness This paper describes implementation original modified systems summarizes results tests
Institute Neural Computation Technical Report Series No INC January University California San Diego La Jolla CA Abstract Computational models neural map formation considered least three different levels abstraction detailed models including neural activity dynamics weight dynamics abstract neural activity dynamics adiabatic approximation objective functions weight dynamics may derived gradient flows In paper present example objective function derived detailed nonlinear neural dynamics A systematic investigation reveals different weight dynamics introduced previously derived objective functions generated prototypical terms This includes dynamic link matching special case neural map formation We focus particular role coordinate transformations derive different weight dynamics objective function Coordinate transformations also important deriving normalization rules constraints Several examples illustrate objective functions help understanding generating comparing different models neural map formation The techniques used analysis may also useful investigating types neural dynamics
Realvalued random hidden variables useful modelling latent structure explains correlations among observed variables I propose simple unit adds zeromean Gaussian noise input passing sigmoidal squashing function Such units produce variety useful behaviors ranging deterministic binary stochastic continuous stochastic I show slice sampling Neal used inference learning topdown networks units demonstrate learning two simple problems
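A minimal sketch of the unit described above, assuming a logistic squashing function and a noise standard deviation `noise_sd` chosen by the caller. With little noise the unit is effectively deterministic; when the noise is large relative to the input scale, the output saturates near 0 or 1 and the unit behaves like a binary stochastic unit; in between it is continuous and stochastic. Slice sampling for inference and learning is not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def noisy_sigmoid_unit(total_input, noise_sd, rng=random):
    """Add zero-mean Gaussian noise to the unit's input, then squash it."""
    return sigmoid(total_input + rng.gauss(0.0, noise_sd))
```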
There obvious need improving performance accuracy Bayesian network new data observed Because errors model construction changes dynamics domains afford ignore information new data While sequential update parameters fixed structure accomplished using standard techniques sequential update network structure still open problem In paper investigate sequential update Bayesian networks parameters structure expected change We introduce new approach allows flexible manipulation tradeoff quality learned networks amount information maintained past observations We formally describe approach including necessary modifications scoring functions learning Bayesian networks evaluate effectiveness empirical study extend case missing data
This paper sketches several aspects hypothetical cortical architecture visual object recognition based recent computational model The scheme relies modules learning examples Hyperbflike networks basic components Such models intended precise theories biological circuitry rather capture class explanations call MemoryBased Models MBM contains sparse population coding memorybased recognition codebooks prototypes Unlike sigmoidal units artificial neural networks units MBMs consistent usual description cortical neurons tuned multidimensional optimal stimuli We describe example MBM may realized terms cortical circuitry biophysical mechanisms consistent psychophysical physiological data A number predictions testable physiological techniques made This memo describes research done within Center Biological Computational Learning Department Brain Cognitive Sciences Artificial Intelligence Laboratory Massachusetts Institute Technology This research sponsored grants Office Naval Research contracts NJ N grant National Science Foundation contract ASC award includes funds ARPA provided HPCC program Additional support provided North Atlantic Treaty Organization ATR Audio Visual Perception Research Laboratories Mitsubishi Electric Corporation Sumitomo Metal Industries Siemens AG Support AI Laboratorys artificial intelligence research provided ARPA contract NJ Tomaso Poggio supported Uncas Helen Whitaker Chair MITs Whitaker College
We exploit qualitative probabilistic relationships among variables computing bounds conditional probability distributions interest Bayesian networks Using signs qualitative relationships implement abstraction operations guaranteed bound distributions interest desired direction By evaluating incrementally improved approximate networks algorithm obtains monotonically tightening bounds converge exact distributions For supermodular utility functions tightening bounds monotonically reduce set admissible decision alternatives well
A parallel algorithm proposed fundamental problem machine learning multicategory discrimination The algorithm based minimizing error function associated set highly structured linear inequalities These inequalities characterize piecewiselinear separation k sets maximum k affine functions The error function Lipschitz continuous gradient allows use fast serial parallel unconstrained minimization algorithms A serial quasiNewton algorithm considerably faster previous linear programming formulations A parallel gradient distribution algorithm used parallelize errorminimization problem Preliminary computational results given DECstation
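A sketch of the objects involved, under stated assumptions: class i gets an affine function $f_i(x) = w_i \cdot x - \gamma_i$, a point is assigned to the class with the largest $f_i$, and each violated inequality $f_i(x) \ge f_j(x) + 1$ (for x in class i, j ≠ i) is penalised with a squared plus function, one common way to obtain an error function with a Lipschitz-continuous gradient. The paper's exact error function may differ, and no minimiser (serial or parallel) is included.

```python
def affine(w, gamma, x):
    """f(x) = w . x - gamma."""
    return sum(wi * xi for wi, xi in zip(w, x)) - gamma


def classify(weights, gammas, x):
    """Assign x to the class whose affine function is largest."""
    scores = [affine(w, g, x) for w, g in zip(weights, gammas)]
    return max(range(len(scores)), key=scores.__getitem__)


def discrimination_error(weights, gammas, sets):
    """Sum of squared violations of f_i(x) >= f_j(x) + 1 over x in set i, j != i."""
    total = 0.0
    for i, points in enumerate(sets):
        for x in points:
            fi = affine(weights[i], gammas[i], x)
            for j in range(len(sets)):
                if j != i:
                    violation = affine(weights[j], gammas[j], x) - fi + 1.0
                    total += max(violation, 0.0) ** 2
    return total
```

The error is zero exactly when the maximum of the k affine functions separates the sets with unit margin.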
This paper presents large systematic body data relative effectiveness mutation crossover combinations mutation crossover genetic programming GP The literature traditional genetic algorithms contains related studies mutation crossover GP differ traditional counterparts significant ways In paper present results large experimental data set equivalent approximately typical runs GP system systematically exploring range parameter settings The resulting data may useful practitioners seeking optimize parameters GP runs also theorists exploring issues role building blocks GP
Technical Report No Department Statistics University Toronto Markov chain Monte Carlo methods Gibbs sampling simple forms Metropolis algorithm typically move distribution sampled via random walk For complex highdimensional distributions commonly encountered Bayesian inference statistical physics distance moved iteration algorithms usually small difficult impossible transform problem eliminate dependencies variables The inefficiency inherent taking small steps greatly exacerbated algorithm operates via random walk case moving point n steps away typically take around n iterations Such random walks sometimes suppressed using overrelaxed variants Gibbs sampling aka heatbath algorithm methods hitherto largely restricted problems full conditional distributions Gaussian I present overrelaxed Markov chain Monte Carlo algorithm based order statistics widely applicable In particular algorithm applied whenever full conditional distributions cumulative distribution functions inverse cumulative distribution functions efficiently computed The method demonstrated inference problem simple hierarchical Bayesian model
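A sketch of one ordered-overrelaxation update for a single variable: sort the current value together with K fresh draws from its full conditional and move to the value with the mirror-image rank. The sampler `draw_conditional` is supplied by the caller (an assumption of this sketch). The abstract notes that the method can instead be carried out through the cumulative distribution function and its inverse; this sketch simply draws the K values explicitly.

```python
def ordered_overrelax(current, draw_conditional, k):
    """One ordered-overrelaxation update of a single variable.

    draw_conditional() must return an independent draw from the variable's
    full conditional distribution given all other variables (caller-supplied).
    """
    values = [current] + [draw_conditional() for _ in range(k)]
    values.sort()
    r = values.index(current)   # rank of the current value among the k + 1 values
    return values[k - r]        # value with the mirror-image rank
```

With K = 1 the update reduces to ordinary Gibbs sampling; larger K moves the variable to the opposite side of its conditional distribution more reliably, which is what suppresses random-walk behaviour. For a Gaussian full conditional one might call `ordered_overrelax(x, lambda: rng.gauss(mu, sd), 20)` with a seeded `random.Random` instance `rng`.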
This report reviews various optimum decision rules pattern recognition namely Bayes rule Chows rule optimum errorreject tradeoff recently proposed classselective rejection rule The latter provides optimum tradeoff error rate average number selected classes A new general relation error rate average number classes presented The error rate directly computed classselective reject function turn estimated unlabelled patterns simply counting rejects Theoretical well practical implications discussed future research directions proposed
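A minimal sketch of Chow's rule, using the convention that a pattern is rejected when its largest posterior probability falls below 1 - t for a reject threshold t in [0, 1]. Posteriors are assumed to be given. The reject rate needs no labels, which is why reject counts on unlabelled patterns can feed error estimates of the kind discussed above. The class-selective rule is not implemented here.

```python
def chow_decide(posteriors, t):
    """Chow's rule: return the index of the most probable class, or None
    (reject) when its posterior is below 1 - t."""
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return best if posteriors[best] >= 1.0 - t else None


def reject_rate(posterior_list, t):
    """Fraction of patterns rejected; uses only posteriors, never labels."""
    rejected = sum(chow_decide(p, t) is None for p in posterior_list)
    return rejected / len(posterior_list)
```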
We utilize collective memory integrate weak strong search heuristics find cliques FC family graphs We construct FC pruning partial solutions ineffective Each weak heuristic maintains local cache collective memory We examine impact distributed search various characteristics distribution collective memory search algorithms family graphs We find distributed search performs better individuals even though space partial solutions combinatorially explosive
In paper investigate integration knowledge acquisition machine learning techniques We argue existing machine learning techniques made useful knowledge acquisition tools allowing expert greater control interaction learning process We describe number extensions FOCL multistrategy Hornclause learning program greatly enhanced power knowledge acquisition tool paying particular attention utility maintaining connection rule set examples explained rule The objective research make modification domain theory analogous use spread sheet A prototype knowledge acquisition tool FOCL constructed order evaluate strengths weaknesses approach
Starting likelihood preference order worlds extend likelihood ordering sets worlds natural way examine resulting logic Lewis earlier considered notion relative likelihood context studying counterfactuals assumed total preference order worlds Complications arise examining partial orders present total orders There subtleties involving exact approach lifting order worlds order sets worlds In addition axiomatization logic relative likelihood case partial orders gives insight connection relative likelihood default reasoning
We empirically investigate properties search space behavior hillclimbing search solving hard random Boolean satisfiability problems In experiments frequently observed rather attempting escape plateaus extensive search better completely restart new random initial state The optimum point terminate search restart determined empirically range problem sizes complexities The growth rate optimum cutoff faster linear number features although exact growth rate determined Based empirical results simple runtime heuristic proposed determine give searching plateau restart This heuristic closely approximates empirically determined optimum values range problem sizes complexities consequently allows search algorithm automatically adjust strategy particular problem without prior knowledge problems complexity
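A minimal GSAT-style sketch of hill-climbing with restarts, assuming clauses are lists of signed integers (v for x_v, -v for its negation) and a fixed number of flips `max_flips` per try. The paper's contribution, choosing when to give up on a plateau at run time, is not reproduced; here the cutoff is just a parameter.

```python
import random


def num_satisfied(clauses, assign):
    """Count clauses containing at least one true literal."""
    return sum(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)


def gsat(clauses, n_vars, max_flips, max_tries, seed=0):
    """Greedy hill-climbing with random restarts; returns an assignment or None."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        # restart from a fresh random assignment (index 0 unused)
        assign = [None] + [rng.random() < 0.5 for _ in range(n_vars)]
        for _ in range(max_flips):
            if num_satisfied(clauses, assign) == len(clauses):
                return assign
            best_score, best_vars = -1, []
            for v in range(1, n_vars + 1):
                assign[v] = not assign[v]            # try flipping v
                score = num_satisfied(clauses, assign)
                assign[v] = not assign[v]            # undo the trial flip
                if score > best_score:
                    best_score, best_vars = score, [v]
                elif score == best_score:
                    best_vars.append(v)
            v = rng.choice(best_vars)   # random tie-break; sideways moves allowed
            assign[v] = not assign[v]
        if num_satisfied(clauses, assign) == len(clauses):
            return assign
    return None
```

Sideways moves let the search wander across plateaus; the cutoff `max_flips` decides how long it does so before restarting.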
The choice input representation neural network profound impact accuracy classifying novel instances However neural networks typically computationally expensive train making difficult test large numbers alternative representations This paper introduces fast quality measures neural network representations allowing one quickly accurately estimate collection possible representations problem best We show measures ranking representations accurate previously published measure based experiments three difficult realworld pattern recognition problems
FURTHER RESULTS ON CONTROLLABILITY PROPERTIES OF DISCRETETIME NONLINEAR SYSTEMS ABSTRACT Controllability questions discretetime nonlinear systems addressed paper In particular continue search conditions grouplike notion transitivity implies stronger semigrouplike property forward accessibility We show implication holds pointwise states weak Poisson stability property globally exists global attractor system
We propose algorithm solving systems monotone equations combines Newton proximal point projection methodologies An important property algorithm whole sequence iterates always globally convergent solution system without additional regularity assumptions Moreover standard assumptions local superlinear rate convergence achieved As opposed classical globalization strategies Newton methods computing stepsize use linesearch aimed decreasing value merit function Instead linesearch approximate Newton direction used construct appropriate hyperplane separates current iterate solution set This step followed projecting current iterate onto hyperplane ensures global convergence algorithm Computational cost iteration method order classical damped Newton method The crucial advantage method truly globally convergent In particular get trapped stationary point merit function The presented algorithm motivated hybrid projectionproximal point method proposed
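A sketch of the hyperplane-projection idea for a continuous monotone F, written under a simplifying assumption: the search direction is d = -F(x) rather than the approximate Newton direction used in the paper, and the line-search constants `sigma` and `beta` are arbitrary choices. Once the line search finds z with ⟨F(z), x - z⟩ > 0, monotonicity gives ⟨F(z), x* - z⟩ ≤ 0 for every solution x*, so the hyperplane through z with normal F(z) separates x from the solution set and projecting onto it cannot increase the distance to any solution.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def projection_solve(F, x, sigma=1e-4, beta=0.5, tol=1e-10, max_iter=1000):
    """Solve F(x) = 0 for continuous monotone F by line search plus projection.
    d = -F(x) stands in for the paper's inexact Newton direction (assumption)."""
    x = list(x)
    for _ in range(max_iter):
        fx = F(x)
        if math.sqrt(dot(fx, fx)) < tol:
            return x
        d = [-v for v in fx]
        alpha = 1.0
        for _ in range(60):   # backtracking line search along d
            z = [xi + alpha * di for xi, di in zip(x, d)]
            fz = F(z)
            if math.sqrt(dot(fz, fz)) < tol:
                return z
            if -dot(fz, d) >= sigma * alpha * dot(d, d):
                break
            alpha *= beta
        # project x onto the separating hyperplane {y : <F(z), y - z> = 0}
        step = dot(fz, [xi - zi for xi, zi in zip(x, z)]) / dot(fz, fz)
        x = [xi - step * fi for xi, fi in zip(x, fz)]
    return x
```

Note that the line search never asks for a decrease in a merit function such as ‖F‖²; it only looks for a point z that yields a separating hyperplane.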
This paper shows performance genetic programming system improved addition mechanisms nongenetic transmission information individuals culture Teller previously shown genetic programming systems enhanced addition memory mechanisms individual programs Teller paper show Tellers memory mechanism changed allow communication individuals within across generations We show effects indexed memory culture performance genetic programming system symbolic regression problem Kozas Lawnmower problem Wumpus world agent problems We show culture reduce computational effort required solve problems We conclude discussion possible improvements
This paper reviews large number CBR systems determine sort adaptation currently used Three taxonomies proposed adaptationrelevant taxonomy CBR systems taxonomy tasks performed CBR systems taxonomy adaptation knowledge To extent set existing systems reflects constraints feasible review shows interesting dependencies different systemtypes tasks systems achieve adaptation needed meet system goals The CBR system designer may find partition CBR systems division adaptation knowledge suggested paper useful Moreover paper may help focus initial stages systems development suggesting basis existing work types adaptation knowledge supported new system In addition paper provides framework preliminary evaluation comparison systems
Constructive learning algorithms offer approach incremental construction nearminimal artificial neural networks pattern classification Examples algorithms include Tower Pyramid Upstart Tiling algorithms construct multilayer networks threshold logic units multilayer perceptrons These algorithms differ terms topology networks construct turn biases search decision boundary correctly classifies training set This paper presents analysis algorithms geometrical perspective This analysis helps better characterization search bias employed different algorithms relation geometrical distribution examples training set Simple experiments non linearly separable training sets support results mathematical analysis algorithms This suggests possibility designing efficient constructive algorithms dynamically choose among different biases build nearminimal networks pattern classification
Temporaldifference TD learning used predict rewards commonly done reinforcement learning also predict states ie learn model worlds dynamics We present theory algorithms intermixing TD models world different levels temporal abstraction within single structure Such multiscale TD models used modelbased reinforcementlearning architectures dynamic programming methods place conventional Markov models This enables planning higher varied levels abstraction may prove useful formulating methods hierarchical multilevel planning reinforcement learning In paper treat prediction problemthat learning model value function case fixed agent behavior Within context establish theoretical foundations multiscale models derive TD algorithms learning Two small computational experiments presented test illustrate theory This work extension generalization work Singh Dayan Sutton Pinette
This paper scientific comparison two code generation techniques identical goals generation best possible software pipelined code computers instruction level parallelism Both variants modulo scheduling framework generation software pipelines pioneered Rau Glaser RaGl otherwise quite dissimilar One technique developed Silicon Graphics used MIPSpro compiler This production compiler SGI systems based MIPS R processor Hsu It essentially branchandbound enumeration possible schedules extensive pruning This method heuristic way prunes also interaction register allocation scheduling The second technique aims produce optimal results formulating scheduling register allocation problem integrated integer linear programming ILP problem This idea received much recent exposure literature AlGoGa Feautrier GoAlGaa GoAlGab Eichenberger knowledge previous implementations preliminary detailed measurement evaluation In particular believe first published measurement runtime performance ILP based generation software pipelines A particularly valuable result study evaluation heuristic pipelining technology SGI compiler One motivations behind McGill research hope optimal software pipelining practical use production compilers would useful evaluation validation Our comparison indeed provided quantitative validation SGI compilers pipeliner leading us increased confidence techniques
Paper BibTeX entry available http://www.complang.tuwien.ac.at/papers This paper published Compiler Construction CC Springer LNCS pages Delayed Exceptions Speculative Execution Abstract Superscalar processors execute basic blocks sequentially use much instruction level parallelism Speculative execution proposed execute basic blocks parallel A pure software approach suffers low performance exceptiongenerating instructions executed speculatively We propose delayed exceptions combination hardware compiler extensions provide high performance correct exception handling compilerbased speculative execution Delayed exceptions exploit fact exceptions rare The compiler assumes typical case exceptions schedules code accordingly inserts runtime checks fixup code ensure correct execution exceptions happen
Fuzzy rules control effectively tuned via reinforcement learning Reinforcement learning weak learning method requires information success failure control application The tuning process allows people generate fuzzy rules unable accurately perform control tuned rules provide smooth control This paper explores new simplified method using reinforcement learning tuning fuzzy control rules It shown learned fuzzy rules provide smoother control pole balancing domain another approach
Procedural representations control policies two advantages facing scaleup problem learning tasks First implicit potential inductive generalization large set situations Second facilitate modularization In paper compare several randomized algorithms learning modular procedural representations The main algorithm called Adaptive Representation Learning ARL genetic programming extension relies discovery subroutines ARL suitable learning hierarchies subroutines constructing policies complex tasks ARL successfully tested typical reinforcement learning problem controlling agent dynamic nondeterministic environment discovered subroutines correspond agent behaviors
We propose modification classical proximal point algorithm finding zeroes maximal monotone operator Hilbert space In particular approximate proximal point iteration used construct hyperplane strictly separates current iterate solution set problem This step followed projection current iterate onto separating hyperplane All information required projection operation readily available end approximate proximal step therefore projection entails additional computational cost The new algorithm allows significant relaxation tolerance requirements imposed solution proximal point subproblems yields practical framework Weak global convergence local linear rate convergence established suitable assumptions Additionally presented analysis yields alternative proof convergence exact proximal point method allows nice geometric interpretation somewhat intuitive classical proof
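A minimal finite-dimensional sketch of the hybrid step, assuming a hypothetical linear operator T(x) = Ax on R^2 whose symmetric part is the identity (so T is maximal monotone and its only zero is the origin). The inexact proximal step is simulated by a few Jacobi sweeps on (I + cA)y = x. With v = T(y), the iterate x is then projected onto the hyperplane {z : <v, z - y> = 0}. The names `apply_T`, `inexact_prox`, `hybrid_projection_proximal` and the parameters `c`, `inner_steps`, `iters` are illustrative, not taken from the paper.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]

# Hypothetical monotone operator T(x) = A x on R^2. The symmetric part of A is
# the identity, so T is maximal monotone and its only zero is the origin.
A = ((1.0, 2.0), (-2.0, 1.0))


def apply_T(x: Vec) -> Vec:
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])


def inexact_prox(x: Vec, c: float, inner_steps: int) -> Vec:
    """Approximately solve the proximal subproblem (I + cA) y = x with a few Jacobi sweeps."""
    m = ((1.0 + c * A[0][0], c * A[0][1]), (c * A[1][0], 1.0 + c * A[1][1]))
    y = x
    for _ in range(inner_steps):
        y = ((x[0] - m[0][1] * y[1]) / m[0][0], (x[1] - m[1][0] * y[0]) / m[1][1])
    return y


def hybrid_projection_proximal(x: Vec, c: float = 0.5, inner_steps: int = 5, iters: int = 30) -> Vec:
    for _ in range(iters):
        y = inexact_prox(x, c, inner_steps)
        v = apply_T(y)  # v lies in T(y)
        vv = v[0] ** 2 + v[1] ** 2
        if vv == 0.0:
            return y  # y is an exact zero of T
        # Project x onto {z : <v, z - y> = 0}; for an accurate enough
        # proximal step this hyperplane separates x from the zeros of T.
        t = (v[0] * (x[0] - y[0]) + v[1] * (x[1] - y[1])) / vv
        x = (x[0] - t * v[0], x[1] - t * v[1])
    return x


x_final = hybrid_projection_proximal((3.0, -4.0))
print(math.hypot(*x_final))  # distance to the solution set {0}
```

The projection reuses y and v from the proximal step and costs one inner product and one vector update, which is the sense in which it adds essentially no computational work.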
We present Resource Spackling framework integrating register allocation instruction scheduling based Measure Reduce paradigm The technique measures resource requirements program uses measurements distribute code better resource allocation The technique applicable allocation different types resources A programs resource requirements register functional unit resources first measured using unified representation These measurements used find areas resources either utilized called resource holes excessive sets respectively Conditions determined increasing resource utilization resource holes These conditions applicable local global code motion
We introduce new model distributions generated random walks graphs This model suggests variety learning problems using definitions models distribution learning defined Our framework general enough model previously studied distribution learning problems well suggest new applications We describe special cases general problem investigate relative difficulty We present algorithms solve learning problem various conditions
A decision structure acyclic graph specifies order tests applied object situation arrive decision object serves simple powerful tool organizing decision process This paper proposes methodology learning decision structures oriented toward specific decision making situations The methodology consists two phases determining storing declarative rules describing decision process deriving online decision structure rules The first step performed expert AQbased inductive learning program learns decision rules examples decisions AQ AQ The second step transforms decision rules decision structure suitable given decision making situation The system AQDT implementing second step applied problem construction engineering In experiments AQDT outperformed programs applied problem terms accuracy simplicity generated decision structures Key words machine learning inductive learning decision structures decision rules attribute selection
Most constructive induction researchers focus new boolean attributes This paper reports new constructive induction algorithm called XofN constructs new nominal attributes form XofN representations An XofN set containing one attributevalue pairs For given instance value corresponds number attributevalue pairs true The promising preliminary experimental results artificial realworld domains show constructing new nominal attributes form XofN representations significantly improve performance selective induction terms higher prediction accuracy lower theory complexity
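The value of an X-of-N attribute, as defined above, is a count. The following is a minimal sketch of that definition. The function name and the example attributes are hypothetical, and it does not show how the algorithm searches for good X-of-N sets.

```python
from typing import Dict, Hashable, Iterable, Tuple

# One attribute-value pair, e.g. ("outlook", "sunny").
Pair = Tuple[str, Hashable]


def x_of_n_value(x_of_n: Iterable[Pair], instance: Dict[str, Hashable]) -> int:
    """Return the value of an X-of-N attribute for one instance:
    the number of its attribute-value pairs that hold for that instance."""
    return sum(1 for attr, val in x_of_n if instance.get(attr) == val)


# Hypothetical X-of-N over three attribute-value pairs.
x_of_n = [("outlook", "sunny"), ("humidity", "high"), ("windy", True)]
print(x_of_n_value(x_of_n, {"outlook": "sunny", "humidity": "normal", "windy": True}))  # 2
```

Because the result ranges over {0, ..., N}, a selective learner can treat it as one more nominal attribute.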
Coevolution Pursuit Evasion II Simulation Methods Results Abstract In previous SAB paper presented scientific rationale simulating coevolution pursuit evasion strategies Here present overview simulation methods results Our notable results follows First coevolution works produce good pursuers good evaders pure bootstrapping process types rather specially adapted opponents current counterstrategies Second eyes brains also coevolve within simulated species example pursuers usually evolved eyes front bodies like cheetahs evaders usually evolved eyes pointing sideways even backwards like gazelles Third kinds coevolution promoted allowing spatially distributed populations gene duplication explicitly spatial morphogenesis program eyes brains allows bilateral symmetry The paper concludes discussing possible applications simulated pursuitevasion coevolution biology entertainment
Learning store information extended time intervals via recurrent backpropagation takes long time mostly due insufficient decaying error back flow We briefly review Hochreiters analysis problem address introducing novel efficient method called Long ShortTerm Memory LSTM LSTM learn bridge time lags excess steps enforcing constant error flow constant error carrousels within special units Multiplicative gate units learn open close access constant error flow LSTMs update complexity per time step O(W) W number weights In experimental comparisons RTRL BPTT Recurrent CascadeCorrelation Elman nets Neural Sequence Chunking LSTM leads many successful runs learns much faster LSTM also solves complex long time lag tasks never solved previous recurrent network algorithms It works local distributed realvalued noisy pattern representations
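The following is a minimal scalar sketch of one memory cell. It assumes the gate layout named in the abstract, with input and output gates and no forget gate, and uses tanh squashing functions for simplicity. The weight names in `w` are hypothetical. The cell state has a self-connection of fixed weight 1.0, so the direct derivative of `c` with respect to `c_prev` is exactly 1. This is the constant error carousel.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_cell_step(x, h_prev, c_prev, w):
    """One time step of a single scalar LSTM memory cell without a forget gate.

    `w` maps hypothetical weight names to floats; multiplicative input and
    output gates control what enters and leaves the carousel.
    """
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate cell input
    c = c_prev + i * g    # carousel: previous state passes through with weight 1.0
    h = o * math.tanh(c)  # gated output of the cell
    return h, c


# With the input gate biased shut, the stored value survives 1000 steps.
w = dict(wi=0.0, ui=0.0, bi=-50.0, wo=0.0, uo=0.0, bo=0.0, wg=1.0, ug=0.0, bg=0.0)
h, c = 0.0, 0.7
for _ in range(1000):
    h, c = lstm_cell_step(1.0, h, c, w)
print(c)
```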
Geometric separability generalisation linear separability familiar many Minsky Paperts analysis Perceptron learning method The concept forms novel dimension along conceptualise learning methods The present paper shows geometric separability defined demonstrates accurately predicts performance least one empirical learning method
Inspired visual motion detection model rabbit retina computational architecture used early audition barn owl designed chip employs correlation model report onedimensional field motion scene real time Using subthreshold analog VLSI techniques fabricated successfully tested transistor chip using standard MOSIS process
In associative reinforcement learning environment generates input vectors learning system generates possible output vectors reinforcement function computes feedback signals inputoutput pairs The task discover remember inputoutput pairs generate rewards Especially difficult cases occur rewards rare since expected time algorithm grow exponentially size problem Nonetheless reinforcement function possesses regularities learning algorithm exploits learning time reduced nongeneralizing algorithms This paper describes neural network algorithm called complementary reinforcement backpropagation CRBP reports simulation results problems designed offer differing opportunities generalization
We prove Canonical Distortion Measure CDM optimal distance measure use nearestneighbour NN classification show reduces squared Euclidean distance feature space function classes expressed linear combinations fixed set features PAClike bounds given samplecomplexity required learn CDM An experiment presented neural network CDM learnt Japanese OCR environment used NN classification
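For function classes built from a fixed set of features, the abstract states that the CDM reduces to squared Euclidean distance in feature space. The following minimal sketch shows 1-NN classification under that distance. The feature map `phi` and the training points are hypothetical, and learning the CDM itself is not shown.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

FeatureMap = Callable[[Sequence[float]], Sequence[float]]


def feature_sq_distance(phi: FeatureMap, x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between x and y after mapping both through phi."""
    return sum((a - b) ** 2 for a, b in zip(phi(x), phi(y)))


def nn_classify(phi: FeatureMap, train: List[Tuple[Sequence[float], Hashable]],
                query: Sequence[float]) -> Hashable:
    """1-nearest-neighbour label of `query` under the feature-space distance."""
    _, label = min(train, key=lambda item: feature_sq_distance(phi, item[0], query))
    return label


def phi(x: Sequence[float]) -> Tuple[float, float]:
    # Hypothetical fixed feature set on a 1-D input: (x, x^2).
    return (x[0], x[0] ** 2)


train = [((0.0,), "a"), ((3.0,), "b")]
print(nn_classify(phi, train, (1.0,)))  # a
```

The choice of features matters. For the query 1.6, plain Euclidean distance on the raw input would pick "b" (1.4 versus 1.6). The feature-space distance picks "a" (about 9.11 versus 43.43).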
The schema theorem states implicit parallel search behind power genetic algorithm We contend chromosomes vote proportionate fitness candidate schemata We maintain population binary strings ternary schemata The string population works solving problem domain supplies fitness schema population indirectly solve original problem
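The following is a minimal sketch of one plausible reading of the voting idea: each binary string votes for every ternary schema it matches, with weight equal to its own fitness, and those votes serve as the schemata's fitness. The exact voting and normalisation rule is an assumption, as are the OneMax fitness and the example populations.

```python
from typing import Callable, Dict, Iterable, List


def matches(schema: str, s: str) -> bool:
    """A ternary schema over {'0', '1', '*'} matches a binary string of equal
    length if every fixed (non-'*') position agrees."""
    return len(schema) == len(s) and all(h == "*" or h == b for h, b in zip(schema, s))


def schema_votes(strings: Iterable[str], fitness: Callable[[str], float],
                 schemata: List[str]) -> Dict[str, float]:
    """Each string votes for every schema it matches, weighted by its own fitness."""
    votes = {h: 0.0 for h in schemata}
    for s in strings:
        f = fitness(s)
        for h in schemata:
            if matches(h, s):
                votes[h] += f
    return votes


def onemax(s: str) -> float:
    # Hypothetical problem-domain fitness: number of ones.
    return float(s.count("1"))


print(schema_votes(["110", "011", "000"], onemax, ["1**", "*1*", "0*0"]))
# {'1**': 2.0, '*1*': 4.0, '0*0': 0.0}
```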
The objective statistical data analysis describe behaviour system also propose construct check model observed processes Bayesian methodology offers one possible approaches estimation unknown components model parameters functional components framework chosen model type However many instances evaluation Bayes posterior distribution basal Bayesian solutions difficult practically intractable even help numerical approximations In cases Bayesian analysis may performed help intensive simulation techniques called Markov chain Monte Carlo The present paper reviews best known approaches MCMC generation It deals several typical situations data analysis model construction MCMC methods successfully applied Special attention devoted problem selection optimal regression model constructed regression splines functional units
An agent must learn act world trial error faces reinforcement learning problem quite different standard concept learning Although good algorithms exist problem general case often quite inefficient exhibit generalization One strategy find restricted classes action policies learned efficiently This paper pursues strategy developing algorithms efficiently learn action maps expressible kDNF The algorithms compared existing methods empirical trials shown good performance
This note considers positive recurrent Markov chains probability remaining current state arbitrarily close Specifically conditions given ensure nonexistence central limit theorems ergodic averages functionals chain The results motivated applications MetropolisHastings algorithms constructed terms rejection probability rejection involves remaining current state Two examples commonly used algorithms given independence sampler Metropolis adjusted Langevin algorithm The examples rather specialised although cases problems arise typical problems commonly occurring particular algorithm used I would like thank Kerrie Mengersen Jeff Rosenthal Richard Tweedie useful conversations subject paper
A reinforcement learning system limited computational resources interacts unrestricted unknown environment Its goal maximize cumulative reward obtained throughout limited unknown lifetime System policy arbitrary modifiable algorithm mapping environmental inputs internal states outputs new internal states The problem realistic unknown environments policy modification process PMP occurring system life may unpredictable influence environmental states rewards PMPs later time Existing reinforcement learning algorithms properly deal Neither naive exhaustive search among policy candidates even case small search spaces In fact reasonable way measuring performance improvements general typical situations missing I define measure based novel reinforcement acceleration criterion RAC At given time RAC satisfied beginning completed PMP computed currently valid policy modification followed longterm acceleration average reinforcement intake computation time later PMPs taken account I present method called environmentindependent reinforcement acceleration EIRA guaranteed achieve RAC EIRA neither care whether systems policy allows changing whether multiple interacting learning systems Consequences sound theoretical framework metalearning success PMP recursively depends success later PMPs setting stage A sound theoretical framework multiagent learning The principles implemented single system using assemblerlike programming language modify policy system consisting multiple agents agent fact connection fully recurrent reinforcement learning neural net A byproduct research general reinforcement learning algorithm nets Preliminary experiments illustrate theory
Evolutionary computation uses computational models evolutionary processes key elements design implementation computerbased problem solving systems In paper provide overview evolutionary computation describe several evolutionary algorithms currently interest Important similarities differences noted lead discussion important issues need resolved items future research
There strong evidence face processing localized brain The double dissociation prosopagnosia face recognition deficit occurring brain damage visual object agnosia difficulty recognizing kinds complex objects indicates face nonface object recognition may served partially independent mechanisms brain Is neural specialization innate learned We suggest specialization could result competitive learning mechanism development devotes neural resources tasks best performing Further suggest specialization arises interaction task requirements developmental constraints In paper present feedforward computational model visual processing two modules compete classify input stimuli When one module receives low spatial frequency information receives high spatial frequency information task identify faces simply classifying objects low frequency network shows strong specialization faces No combination tasks inputs shows strong specialization We take results support idea innatelyspecified face processing module unnecessary
We present general method proving rigorous priori bounds number iterations required achieve convergence Markov chain Monte Carlo We describe bounds specific models Gibbs sampler obtained general method We discuss possibilities obtaining bounds generally
Much recent research decision theoretic planning adopted Markov decision processes MDPs model choice attempted make solution tractable exploiting problem structure One particular algorithm structured policy construction achieves means decision theoretic analog goal regression using action descriptions based Bayesian networks treestructured conditional probability tables The algorithm presented able deal actions correlated effects We describe new decision theoretic regression operator corrects weakness While conceptually straightforward extension requires somewhat complicated technical approach
The problem programming artificial ant follow Santa Fe trail repeatedly used benchmark problem Recently shown performance several techniques much better best performance obtainable using uniform random search We suggested could program fitness landscape difficult hill climbers problem also difficult Genetic Algorithms contains multiple levels deception Here redefine problem ant obliged traverse trail approximately correct order A simple genetic programming system size depth restriction show perform approximately three times better improved training function
In general machine learning process accelerated use heuristic knowledge problem solution For example monomorphic typed Genetic Programming GP uses type information reduce search space improve performance Unfortunately monomorphic typed GP also loses generality untyped GP generated programs suitable inputs specified type Polymorphic typed GP improves monomorphic untyped GP allowing type information expressed generic manner yet still imposes constraints search space This paper describes polymorphic GP system generate polymorphic programs programs take inputs one type produces outputs one type We also demonstrate operation generation map polymorphic program
Although socalled naive Bayesian classification makes unrealistic assumption values attributes example independent given class example learning method remarkably successful practice uniformly better learning method known Boosting general method combining multiple classifiers due Yoav Freund Rob Schapire This paper shows boosting applied naive Bayesian classifiers yields combination classifiers representationally equivalent standard feedforward multilayer perceptrons An ancillary result naive Bayesian classification nonparametric nonlinear generalization logistic regression As training algorithm boosted naive Bayesian learning quite different backpropagation definite advantages Boosting requires linear time constant space hidden nodes learned incrementally starting important On realworld datasets method tried far generalization performance good better best published result using learning method Unlike standard learning algorithms naive Bayesian learning without boosting done logarithmic time linear number parallel computing units Accordingly learning methods highly plausible computationally models animal learning Other arguments suggest plausible behaviorally also
This paper addresses problem handling skewed class distributions within casebased learning CBL framework We first present baseline informationgainweighted CBL algorithm apply three data sets natural language processing NLP skewed class distributions Although overall performance baseline CBL algorithm good show algorithm exhibits poor performance minority class instances We present two CBL algorithms designed improve performance minority class predictions Each variation creates testcasespecific feature weights first observing path taken test case decision tree created learning task using pathspecific information gain values create appropriate weight vector use case retrieval When applied NLP data sets algorithms shown significantly increase accuracy minority class predictions maintaining improving overall classification accuracy
We introduce class iteratively decodable trellisconstrained codes generalization turbocodes lowdensity paritycheck codes seriallyconcatenated convolutional codes product codes In trellisconstrained code multiple trellises interact define allowed set codewords As result interactions minimumcomplexity single trellis code state space grows exponentially block length However turbocodes lowdensity paritycheck codes decoder approximate bitwise maximum posteriori decoding using sumproduct algorithm factor graph describes code We present two new families codes homogenous trellisconstrained codes ringconnected trellisconstrained codes give results show codes perform regime turbocodes lowdensity paritycheck codes
In paper present research identifying modeling strategies human tutors use integrating previous explanations current explanations We used work develop computational model partially implemented explanation facility existing tutoring system known SHERLOCK When human tutors engage dialogue freely exploit aspects mutually known context including previous discourse Utterances draw previous discourse seem awkward unnatural even incoherent Previous discourse must taken account order relate new information effectively recently conveyed material avoid repeating old material would distract student new Thus strategies using dialogue history generating explanations great importance research natural language generation tutorial applications The goal work produce computational model effects discourse context explanations instructional dialogues implement model intelligent tutoring system maintains dialogue history uses planning explanations Based study humanhuman instructional dialogues developed taxonomy classifies types contextual effects occur data according explanatory functions serve In paper focus one important category taxonomy situations tutor explicitly refers previous explanation order point similarities differences material currently explained material presented earlier explanations We implementing system uses casebased reasoning identify previous situations explanations could potentially affect explanation constructed We identified heuristics constructing explanations exploit information ways similar observed instructional dialogues produced human tutors By building computer system capability optional facility enabled disabled able systematically evaluate hypothesis useful tutoring strategy In order test hypotheses effects previous discourse explanations building explanation component existing intelligent training system Sherlock Sherlock intelligent coached practice environment training avionics technicians troubleshoot complex electronic equipment Using Sherlock trainees solve problems minimal tutor interaction review troubleshooting behavior postproblem reflective followup session rfu tutor
The RTRL algorithm fully recurrent continually running networks Robinson Fallside Williams Zipser requires O(n^4) computations per time step n number noninput units I describe method suited online learning computes exactly gradient requires fixedsize storage order average time complexity per time step O(n^3)
In many Markov chain Monte Carlo problems target density function known normalization constant In paper take advantage knowledge facilitate convergence diagnostic Markov sampler estimating L error kernel estimator Firstly propose estimator normalization constant shown asymptotically normal mixing moment conditions Secondly L error kernel estimator estimated using normalization constant estimator ratio estimated L error true L error shown converge probability similar conditions Thirdly propose sequential plot estimated L error tool monitor convergence Markov sampler Finally dimensional bimodal example given illustrate proposal two Markov samplers compared example using proposed diagnostic plot Bin Yu Assistant Professor Department Statistics University California Berkeley CA Research supported part Junior Faculty Research Grant University California Berkeley grants DAALG DAAHG Army Research Office grant DMS National Science Foundation The author grateful Professors Peter Bickel Andrew Gelman many helpful discussions comments draft Special thanks due Mr Sam Buttrey help simulation Professor Per Mykland Mr Karl Broman commenting draft two anonymous
In recent years number different semantics defaults proposed preferential structures semantics possibilistic structures rankings shown characterized set axioms known KLM properties Kraus Lehmann Magidor While viewed surprise show almost inevitable We giving yet another semantics defaults uses plausibility measures new approach modeling uncertainty generalize approaches probability measures belief functions possibility measures We show earlier approaches default reasoning embedded framework plausibility We provide necessary sufficient condition plausibilities KLM properties sound additional condition necessary sufficient KLM properties complete These conditions easily seen hold earlier approaches thus explaining characterized KLM properties
It commonplace artificial intelligence divide agents explicit beliefs two parts beliefs explicitly represented manifest memory implicitly represented constructive beliefs repeatedly reconstructed needed rather memorized Many theories knowledge view relation manifest constructive beliefs logical relation manifest beliefs representing constructive beliefs logic belief This view however limits ability theory treat incomplete inconsistent sets beliefs useful ways We argue illuminating view belief result rational representation In theory agent obtains constructive beliefs using manifest beliefs preferences rationally sense decision theory choose useful conclusions indicated manifest beliefs
The economic theory rationality promises equal mathematical logic importance mechanization reasoning We survey growing literature basic notions probability utility rational choice coupled practical limitations information resources influence design analysis reasoning representation systems
In paper propose efficient reliable shotgun sequence assembly algorithm based fingerprinting scheme robust noise repetitive sequences data Our algorithm uses exact matches short patterns randomly selected fragment data identify fragment overlaps construct overlap map finally deliver consensus sequence We show statistical clues made explicit approach easily exploited correctly assemble results even presence extensive repetitive sequences Our approach exceptionally fast practice eg successfully assembled whole Mycoplasma genitalium genome approximately kbps roughly minutes MB MHz Pentium Pro CPU time real shotgun data existing algorithms expected run several hours day data Moreover experiments shotgun data synthetically prepared real DNA sequences wide range organisms including human DNA containing extensive repeating regions demonstrate algorithms robustness noise presence repetitive sequences For example correctly assembled kbp Human DNA sequence less minutes MB MHz Pentium Pro CPU time Support research provided part Office Naval Research grant N
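The following is a minimal sketch of the overlap-detection idea only, under stated assumptions. Each fragment contributes a few randomly chosen exact k-mers, and any other fragment containing one of them becomes an overlap candidate. The parameters `k` and `samples` and the function name are hypothetical. The paper's statistical handling of repeats, its overlap map, and the consensus step are not reproduced here.

```python
import random
from collections import defaultdict
from typing import List, Set, Tuple


def candidate_overlaps(fragments: List[str], k: int, samples: int,
                       seed: int = 0) -> Set[Tuple[int, int]]:
    """Return pairs (i, j), i < j, of fragments sharing an exact k-mer that
    was drawn at random from one of them."""
    rng = random.Random(seed)
    index = defaultdict(set)  # k-mer -> ids of fragments containing it anywhere
    for fid, frag in enumerate(fragments):
        for p in range(len(frag) - k + 1):
            index[frag[p:p + k]].add(fid)

    pairs = set()
    for fid, frag in enumerate(fragments):
        n_pos = len(frag) - k + 1
        if n_pos <= 0:
            continue  # fragment shorter than k
        for p in rng.sample(range(n_pos), min(samples, n_pos)):
            for other in index.get(frag[p:p + k], ()):
                if other != fid:
                    pairs.add((min(fid, other), max(fid, other)))
    return pairs


frags = ["ACGGTCAGTCCATGAC", "GTCAGTCCATG", "TTTTTTTTTTTT"]
print(candidate_overlaps(frags, k=5, samples=2))  # {(0, 1)}
```

On real data, a k-mer from a repeat hits many fragments at once. Such signals are what the paper's statistical clues are meant to separate from true overlaps.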
At Electronics Lab Swiss Federal Institute Technology ETH Zurich high performance Parallel Supercomputer MUSIC MUlti processor System Intelligent Communication been developed As applications neural network simulation molecular dynamics show Electronics Lab Supercomputer absolutely par conventional supercomputers electric power requirements reduced factor weight reduced factor price reduced factor Software development key using parallel system This report focus programming environment MUSIC system applications
The paper concerned integration constraint logic programming systems CLP systems based genetic algorithms GA The resulting framework tailored applications require first phase number constraints need generated second phase optimal solution satisfying constraints produced The first phase carried CLP second one GA We present specific framework ECLiPSe ECRC Common Logic Programming System GENOCOP GEnetic algorithm Numerical Optimization COnstrained Problems integrated framework called CoCo COmputational intelligence plus COnstraint logic programming The CoCo system applied training problem neural networks We consider constrained networks eg neural networks shared weights constraints weights example domain constraints hardware implementation etc Then ECLiPSe used generate chromosome representation together constraints ensure cases network specified exactly one chromosome Thus problem becomes constrained optimization problem optimization criterion optimize error network GENOCOP used find optimal solution Note The work second author partially supported SION department NWO National Foundation Scientific Research This work carried third author visiting CWI Amsterdam fourth author visiting Leiden University
In paper study global optimization methods like genetic algorithms used train neural networks We introduce notion regularity studying properties error function expand search space artificial way Regularities used generate constraints weights network In order find satisfiable set constraints use constraint logic programming system Then training network becomes constrained optimization problem We also relate notion regularity socalled network transformations
In paper study PAClearning algorithms specialized classes deterministic finite automata DFA In particular study branching programs investigate influence width branching program difficulty learning problem We first present distributionfree algorithm learning width branching programs We also give algorithm proper learning width branching programs uniform distribution labeled samples We show existence efficient algorithm learning width branching programs would imply existence efficient algorithm learning DNF known case Finally show existence algorithm learning width branching programs would also yield algorithm learning restricted version parity noise
The Longest common subsequence problem examined point view parameterized computational complexity There several different ways parameters enter problem number sequences analyzed length common subsequence size alphabet Lower bounds complexity basic problem imply lower bounds number sequence alignment consensus problems At issue theory parameterized complexity whether problem takes input (x, k) solved time f(k) n^α α independent k termed fixedparameter tractability It argued appropriate asymptotic model feasible computability problems small range parameter values covers important applications situation certainly holds many problems biological sequence analysis Our main results show The Longest Common Subsequence LCS parameterized number sequences analyzed hard W The LCS problem parameterized length common subsequence belongs W[P] hard W The LCS problem parameterized number sequences length common subsequence complete W All results obtained unrestricted alphabet sizes For alphabets fixed size problems fixedparameter tractable We conjecture remains hard
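For two sequences, the classical dynamic program below solves LCS in O(|a| |b|) time. Generalising the table to k sequences gives an O(n^k) table, so the exponent of n grows with the number of sequences. This is exactly the dependence that the f(k) n^α form rules out. Under standard assumptions, the hardness results for the number-of-sequences parameterization indicate that no such algorithm should be expected. This is a textbook sketch, not code from the paper.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] = length of an LCS of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(len(lcs("ABCBDAB", "BDCABA")))  # 4
```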
Is universe computable If may much cheaper terms information requirements compute computable universes instead I apply basic concepts Kolmogorov complexity theory set possible universes chat perceived true randomness life generalization learning given universe Assumptions A long time ago Great Programmer wrote program runs possible universes His Big Computer Possible means computable Each universe evolves discrete time scale Any universes state given time describable finite number bits One many universes despite evolved claim incomputable Computable universes Let TM denote arbitrary universal Turing machine unidirectional output tape TM input output symbols 0 1 comma TM possible input programs ordered alphabetically empty program etc Let $A_k$ denote TM kth program list Its output finite infinite string alphabet $\{0, 1, \text{comma}\}$ This sequence bitstrings separated commas interpreted evolution $E_k$ universe $U_k$ If $E_k$ includes least one comma let $U_k^l$ denote lth possibly empty bitstring lth comma $U_k^l$ represents $U_k$ state lth time step $E_k$ $(k, l \in \{1, 2, \ldots\})$ $E_k$ represented sequence $U_k^1, U_k^2, \ldots$ $U_k^1$ corresponds $U_k$ big bang Different algorithms may compute universe Some universes finite whose programs cease producing outputs point others I dont know TM not important The choice Turing machine not important This due compiler theorem universal Turing machine $C$ exists constant prefix $\mu_C \in \{0, 1, \text{comma}\}^*$ for all possible programs $p$ $C$ output response program $\mu_C p$ identical TM output response $p$ The prefix $\mu_C$ compiler compiles programs TM equivalent programs $C$
The MetropolisHastings algorithm estimating distribution based choosing candidate Markov chain accepting rejecting moves candidate produce chain known invariant measure The traditional methods use candidates essentially unconnected Based diffusions invariant develop onedimensional distributions class candidate distributions selftarget towards high density areas These produce MetropolisHastings algorithms convergence rates appear considerably better known traditional candidate choices random walk In particular wide classes choices may effectively help reduce burnin problem We illustrate behaviour examples exponential polynomial tails logistic regression model using Gibbs sampling algorithm
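The abstract does not spell out its candidate class. A standard example of a candidate that self-targets high-density areas using a Langevin diffusion is the Metropolis-adjusted Langevin algorithm (MALA). It proposes y = x + (h/2)∇log π(x) + √h Z and accepts with the Metropolis-Hastings ratio, which corrects for the asymmetric proposal. The following is a minimal one-dimensional Python sketch. It assumes the log-density is known up to a constant. It implements MALA and is not necessarily the paper's candidate family. The step size h = 0.5, the standard normal target and the starting point are illustrative choices.

```python
import math
import random

def mala(log_pi, grad_log_pi, x0, h, n_steps, seed=0):
    """Metropolis-adjusted Langevin sampler in one dimension."""
    rng = random.Random(seed)

    def log_q(to, frm):
        # log density (up to a constant) of proposing `to` from `frm`
        mean = frm + 0.5 * h * grad_log_pi(frm)
        return -((to - mean) ** 2) / (2.0 * h)

    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        # drift towards higher density, plus Gaussian noise
        y = x + 0.5 * h * grad_log_pi(x) + math.sqrt(h) * rng.gauss(0.0, 1.0)
        log_alpha = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted / n_steps


def log_std_normal(x):
    return -0.5 * x * x

def grad_log_std_normal(x):
    return -x

samples, rate = mala(log_std_normal, grad_log_std_normal, x0=3.0, h=0.5, n_steps=20000)
burned = samples[2000:]  # discard burn-in from the deliberately poor start
print(rate, sum(burned) / len(burned))
```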
Copyright IEEE Published Proceedings Micro December Research Triangle Park North Carolina
The application machine learning ML solve practical problems complex Only recently due increased promise ML solving real problems experienced difficulty use issue started attract attention This difficulty arises complexity learning problems large variety available techniques In order understand complexity begin overcome important construct characterization learning situations Building previous work dealt practical use ML set dimensions developed contrasted another recent proposal illustrated project development decisionsupport system marine propeller design The general research opportunities emerge development dimensions discussed Leading toward working systems simple model presented setting priorities research selecting learning tasks within large projects Central development concepts discussed paper use future projects recording successes limitations failures
We show DNF terms size approximated function Od log nonzero Fourier coefficients expected error squared respect uniform distribution This property used derive learning algorithm DNF uniform distribution The learning algorithm uses queries learns respect uniform distribution DNF terms size time polynomial n Od log The interesting implications case constant In case algorithm learns DNF polynomial number terms time $n^{O(\log \log n)}$ DNF terms size $O(\log n / \log \log n)$ polynomial time
COINS Technical Report December Abstract Multivariate decision trees overcome representational limitation univariate decision trees univariate decision trees restricted splits instance space orthogonal features axis This paper discusses following issues constructing multivariate decision trees representing multivariate test including symbolic numeric features learning coefficients multivariate test selecting features include test pruning multivariate decision trees We present new review wellknown methods forming multivariate decision trees The methods compared across variety learning tasks assess methods ability find concise accurate decision trees The results demonstrate multivariate methods effective others In addition experiments confirm allowing multivariate tests improves accuracy resulting decision tree univariate trees
Technical Report No April University Washington Department Statistics Seattle Washington Abstract Kooperberg Bose Stone introduced polyclass methodology uses adaptively selected linear splines tensor products model conditional class probabilities The authors attempted develop methodology would work well small moderate size problems would scale large problems However version polyclass developed large problems impractical required two months cpu time apply large data set A modification methodology involving use stochastic gradient online method fitting polyclass models given sets basis functions developed makes methodology applicable large data sets In particular successfully applied phoneme recognition problem involving phonemes features cases training sample basis functions unknown parameters Comparisons neural networks made original problem threevowel subproblem
The use externally imposed hierarchical structures reduce complexity learning control common However acknowledged learning hierarchical structure important step towards general learning many things required less bounded learning single thing specified learning Presented paper reinforcement learning algorithm called Nested Qlearning generates hierarchical control structure reinforcement learning domains The emergent structure combined learned bottomup reactive reactions results reactive hierarchical control system Effectively learned hierarchy decomposes would otherwise monolithic evaluation function many smaller evaluation functions recombined without loss previously learned information
We present online investment algorithm achieves almost wealth best constantrebalanced portfolio determined hindsight actual market outcomes The algorithm employs multiplicative update rule derived using framework introduced Kivinen Warmuth Our algorithm simple implement requires constant storage computing time per stock trading period We tested performance algorithm real stock data New York Stock Exchange accumulated year period On data algorithm clearly outperforms best single stock well Covers universal portfolio selection algorithm We also present results situation investor access additional side information
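The sketch below assumes the update takes the exponentiated-gradient form that the Kivinen-Warmuth framework yields here. After observing the price relatives x_t (closing over opening price of each stock), each weight is multiplied by exp(η x_{t,i} / (w_t · x_t)) and the vector is renormalized. The sketch also illustrates the constant storage and time per stock. The learning rate η = 0.05, the function name and the toy market are hypothetical, and transaction costs are ignored.

```python
import math

def eg_portfolio(price_relatives, eta=0.05):
    """Online portfolio with exponentiated-gradient (multiplicative) updates.

    price_relatives: list of trading periods, each a list of
    (closing price / opening price) for every stock.
    Returns the final wealth (starting from 1.0) and the final weights.
    """
    n = len(price_relatives[0])
    w = [1.0 / n] * n  # start from the uniform portfolio
    wealth = 1.0
    for x in price_relatives:
        growth = sum(wi * xi for wi, xi in zip(w, x))  # w_t . x_t
        wealth *= growth
        # multiplicative update, then renormalize onto the simplex
        w = [wi * math.exp(eta * xi / growth) for wi, xi in zip(w, x)]
        total = sum(w)
        w = [wi / total for wi in w]
    return wealth, w


# Hypothetical market: stock 0 gains 1% per period, stock 1 stays flat.
relatives = [[1.01, 1.0]] * 200
wealth, w = eg_portfolio(relatives)
print(wealth, w)
```

Only the current weight vector is kept between periods, so memory and per-period work are linear in the number of stocks.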
We consider belief revision operators satisfy AlchourronGardenforsMakinson postulates present epistemic logic revision operator result revision described sentence logic In logic fact agents set beliefs represented sentence O O Levesques know operator Intuitively O read believed The fact agent believes represented sentence B read usual way believed The connective represents update defined Katsuno Mendelzon The revised beliefs represented sentence O B We show every revision operator satisfies AGM postulates model epistemic logic beliefs implied sentence O B model correspond exactly sentences implied theory results revising This means reasoning changes agents beliefs reduces model checking certain epistemic sentences The negative result paper type formal account revision extended situation agent able reason beliefs A fully introspective agent use construction reason results revisions pain triviality
This paper introduces novel enhancement learning Bayesian networks bias small highpredictiveaccuracy networks The new approach selects subset features maximizes predictive accuracy prior network learning phase We examine explicitly effects two aspects algorithm feature selection node ordering Our approach generates networks computationally simpler evaluate display predictive accuracy comparable Bayesian networks model attributes
While need hierarchies within control systems apparent also clear many researchers hierarchies learned Learning structure component behaviors difficult task The benefit learning hierarchical structures behaviors decomposition control structure smaller transportable chunks allows previously learned knowledge applied new related tasks Presented paper improvements Nested Qlearning NQL allow realistic learning control hierarchies reinforcement environments Also presented simulation simple robot performing series related tasks used compare hierarchical nonhierarchical learning techniques
Recent findings suggest classification scheme based ensemble networks effective way address overfitting We study optimal methods training ensemble networks Some recent experiments Postal Zipcode character data suggest weight decay may optimal method controlling variance classifier
Bestfirst model merging general technique dynamically choosing structure neural related architecture avoiding overfitting It applicable learning recognition tasks often generalizes significantly better fixed structures We demonstrate approach applied tasks choosing radial basis functions function learning choosing local affine models curve constraint surface modelling choosing structure balltree bumptree maximize efficiency access
We describe algorithms estimating given measure known constant proportionality based large class diffusions extending Langevin model invariant We show weak conditions one choose class way diffusions converge exponential rate one even ensure convergence independent starting point algorithm When convergence less exponential show often polynomial known rates We consider methods discretizing diffusion time find methods inherit convergence rates continuous time process These contrast behaviour naive Euler discretization behave badly even simple cases
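The warning about naive Euler discretization can be reproduced on the Langevin diffusion dX = ½∇log π(X) dt + dW. Its Euler scheme is X_{n+1} = X_n + (h/2)∇log π(X_n) + √h Z. The sketch below assumes a light-tailed target π(x) ∝ exp(−x⁴), a step size h = 0.1 and the starting points shown. All of these are hypothetical choices rather than the paper's examples. The cubic drift overshoots further at every step once the chain starts far enough out, so the discretized chain escapes to infinity even though the continuous diffusion is well behaved. Adding a Metropolis accept/reject step is one standard remedy. The abstract does not say whether this matches the paper's discretizations.

```python
import math
import random

def euler_langevin(grad_log_pi, x0, h, n_steps, seed=0, blowup=1e100):
    """Naive Euler scheme for the Langevin diffusion dX = 0.5 * grad log pi(X) dt + dW.

    Returns the final state, or math.inf once |X| exceeds `blowup`.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = x + 0.5 * h * grad_log_pi(x) + math.sqrt(h) * rng.gauss(0.0, 1.0)
        if abs(x) > blowup:
            return math.inf  # the discretized chain has escaped to infinity
    return x


def grad_log_quartic(x):
    # pi(x) proportional to exp(-x**4), so grad log pi(x) = -4 x**3
    return -4.0 * x ** 3

near = euler_langevin(grad_log_quartic, x0=1.0, h=0.1, n_steps=1000)
far = euler_langevin(grad_log_quartic, x0=10.0, h=0.1, n_steps=1000)
print(near, far)
```

Started near the mode, the chain stays bounded. Started at x0 = 10, the first step lands near −190 and the iterates then grow without bound.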
Linsker reported development structured receptive fields simulations using Hebbtype synaptic plasticity rule feedforward linear network The synapses develop dynamics determined matrix closely related covariance matrix input cell activities We analyse dynamics learning rule terms eigenvectors matrix These eigenvectors represent independently evolving weight structures Some general theorems presented regarding properties eigenvectors eigenvalues For general covariance matrix four principal parameter regimes predicted We concentrate gaussian covariances layer B C Linskers network Analytic numerical solutions eigenvectors layer presented Three eigenvectors dominate dynamics DC eigenvector synapses sign bilobed oriented eigenvector circularly symmetric centresurround eigenvector Analysis circumstances vectors dominates yields explanation emergence centresurround structures symmetrybreaking bilobed structures Criteria developed estimating boundary parameter regime centresurround structures emerge The application analysis Linskers higher layers covariance functions oscillatory briefly discussed
This paper considers problem scaling proposal distribution multidimensional random walk Metropolis algorithm order maximize efficiency algorithm The main result weak convergence result dimension sequence target densities n converges When proposal variance appropriately scaled according n sequence stochastic processes formed first component Markov chain converge appropriate limiting Langevin diffusion process The limiting diffusion approximation admits straightforward efficiency maximization problem resulting asymptotically optimal policy related asymptotic acceptance rate proposed moves algorithm The asymptotically optimal acceptance rate quite general conditions The main result proved case target density symmetric product form Extensions result discussed
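The following is a minimal sketch of the scaling studied: random walk Metropolis on a d-dimensional product target with proposal variance ℓ²/d. Under this scaling the acceptance rate stays roughly stable as d grows. A fixed proposal variance would instead drive it towards zero. The standard normal target and the function name are assumptions. ℓ = 2.38 is the constant commonly quoted for standard Gaussian product targets.

```python
import math
import random

def rwm_acceptance_rate(d, ell=2.38, n_steps=5000, seed=0):
    """Random walk Metropolis on a d-dimensional standard normal target.

    Proposal: y = x + sigma * Z with sigma**2 = ell**2 / d.
    Returns the empirical acceptance rate.
    """
    rng = random.Random(seed)
    sigma = ell / math.sqrt(d)

    def log_pi(v):
        return -0.5 * sum(t * t for t in v)

    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # start in stationarity
    lp_x = log_pi(x)
    accepted = 0
    for _ in range(n_steps):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        lp_y = log_pi(y)
        if rng.random() < math.exp(min(0.0, lp_y - lp_x)):
            x, lp_x = y, lp_y
            accepted += 1
    return accepted / n_steps


rates = {d: rwm_acceptance_rate(d) for d in (5, 50, 200)}
print(rates)
```

Calling `rwm_acceptance_rate(200, ell=math.sqrt(200))` keeps the proposal variance fixed at 1 instead. In that case almost no moves are accepted.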
Combining reactivity planning proposed means compensating potentially slow response times planners still making progress toward long term goals The demands rapid response complexity many environments make difficult decompose tune coordinate reactive behaviors ensuring consistency Neural networks address tuning problem less useful decomposition coordination We hypothesize interacting reactions decomposed separate behaviors resident separate networks interaction coordinated tuning mechanism higher level controller To explore issues implemented neural network architecture reactive component two layer control system simulated race car By varying architecture test whether decomposing reactivity separate behaviors leads superior overall performance coordination learning convergence
We introduce formal model teaching teacher tailored particular learner yet teaching protocol designed collusion possible Not surprisingly model remedies nonintuitive aspects models teacher must successfully teach consistent learner We prove class exactly identified deterministic polynomialtime algorithm access rich set examplebased queries teachable computationally unbounded teacher polynomialtime learner In addition present general results relating model teaching various previous results We also consider problem designing teacherlearner pairs teacher learner polynomialtime algorithms describe teacherlearner pairs classes decision lists Horn sentences
This paper explores algorithms automatic quantization realvalued datasets using thermometer codes pattern classification applications Experimental results indicate relatively simple randomized thermometer code generation technique result quantized datasets used train simple perceptrons yield generalization test data substantially better obtained unquantized counterparts
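A thermometer code maps a real value to k bits. Bit j is 1 exactly when the value exceeds the j-th smallest threshold, so nearby values share most of their bits. The abstract says the thresholds come from a relatively simple randomized technique but does not specify it. In this sketch they are assumed to be drawn uniformly between the attribute's observed minimum and maximum, and the function names are hypothetical.

```python
import random

def random_thresholds(values, k, seed=0):
    """Draw k sorted thresholds uniformly in [min(values), max(values)] (assumed scheme)."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    return sorted(rng.uniform(lo, hi) for _ in range(k))

def thermometer_encode(x, thresholds):
    """Bit j is 1 iff x exceeds the j-th smallest threshold."""
    return [1 if x > t else 0 for t in thresholds]


print(thermometer_encode(0.6, [0.25, 0.5, 0.75]))  # [1, 1, 0]
ts = random_thresholds([0.0, 1.0, 2.0], 5)
print(thermometer_encode(1.0, ts))
```

Because the thresholds are sorted, every code is a run of ones followed by zeros. This ordering is what lets a simple perceptron exploit the quantized attribute.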
Automated search space candidate designs seems attractive way improve traditional engineering design process To make approach work however automated design system must include knowledge modeling limitations method used evaluate candidate designs also effective way use knowledge influence search process We suggest productive approach include knowledge implementing set model constraint functions measure much modeling assumptions violated influence search using values model constraint functions constraint inputs standard constrained nonlinear optimization numerical method We test idea domain conceptual design supersonic transport aircraft experiments indicate model constraint communication strategy decrease cost design space search one orders magnitude
The recognition objects hence descriptions must grounded environment terms sensor data We argue concepts used classify perceived objects used perform actions objects integrate actionoriented perceptual features perceptionoriented action features We present grounded symbolic representation concepts Moreover concepts learned We show logicoriented approach learning grounded concepts
The curse dimensionality one severest problems concerning application RBF networks The number RBF nodes therefore number training examples needed grows exponentially intrinsic dimensionality input space One way address problem application feature selection data preprocessing step In paper propose twostep approach determination optimal feature subset First possible featuresubsets reduced best discrimination properties application fast robust filter technique EUBAFES Secondly use wrapper approach judge preselected feature subsets leads RBF networks least complexity best classification accuracy Experiments undertaken show improvement RBF networks feature selection approach
This paper reexamines problem parameter estimation Bayesian networks missing values hidden variables perspective recent work online learning We provide unified framework parameter estimation encompasses online learning model continuously adapted new data cases arrive traditional batch learning preaccumulated set samples used onetime model selection process In batch case framework encompasses gradient projection algorithm EM algorithm Bayesian networks The framework also leads new online batch parameter update schemes including parameterized version EM We provide empirical theoretical results indicating parameterized EM allows faster convergence maximum likelihood parameters standard EM
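This sketch assumes the parameterized EM takes the over-relaxed form θ ← θ + η(θ_EM − θ), where η = 1 recovers standard EM. It uses the simplest latent-variable case rather than a Bayesian network: estimating the mixing weight π of two known unit-variance Gaussians. The data and component means are hypothetical. The data are symmetric about 0.5, so the maximum likelihood estimate is exactly π = 0.5 and both variants should reach it. The clipping step is a crude safeguard.

```python
import math

def em_mixing_weight(data, mu0=0.0, mu1=1.0, eta=1.0, pi=0.2,
                     tol=1e-12, max_iter=100000):
    """Estimate the weight of component 1 in a two-Gaussian mixture
    with known means and unit variances.

    eta = 1 gives standard EM; eta > 1 over-relaxes each EM step.
    Returns (estimate, iterations used).
    """
    def pdf(x, mu):
        return math.exp(-0.5 * (x - mu) ** 2)  # shared normalizing constant cancels

    for it in range(1, max_iter + 1):
        # E-step: posterior probability that each point came from component 1
        resp = [pi * pdf(x, mu1) / (pi * pdf(x, mu1) + (1 - pi) * pdf(x, mu0))
                for x in data]
        pi_em = sum(resp) / len(resp)  # M-step
        new_pi = pi + eta * (pi_em - pi)  # parameterized (over-relaxed) step
        new_pi = min(max(new_pi, 1e-9), 1 - 1e-9)  # stay inside (0, 1)
        if abs(new_pi - pi) < tol:
            return new_pi, it
        pi = new_pi
    return pi, max_iter


data = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.5, 2.5]  # symmetric about 0.5
pi_plain, it_plain = em_mixing_weight(data, eta=1.0)
pi_fast, it_fast = em_mixing_weight(data, eta=1.5)
print(pi_plain, it_plain, pi_fast, it_fast)
```

With heavily overlapping components like these, plain EM converges slowly. An η somewhat above 1 typically needs fewer iterations, and the printed counts let you check this on the data at hand.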
Many techniques speedup learning knowledge compilation focus learning optimization macrooperators control rules task domains characterized using problemspace search paradigm However characterization fit well class task domains problem solver required perform continuous manner For example many robotic domains problem solver required monitor realvalued perceptual inputs vary motor control parameters continuous online manner successfully accomplish task In domains discrete symbolic states operators difficult define To improve performance continuous problem domains problem solver must learn modify use continuous operators continuously map input sensory information appropriate control outputs Additionally problem solver must learn contexts continuous operators applicable We propose learning method compile sensorimotor experiences continuous operators used improve performance problem solver The method speeds task performance well results improvements quality resulting solutions The method implemented robotic navigation system evaluated extensive experimentation
Discussions casebased reasoning often reflect implicit assumption case memory system become better informed ie increase knowledge cases added casebase This paper considers formalisations knowledge content necessary preliminary rigorous analysis performance casebased reasoning systems In particular interested modelling learning aspects casebased reasoning order study performance casebased reasoning system changes accumulates problemsolving experience The current paper presents casebase semantics generalises recent formalisations casebased classification Within framework paper explores various issues assuring semantics welldefined illustrates knowledge content case memory system seen reside chosen similarity measure cases casebase
This paper proposes four performance measures genetic algorithm GA enable us compare different GAs optimization problem different choices parameters values The performance measures defined terms observations simulation frequency optimal solutions fitness values frequency evolution leaps number generations needed reach optimal solution We present case study parameters GA robot path planning tuned performance optimized performance evaluation using measures Especially one performance measures used demonstrate adaptivity GA robot path planning We also propose process systematic tuning based techniques design experiments
We propose analyze distribution learning algorithm subclass Acyclic Probabilistic Finite Automata APFA This subclass characterized certain distinguishability property automatas states Though hardness results known learning distributions generated general APFAs prove algorithm indeed efficiently learn distributions generated subclass APFAs consider In particular show KLdivergence distribution generated target source distribution generated hypothesis made small high confidence polynomial time We present two applications algorithm In first show model cursively written letters The resulting models part complete cursive handwriting recognition system In second application demonstrate APFAs used build multiplepronunciation models spoken words We evaluate APFA based pronunciation models labeled speech data The good performance terms loglikelihood obtained test data achieved APFAs incredibly small amount time needed learning suggests learning algorithm APFAs might powerful alternative commonly used probabilistic models
This paper examines inductive inference complex grammar neural networks specifically task considered training network classify natural language sentences grammatical ungrammatical thereby exhibiting kind discriminatory power provided Principles Parameters linguistic framework GovernmentandBinding theory Neural networks trained without division learned vs innate components assumed Chomsky attempt produce judgments native speakers sharply grammaticalungrammatical data How recurrent neural network could possess linguistic capability properties various common recurrent neural network architectures discussed The problem exhibits training behavior often present smaller grammars training initially difficult However implementing several techniques aimed improving convergence gradient descent backpropagationthrough time training algorithm significant learning possible It found certain architectures better able learn appropriate grammar The operation networks training analyzed Finally extraction rules form deterministic finite state automata investigated
We propose lazy neural tree LNT appropriate architecture realization smooth regression systems The LNT hybrid decision tree neural network From neural network inherits smoothness generated function incremental adaptability conceptual simplicity From decision tree inherits topology initial parameter setting well efficient sequential implementation outperforms traditional neural network simulations orders magnitude The enormous speed achieved lazy evaluation A speedup obtained application windowing scheme region interesting results restricted
To read handwritten digit string helpful segment image separate digits Bottomup segmentation heuristics often fail neighboring digits overlap substantially We describe system stochastic generative model digit class show knowledge required segmentation The system uses Gibbs sampling construct perceptual interpretation digit string segmentation arises naturally explaining away effects occur Bayesian inference By using conditional mixtures factor analyzers possible extract explicit compact representation instantiation parameters describe pose digit These instantiation parameters used inputs higher level system models relationships digits The technique could used model individual digits redundancies instantiation parameters parts
This paper describes new method determining consensus sequences signal start translation boundaries exons introns donor acceptor sites eukaryotic mRNA The method takes account dependencies adjacent bases contrast usual technique considering position independently When coupled dynamic program compute likely sequence new consensus sequences emerge The consensus sequence information summarized conditional probability matrices used locate signals uncharacterized genomic DNA greater sensitivity specificity conventional matrices Speciesspecific versions matrices especially effective distinguishing true false sites
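The following is a minimal sketch of conditioning each position on the preceding base, in place of scoring every position independently. It builds one 4×4 conditional probability table per adjacent pair of positions, estimated from aligned example sites with pseudocounts. Candidate sites are then scored by log-probability. The training sites, the pseudocount of 1 and the function names are hypothetical. The paper's dynamic program and any decision thresholds are not reproduced.

```python
import math

BASES = "ACGT"

def _normalize(counts):
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def train_conditional_matrices(sites, pseudo=1.0):
    """First-position marginal plus P(base at i | base at i-1) for each later position."""
    first = {b: pseudo for b in BASES}
    for s in sites:
        first[s[0]] += 1
    cond = []
    for i in range(1, len(sites[0])):
        counts = {p: {b: pseudo for b in BASES} for p in BASES}
        for s in sites:
            counts[s[i - 1]][s[i]] += 1  # count adjacent base pairs
        cond.append({p: _normalize(row) for p, row in counts.items()})
    return _normalize(first), cond

def log_score(seq, model):
    """Log-probability of a candidate site under the adjacent-dependency model."""
    first, cond = model
    score = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        score += math.log(cond[i - 1][seq[i - 1]][seq[i]])
    return score


sites = ["CAGGT", "CAGGT", "AAGGT", "CAGGA"]  # hypothetical aligned training sites
model = train_conditional_matrices(sites)
print(log_score("CAGGT", model), log_score("GTCAA", model))
```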
In pantheon evolutionary forces optimizing Apollonian powers natural selection generally assumed dominate dark Dionysian dynamics sexual selection But need case particularly class selective mating mechanisms called directional mate preferences Kirkpatrick In previous simulation research showed nondirectional assortative mating preferences could cause populations spontaneously split apart separate species Todd Miller In paper show directional mate preferences cause populations wander capriciously phenotype space strange form runaway sexual selection without influence natural selection pressures When directional mate preferences free evolve always evolve point direction naturalselective peaks Sexual selection thus take life mate preferences within species become distinct important part environment species phenotypes adapt These results suggest broader conception adaptive behavior attracting potential mates becomes important finding food avoiding predators We present framework simulating wide range directional nondirectional mate preferences discuss practical scientific applications simulating sexual selection
Case based reasoning CBR uses knowledge former experiences known cases Since special knowledge expert mainly subject experiences CBR techniques good base development expert systems We investigate problem technical diagnosis Diagnosis considered classification task process guided computer assisted experience This corresponds flexible case completion approach Flexibility also needed expert view predominant interest unexpected unpredictable cases
The paper investigates possibilities using simple recurrent networks transducers map sequential natural language input nonsequential featurebased semantics The networks perform well sentences containing single main predicate encoded transitive verbs prepositions applied multiplefeature objects encoded nounphrases adjectival modifiers shows robustness ungrammatical inputs A second set experiments deals sentences containing embedded structures Here network able process multiple levels sentencefinal embeddings one level centerembedding This turns consequence networks inability retain information reflected outputs intermediate phases processing Two extensions Elmans original recurrent network architecture introduced
We use connectionist network trained reinforcement control autonomous robot vehicle simulated robot We show given appropriate sensory data architectural structure network learn control robot simple navigation problem We investigate complex goalbased problem examine planlike behavior emerges An autonomous agent abstractly defined mapping sequence sensory inputs appropriate action response percepts Such agent autonomous extent behavior determined immediate inputs past experience rather builtin control Russell Wefald We interested investigating cognitive capabilities autonomous agents We believe cognitive behavior emerge reactive situated activity autonomous agents Some consensus exists design autonomous agents There relatively direct coupling perception action control distributed decentralized importantly dynamic interaction environment agent Maes However consensus exists best method implement design features Connectionist networks easily accommodate design features believe effective mechanisms controlling autonomous agents This paper focuses exploring connectionist designs controlling simple navigation autonomous vehicle We conclude applying successful design features difficult problem examine planlike behavior emerges We examine implementation questions testing network controllers real simulated robot simple environment using experimental variables type sensory data type training subtasks amount memory To train network controllers use reinforcement learning algorithm converts abstract measures goodness reward punishment specific teacher signals The environment called playpen rectangular box feet light one corner The reinforcement training problems investigate involve coordination motor activity information supplied sensors The navigation problem refer avoid move reinforces robot moving playpen avoiding walls The difficult problem light food adds goal state order simulate hunger reinforces robot periodically seeking avoiding light still avoiding walls moving By holding problem constant varying sensory ability determine extent addition sensors helps robot succeed problem Similarly evaluate utility contextual memory different training subtasks autoassociation prediction Our robot called carbot modified toy car inches controlled programmable miniboard designed Martin Carbot inexpensive build primarily makes use primitive sensors no lasers video sonar It two servomotors one controls forward backward motion steering The robot two types physical sensors digital touch sensors front back bumpers analog light sensors stalks near back The light sensors directed degrees side carbot Carbot controlled remote connectionist network communicates miniboard The network gathers input data sensors determines set motors next time step Figure shows standard network used experiments There four discrete sets input units three sensors one context memory The first set represents previous state two motors two units per motor The first motor unit represents spin direction rear motor determines direction motion forward backward The second unit designates state motor The third unit represents spin direction front motor determines direction turning left right Note order turn carbot must motors
In paper consider problem tracking subset domain called target changes gradually time A single unknown probability distribution domain used generate random examples learning algorithm measure speed target changes Clearly rapidly target moves harder algorithm maintain good approximation target Therefore evaluate algorithms based much movement target tolerated examples predicting accuracy Furthermore complexity class H possible targets measured VCdimension also affects difficulty tracking target concept We show problem minimizing number disagreements sample among concepts class H approximated within factor k simple tracking algorithm H achieve probability making mistake target movement rate constant times kd k ln VapnikChervonenkis dimension H Also show H properly PAClearnable efficient randomized algorithm high probability approximately minimizes disagreements within factor yielding efficient tracking algorithm H tolerates drift rates constant times ln In addition prove complementary results classes halfspaces axisaligned hyperrectangles showing maximum rate drift algorithm even unlimited computational power tolerate constant times
Practical pattern classification knowledge discovery problems require selection subset attributes features much larger set represent patterns classified This due fact performance classifier usually induced learning algorithm cost classification sensitive choice features used construct classifier Exhaustive evaluation possible feature subsets usually infeasible practice large amount computational effort required Genetic algorithms belong class randomized heuristic search techniques offer attractive approach find nearoptimal solutions optimization problems This paper presents approach feature subset selection using genetic algorithm Some advantages approach include ability accommodate multiple criteria accuracy cost classification feature selection process find feature subsets perform well particular choices inductive learning algorithm used construct pattern classifier Our experiments several benchmark realworld pattern classification problems demonstrate feasibility approach feature subset selection automated Many practical pattern classification tasks eg medical diagnosis require learning appropriate classification function assigns given input pattern typically represented using vector attribute feature values one finite set classes The choice features attributes measurements used represent patterns presented classifier affect among things The accuracy classification function learned using inductive learning algorithm eg decision tree induction algorithm neural network learning algorithm The features used describe patterns implicitly define pattern language If language expressive enough would fail capture information necessary classification hence regardless learning algorithm used accuracy classification function learned would limited lack information design neural networks pattern classification knowledge discovery
The main aim paper provide tutorial regression Gaussian processes We start Bayesian linear regression show change viewpoint one see method Gaussian process predictor based priors functions rather priors parameters This leads general discussion Gaussian processes section Section deals issues including hierarchical modelling setting parameters control Gaussian process covariance functions neural network models use Gaussian processes classification problems
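The Gaussian process predictor the tutorial builds towards fits in a few lines. Let k be the covariance function, σ_n² the noise variance, X the training inputs, y the targets and K = k(X, X). The predictive mean at x* is k*ᵀ(K + σ_n² I)⁻¹y and the predictive variance is k(x*, x*) − k*ᵀ(K + σ_n² I)⁻¹k*. The following is a minimal NumPy sketch with a squared-exponential covariance. The length scale, signal variance and noise variance are fixed illustrative values. They are not set by the hierarchical methods the tutorial discusses.

```python
import numpy as np

def sq_exp(a, b, length=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior predictive mean and variance of a zero-mean GP."""
    K = sq_exp(x_train, x_train) + noise_var * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # K = L L^T, more stable than inverting K directly
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # (K + noise I)^{-1} y
    k_star = sq_exp(x_train, x_test)  # shape (n_train, n_test)
    mean = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    var = np.diag(sq_exp(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var


x_train = np.linspace(0.0, 5.0, 8)
y_train = np.sin(x_train)
x_test = np.array([2.5, 10.0])  # one point inside the data, one far outside
mean, var = gp_predict(x_train, y_train, x_test)
print(mean, var)
```

Far from the data, the prediction reverts to the prior, with mean near 0 and variance near the signal variance. This is the behaviour that the priors-over-functions viewpoint makes explicit.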
General convergence results linear discriminant updates Abstract The problem learning linear discriminant concepts solved various mistakedriven update procedures including Winnow family algorithms wellknown Perceptron algorithm In paper define general class quasiadditive algorithms includes Perceptron Winnow special cases We give single proof convergence covers much class including Perceptron Winnow also many novel algorithms Our proof introduces generic measure progress seems capture much algorithms converge Using measure develop simple general technique proving mistake bounds apply new algorithms well existing algorithms When applied known algorithms technique automatically produces close variants existing proofs generally obtain known bounds within constants thus showing certain sense seemingly diverse results fundamentally isomorphic
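The quasi-additive view can be sketched as follows: on each mistake an accumulator z is updated additively by η·y·x, and prediction uses weights w = f(z) for some link function f. With the identity link this is the Perceptron. With the exponential link w_i = 2^{z_i}, each mistake multiplies the active weights by 2 or 1/2, which is Winnow-style promotion and demotion. This is a minimal sketch of that reading of the abstract, not the paper's notation or proof. The thresholds, learning rates and the toy target (the disjunction x0 OR x1 over 6 boolean features) are hypothetical.

```python
from itertools import product

def quasi_additive_train(examples, link, theta, eta, max_epochs=100):
    """Mistake-driven quasi-additive learner.

    Keeps an additive accumulator z; on each mistake z += eta * y * x.
    Predictions use weights w = link(z) and the rule w . x >= theta.
    Returns (weights, converged), where converged means an epoch with no mistakes.
    """
    z = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            w = link(z)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1
            if pred != y:
                z = [zi + eta * y * xi for zi, xi in zip(z, x)]
                mistakes += 1
        if mistakes == 0:
            return link(z), True
    return link(z), False

def identity(z):  # Perceptron
    return list(z)

def base2_exp(z):  # Winnow with promotion/demotion factor 2
    return [2.0 ** zi for zi in z]


# Hypothetical target: the monotone disjunction x0 OR x1 over 6 boolean features.
boolean = [list(bits) for bits in product([0, 1], repeat=6)]
labels = [1 if b[0] or b[1] else -1 for b in boolean]
perceptron_data = [(b + [1], y) for b, y in zip(boolean, labels)]  # constant bias feature
winnow_data = list(zip(boolean, labels))

w_p, ok_p = quasi_additive_train(perceptron_data, identity, theta=0.0, eta=1.0)
w_w, ok_w = quasi_additive_train(winnow_data, base2_exp, theta=6.0, eta=1.0)
print(w_p, w_w)
```

Only the link function differs between the two runs. This is the sense in which a single convergence argument can cover both algorithms.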
In this paper we present a prototype of a flexible similarity-based retrieval system. Its flexibility is supported by allowing an imprecisely specified query. Moreover, our algorithm allows us to assess whether the retrieved items are relevant in the initial context specified in the query. The presented system can be used as a supporting tool for a software repository. We also discuss system evaluation with respect to usefulness, scalability, applicability, and comparability. Evaluation of the TA system on three domains gives us encouraging results, and the integration of TA into a real software repository as a retrieval tool is ongoing.
This paper describes an application of inductive learning techniques to case-based reasoning. We introduce two main forms of induction, define case-based reasoning, and present a combination of both. The evaluation of the proposed system, called TA, is carried out on a classification task, namely character recognition. We show how inductive knowledge improves knowledge representation and, in turn, the flexibility of the system, its performance (in terms of classification accuracy), and its scalability.
The AAAI Fall Symposium on Flexible Computation in Intelligent Systems: Results, Issues, and Opportunities, Nov., Cambridge, MA. Abstract: This paper presents a case-based reasoning system, TA. We address the flexibility of the case-based reasoning process, namely the flexible retrieval of relevant experiences, by using a novel similarity assessment theory. To exemplify the advantages of the approach, we experimentally evaluated the system and compared its performance to the performance of a non-flexible version of TA and of other machine learning algorithms on several domains.
Diagnosis of a disease and its treatment are not separate, one-shot activities. Instead, they are very often dependent and interleaved over time, mostly due to uncertainty about the underlying disease, uncertainty associated with the response of a patient to the treatment, and the varying cost of different treatment and diagnostic (investigative) procedures. A framework particularly suitable for modeling such a complex therapy decision process is the partially observable Markov decision process (POMDP). Unfortunately, the problem of finding the optimal therapy within the standard POMDP framework is also computationally very costly. In this paper we investigate various structural extensions of the standard POMDP framework and approximation methods that allow us to simplify the model construction process for larger therapy problems and to solve them faster. The therapy problem we target specifically is the management of patients with ischemic heart disease.
Consider a group of Bayesians, each with a subjective probability distribution over a set of uncertain events. An opinion pool derives a single consensus distribution over the events, representative of the group as a whole. Several pooling functions have been proposed, each sensible under particular assumptions or measures. Many researchers over many years have failed to form a consensus on which method is best. We propose a market-based pooling procedure and analyze its properties. Participants bet on securities, each paying off contingent on an uncertain event, so as to maximize their own expected utilities. The consensus probability of each event is defined as the corresponding security's equilibrium price. The market framework provides explicit monetary incentives for participation and honesty, and allows agents to maintain individual rationality and limited privacy. No-arbitrage arguments ensure that equilibrium prices form legal probabilities. We show that when events are disjoint and all participants have exponential utility for money, the market derives the same result as the logarithmic opinion pool; similarly, logarithmic utility for money yields the linear opinion pool. In both cases, we prove that the group's behavior is, to an outside observer, indistinguishable from that of a rational individual whose beliefs equal the equilibrium prices.
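The two pools that the market reproduces can be written down directly. The sketch below computes a weighted linear pool (arithmetic mean of the members' distributions) and a weighted logarithmic pool (normalized geometric mean) over disjoint events; it does not simulate the market itself, and the function names and weights are illustrative.

```python
import numpy as np

def linear_pool(dists, weights):
    # weighted arithmetic mean of the members' distributions
    return np.average(np.asarray(dists, float), axis=0, weights=weights)

def log_pool(dists, weights):
    # weighted geometric mean, renormalised so the result sums to 1
    logp = np.asarray(weights, float) @ np.log(np.asarray(dists, float))
    p = np.exp(logp - logp.max())      # subtract max for numerical stability
    return p / p.sum()
```

For two members reporting [0.2, 0.8] and [0.6, 0.4] with equal weights, the linear pool gives [0.4, 0.6], while the logarithmic pool is proportional to [sqrt(0.12), sqrt(0.32)].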
Genetic Programming is increasing in popularity as the basis for a wide range of learning algorithms. However, to date the technique has been successfully applied only to modest tasks, because of the performance overheads of evolving a large number of data structures, many of which do not correspond to a valid program. We address this problem directly and demonstrate how the evolutionary process can be achieved with much greater efficiency through the use of a formally-based representation and strong typing. We report initial experimental results which demonstrate that our technique exhibits significantly better performance than previous work.
The DNA promoter sequences domain theory and database have become popular for testing systems that integrate empirical and analytical learning. This note reports the accuracy on the items of the database that results from a simple change to, and reinterpretation of, the domain theory in terms of M-of-N concepts, involving no learning. Moreover, an exhaustive search of the space of M-of-N domain theory interpretations indicates both the expected accuracy of a randomly chosen interpretation and the maximum accuracy achieved, together with the number of cases in which it is achieved. This demonstrates the informativeness of the domain theory, without the complications of understanding the interactions between various learning algorithms and the theory. In addition, our results help characterize the difficulty of learning using the DNA promoters theory.
We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure on real and simulated datasets. Classifiers used include linear discriminants, nearest neighbors, and the support vector machine.
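A minimal sketch of the coupling step, assuming equal weight for every pair and the iterative update p_i <- p_i * sum_j r_ij / sum_j mu_ij with mu_ij = p_i / (p_i + p_j), followed by renormalization. This is a Bradley-Terry-style fixed-point scheme. The matrix `R` of pairwise estimates r_ij = P(class i | class i or j) is an input that the pairwise classifiers would supply.

```python
import numpy as np

def pairwise_coupling(R, n_iter=1000, tol=1e-10):
    """Couple pairwise estimates R[i, j] ~ P(i | i or j) into class probabilities.

    Assumes R[j, i] = 1 - R[i, j]; the diagonal is ignored; all pairs get
    equal weight.
    """
    K = R.shape[0]
    p = np.full(K, 1.0 / K)
    off = ~np.eye(K, dtype=bool)                    # mask out i == j
    for _ in range(n_iter):
        mu = p[:, None] / (p[:, None] + p[None, :])  # model-implied pairwise probs
        new_p = p * (R * off).sum(axis=1) / (mu * off).sum(axis=1)
        new_p /= new_p.sum()
        if np.abs(new_p - p).max() < tol:
            return new_p
        p = new_p
    return p
```

When the pairwise estimates are exactly consistent with some class distribution, that distribution is a fixed point of the update. The test below checks that it is recovered.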
Intracortical microstimulation (ICMS) of a single site in the somatosensory cortex of rats and monkeys for hours produces a large increase in the number of neurons responsive to the skin region corresponding to the ICMS-site receptive field (RF), with very little effect on the position and size of the ICMS-site RF or on the response evoked at the ICMS site by tactile stimulation (Recanzone et al., b). Large changes in RF topography are observed following several weeks of repetitive stimulation of a restricted skin region in monkeys (Jenkins et al.; Recanzone et al., a, c, d, e). Repetitive stimulation of a localized skin region in monkeys, produced by training the monkeys in a tactile frequency discrimination task, improves their performance (Recanzone et al.). It has been suggested that these changes in RF topography are caused by competitive learning in excitatory pathways (Grajski and Merzenich; Jenkins et al.; Recanzone et al., a, b, c, d, e). ICMS almost simultaneously excites excitatory and inhibitory terminals, as well as excitatory and inhibitory cortical neurons, within microns of the stimulating electrode. This paper therefore investigates the implications of the possibility that lateral inhibitory pathways may undergo synaptic plasticity during ICMS. Lateral inhibitory pathways may also undergo synaptic plasticity in adult animals during peripheral conditioning. The EXIN (afferent excitatory and lateral inhibitory) synaptic plasticity rules
In this work we present a classification methodology, LINNEO, to discover concepts in ill-structured domains and to organize them into hierarchies. To achieve this aim, LINNEO uses conceptual learning techniques and classification. The final target is to build knowledge bases after expert validation. Some techniques for improving the results of the classification step are used, such as biasing with partial expert knowledge (classification rules, or causal and structural dependencies between attributes) and delayed cluster assignation of objects. Comparisons with some well-known systems are also shown.
Technical Report, January. Abstract: The new classification algorithm CLEF combines a version of a linear machine with a non-linear function approximator that constructs its own features. The algorithm finds non-linear decision boundaries by constructing the features needed to learn the necessary discriminant functions. The CLEF algorithm is proven to separate all consistently labelled training instances, even when they are not linearly separable in the input variables. The algorithm is illustrated on a variety of tasks.
In this paper, we review five heuristic strategies for handling context-sensitive features in supervised machine learning from examples. We discuss two methods for recovering lost (implicit) contextual information. We mention some evidence that hybrid strategies can have a synergetic effect. We then show how the work of several machine learning researchers fits into this framework. While we do not claim that these strategies exhaust the possibilities, it appears that the framework includes all of the techniques that can be found in the published literature on context-sensitive learning.
In this paper we describe a new adaptive penalty approach for handling constraints in genetic algorithm optimization problems. The idea is to start with a relatively small penalty coefficient and then increase or decrease it on demand as the optimization progresses. Empirical results in several engineering design domains demonstrate the merit of the proposed approach.
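The abstract does not give the adjustment rule, so the sketch below uses a hypothetical one. It raises the coefficient when the best individual has been infeasible for the last k generations, lowers it when the best has been feasible for the last k generations, and otherwise leaves it unchanged. The parameters `k`, `grow`, `shrink`, and `floor` are all assumptions, and the rule is not necessarily the paper's.

```python
def penalized_fitness(objective, violation, coeff):
    # minimisation: objective plus coefficient times total constraint violation
    return objective + coeff * violation

def update_coeff(coeff, best_feasible_history, k=3, grow=2.0, shrink=0.5,
                 floor=1e-6):
    """Hypothetical on-demand rule for the penalty coefficient.

    best_feasible_history: list of booleans, one per generation, saying
    whether that generation's best individual satisfied all constraints.
    """
    recent = best_feasible_history[-k:]
    if len(recent) < k:                 # not enough evidence yet
        return coeff
    if not any(recent):                 # persistently infeasible: push harder
        return coeff * grow
    if all(recent):                     # persistently feasible: relax
        return max(coeff * shrink, floor)
    return coeff
```

Starting from a small coefficient lets the search cross infeasible regions early. The coefficient only grows when infeasibility persists.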
Recent research in decision-theoretic planning has focussed on making the solution of Markov decision processes (MDPs) more feasible. We develop a family of algorithms for structured reachability analysis of MDPs that are suitable when an initial state (or set of states) is known. Using compact, structured representations of MDPs (e.g., Bayesian networks), our methods, which vary in the tradeoff between complexity and accuracy, produce structured descriptions of (estimated) reachable states that can be used to eliminate variables or variable values from the problem description, reducing the size of the MDP and making it easier to solve. One contribution of our work is the extension of ideas from GRAPHPLAN to deal with the distributed nature of action representations typically embodied within Bayes nets and with the problem of correlated action effects. We also demonstrate that our algorithm can be made complete by using k-ary constraints instead of binary constraints. Another contribution is the illustration of how the compact representation of reachability constraints can be exploited by several existing (exact and approximate) abstraction algorithms for MDPs.
Many ILP systems, such as GOLEM, FOIL, and MIS, take advantage of user-supplied meta-knowledge to restrict the hypothesis space. This meta-knowledge can take the form of type information about the arguments of the predicate being learned, or of information about whether a certain argument of the predicate is functionally dependent on the other arguments (supplied as mode information). This meta-knowledge is explicitly supplied to an ILP system in addition to the data. The present paper argues that in many cases the meta-knowledge can be extracted directly from the raw data. Three algorithms are presented that learn type, mode, and symmetric meta-knowledge from data. These algorithms can be incorporated into existing ILP systems in the form of a preprocessor that obviates the need for the user to provide this information explicitly. In many cases, the algorithms can extract meta-knowledge of which the user is unaware, and this information can be used by the ILP system to restrict the hypothesis space.
Gold showed that not even regular grammars can be exactly identified from positive examples alone. Since it is known that children learn natural grammars almost exclusively from positive examples, Gold's result has been used as theoretical support for Chomsky's theory of innate human linguistic abilities. In this paper, new results are presented which show that, within a Bayesian framework, not only grammars but also logic programs are learnable with arbitrarily low expected error from positive examples only. In addition, we show that the upper bound on the expected error of a learner that maximises the Bayes posterior probability when learning from positive examples is within a small additive term of that of a learner which does the same from a mixture of positive and negative examples. An Inductive Logic Programming implementation is described which avoids the pitfalls of greedy search by global optimisation of this function during the local construction of the individual clauses of the hypothesis. Results of testing this implementation on artificially-generated data sets are reported. These results are in agreement with the theoretical predictions.
We prove two results about the estimator. Smooth: with high probability, the estimate is at least as smooth as f, in any of a wide variety of smoothness measures. Adapt: the estimator comes nearly as close in mean square to f as any measurable estimator can come, uniformly over balls in each of two broad scales of smoothness classes. These two properties are unprecedented in several ways. Our proof of these results develops new facts about abstract statistical inference and its connections. Acknowledgements: These results were described at the Symposium on Wavelet Theory, held in connection with the Shanks Lectures at Vanderbilt University, April. The author would like to thank Professor L. L. Schumaker for hospitality at the conference, and R. A. DeVore, Iain Johnstone, Gerard Kerkyacharian, Bradley Lucier, A. S. Nemirovskii, Ingram Olkin, and Dominique Picard for interesting discussions and correspondence on related topics. The author is also with the University of California, Berkeley.
In this paper we study sequence evolution under a general Markov model that incorporates practically every stochastic model found in the literature. In particular, we study the error in estimating evolutionary distances and its dependence on the sample sequence lengths. By deriving large deviation results applicable to any distance-based evolutionary tree building algorithm, we show that the Harmonic Greedy Triplets and Short Quartet algorithms recover the evolutionary tree with high probability from sequences whose length is polynomial in the number of nodes.
Results are reported from the application of tools for synthesizing, optimizing, and analyzing neural networks to an ECG patient monitoring task. A neural network was synthesized from a rule-based classifier and optimized over a set of normal and abnormal heartbeats. The classification error rate on a separate, larger test set was reduced by a factor. Sensitivity analysis of the synthesized and optimized networks revealed informative differences. Analysis of the weights and unit activations of the optimized network enabled the size of the network to be reduced by a factor without loss of accuracy.
Intracortical microstimulation (ICMS) of a localized site in the somatosensory cortex of rats and monkeys for hours produces a large increase in the cortical representation of the skin region represented by the ICMS-site neurons before ICMS, with very little effect on the ICMS-site neurons' RF location, RF size, and responsiveness (Recanzone et al.). The EXIN (afferent excitatory and lateral inhibitory) learning rules (Marshall) are used to model RF changes during ICMS. The EXIN model produces a reorganization of RF topography similar to that observed experimentally. The possible role of inhibitory learning in producing the effects of ICMS is studied by simulating the EXIN model with lateral inhibitory learning. This model also produces an increase in the cortical representation of the skin region represented by the ICMS-site RF. ICMS is compared with artificial scotoma conditioning (Pettet and Gilbert) and retinal lesions (Darian-Smith and Gilbert), and it is suggested that lateral inhibitory learning may be a general principle of cortical plasticity.
We present a detailed analysis of the evolution of GP populations using the problem of finding a program that returns the maximum possible value for a given terminal and function set and a depth limit on the program tree (known as the MAX problem). We confirm the basic message of Gathercole and Ross that crossover, together with program size restrictions, can be responsible for premature convergence to a sub-optimal solution. We show that this can happen even when the population retains a high level of variety, and that in many cases evolution from the sub-optimal solution to the solution is possible if sufficient time is allowed. In both cases, theoretical models are presented and compared with actual runs. Experimental evidence is presented that Price's Covariance and Selection Theorem can be applied to GP populations, and the practical effect of program size restrictions is noted. Finally, we show that the covariance between gene frequency and fitness in the first few generations can be used to predict the course of GP runs.
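Price's theorem states that the change in mean gene frequency equals the covariance between offspring count and gene frequency, divided by the mean offspring count. The sketch below checks this identity numerically. It assumes no transmission bias, meaning that offspring carry their parent's frequency q_i (crossover and mutation effects are ignored), and it uses the population (not sample) covariance. The array values are illustrative.

```python
import numpy as np

def price_delta(z, q):
    """Predicted change in mean frequency: Cov(z, q) / mean(z).

    z[i]: number of offspring of individual i; q[i]: frequency of the gene in i.
    """
    z, q = np.asarray(z, float), np.asarray(q, float)
    cov = np.mean((z - z.mean()) * (q - q.mean()))   # population covariance
    return cov / z.mean()

def direct_delta(z, q):
    # frequency in the next generation (offspring-weighted) minus current mean
    z, q = np.asarray(z, float), np.asarray(q, float)
    return (z * q).sum() / z.sum() - q.mean()
```

Under this assumption the two quantities are algebraically identical. This is why a positive covariance between fitness and gene frequency early in a run signals that the gene will spread.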
We present a symbolic machinery that admits both probabilistic and causal information about a given domain, and produces probabilistic statements about the effect of actions and the impact of observations. The calculus admits two types of conditioning operators: ordinary Bayes conditioning, P(y | X = x), which represents the observation X = x, and causal conditioning, P(y | do(X = x)), read "the probability of Y = y conditioned on holding X constant at x by deliberate action." Given a mixture of such observational and causal sentences, together with the topology of the causal graph, the calculus derives new conditional probabilities of both types, thus enabling one to quantify the effects of actions and observations.
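The difference between the two conditioning operators can be illustrated with one consequence of such a calculus, the back-door adjustment P(y | do(X = x)) = sum_z P(y | x, z) P(z). This sketch assumes a single observed confounder Z that satisfies the back-door criterion, and it does not implement the general calculus. The joint table `P[z, x, y]` and the numbers in the test are made up for illustration.

```python
import numpy as np

def observe(P, x, y):
    # ordinary Bayes conditioning P(Y=y | X=x) from the joint table P[z, x, y]
    return P[:, x, y].sum() / P[:, x, :].sum()

def intervene(P, x, y):
    # back-door adjustment over Z: sum_z P(Y=y | X=x, Z=z) P(Z=z)
    p_z = P.sum(axis=(1, 2))
    p_y_given_xz = P[:, x, y] / P[:, x, :].sum(axis=1)
    return float(p_y_given_xz @ p_z)
```

In the test's model, Z raises both the chance of X = 1 and of Y = 1. Observing X = 1 therefore gives 0.62, while setting X = 1 by action gives only 0.5.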
A general model for the coevolution of cooperating species is presented. This model is instantiated and tested in the domain of function optimization, and compared with a traditional GA-based function optimizer. The results are encouraging in two respects. They suggest ways in which the performance of GA and other EA-based optimizers can be improved, and they suggest a new approach to evolving complex structures such as neural networks and rule sets.
This paper investigates learning in a lifelong context. Lifelong learning addresses situations in which a learner faces a whole stream of learning tasks. Such scenarios provide the opportunity to transfer knowledge across multiple learning tasks, in order to generalize more accurately from less training data. In this paper, several different approaches to lifelong learning are described and applied in an object recognition domain. It is shown that, across the board, lifelong learning approaches generalize consistently more accurately from less training data, by virtue of their ability to transfer knowledge across learning tasks.
A constant rebalanced portfolio is an investment strategy that keeps the same distribution of wealth among a set of stocks from period to period. Recently there has been work on on-line investment strategies that are competitive with the best constant rebalanced portfolio determined in hindsight (Cover; Helmbold et al.; Cover and Ordentlich; Cover and Ordentlich, b; Ordentlich and Cover; Cover). For the universal algorithm of Cover (Cover), we provide a simple analysis that naturally extends to the case of a fixed percentage transaction cost (commission), answering a question raised in (Cover; Helmbold et al.; Cover and Ordentlich; Cover and Ordentlich, b; Ordentlich and Cover; Cover). In addition, we present a simple randomized implementation that is significantly faster in practice. We conclude by explaining how these algorithms can be applied to other problems, such as combining the predictions of statistical language models, where the resulting guarantees are more striking.
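A minimal sketch for two assets, ignoring transaction costs. The wealth of a constant rebalanced portfolio b over price relatives x_t is the product of b . x_t. The wealth of Cover's universal portfolio under a uniform prior equals the average of that CRP wealth over all b, which is approximated here on a grid. The grid size and the function names are illustrative assumptions.

```python
import numpy as np

def crp_wealth(b, X):
    # wealth multiplier of constant rebalanced portfolio b on price relatives X (T x m)
    return float(np.prod(X @ b))

def universal_wealth_two_stocks(X, grid=1001):
    """Cover's universal portfolio (uniform prior) for two assets.

    Its final wealth equals the average CRP wealth over b = (w, 1 - w),
    approximated here on an evenly spaced grid of w in [0, 1].
    """
    ws = np.linspace(0.0, 1.0, grid)
    return float(np.mean([crp_wealth(np.array([w, 1.0 - w]), X) for w in ws]))
```

In the classic example of cash versus a stock that alternately doubles and halves, buy-and-hold never gains. The 50/50 CRP, by contrast, earns a factor of 1.125 every two periods, and the universal portfolio ends with more than its starting wealth.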
Lai-Wan CHAN and Evan Fung-Yu YOUNG, Computer Science Department, The Chinese University of Hong Kong, New Territories, Hong Kong. Email: lwchan@cs.cuhk.hk. Technical Report CS-TR. Abstract: The fully connected recurrent network (FRN) using the on-line training method Real Time Recurrent Learning (RTRL) is computationally expensive. It has a computational complexity of O(N^4) and a storage complexity of O(N^3), where N is the number of non-input units. We have devised a locally connected recurrent model with a much lower complexity in both computational time and storage space. The ring-structure recurrent network (RRN), the simplest kind of locally connected network, has corresponding complexities of O(mn + np) and O(np) respectively, where p, n, and m are the numbers of input, hidden, and output units respectively. We compare the performance of the RRN and the FRN in sequence recognition and time series prediction. In the sequence recognition task, we tested the networks' temporal memorizing power and time warping ability. In the time series prediction task, we trained both networks to predict three series: a periodic series with white noise, a deterministic chaotic series, and the sunspots data. Both tasks show that the RRN needs a much shorter training time and that the performance of the RRN is comparable to that of the FRN.
In object recognition systems, interactions between objects in a scene are ignored, and the best interpretation is considered to be the set of hypothesized objects that matches the greatest number of image features. We show how image interpretation can be cast as the problem of finding the most probable explanation (MPE) in a Bayesian network that models both visual and physical object interactions. The problem of how to determine exact conditional probabilities for the network is shown to be unimportant, since the goal is to find the most probable configuration of objects, not to calculate absolute probabilities. We furthermore show that evaluating configurations by feature counting is equivalent to calculating the joint probability of the configuration using a restricted Bayesian network, and we derive the assumptions about probabilities necessary to make a Bayesian formulation reasonable.
Research on nonmonotonic and default reasoning has identified several important criteria for preferring alternative default inferences. The theories of reasoning based on each of these criteria may uniformly be viewed as theories of rational inference, in which the reasoner selects maximally preferred states of belief. Though researchers have noted some cases of apparent conflict between the preferences supported by different theories, it has been hoped that these special theories of reasoning may be combined into a universal logic of nonmonotonic reasoning. We show that the different categories of preferences conflict more than has been realized, and we adapt formal results from social choice theory to prove that every universal theory of default reasoning will violate at least one reasonable principle of rational reasoning. Our results can be interpreted as demonstrating that, within the preferential framework, we cannot expect much improvement on the rigid lexicographic priority mechanisms that have been proposed for conflict resolution.
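We apply the exponential weight algorithm, introduced by Littlestone and Warmuth and by Vovk, to the problem of predicting a binary sequence almost as well as the best biased coin. We first show that, for the case of the logarithmic loss, the derived algorithm is equivalent to the Bayes algorithm with Jeffreys prior, which was studied by Xie and Barron under probabilistic assumptions. We derive a uniform bound on the regret which holds for any sequence. We also show that if the empirical distribution of the sequence is bounded away from 0 and from 1, then, as the length of the sequence increases to infinity, the difference between this bound and a corresponding bound on the average-case regret of the same algorithm (which is asymptotically optimal in that case) is a small constant. We show that this gap is necessary by calculating the regret of the min-max optimal algorithm for this problem and showing that the asymptotic upper bound is tight. We also study the application of this algorithm to the square loss, and show that the algorithm derived in this case is different from the Bayes algorithm and is better than it for prediction in the worst case.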
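The Bayes algorithm with Jeffreys prior Beta(1/2, 1/2) has a simple sequential form, known as the Krichevsky-Trofimov estimator: after n bits containing k ones, predict P(next = 1) = (k + 1/2) / (n + 1). The sketch below computes its cumulative log loss (in nats). The test compares this loss with that of the best biased coin in hindsight, using the standard bound regret <= (1/2) ln n + ln 2. The function names are illustrative.

```python
import math

def kt_predict(n_ones, n_total):
    # Bayes mixture with Jeffreys prior Beta(1/2, 1/2) (Krichevsky-Trofimov)
    return (n_ones + 0.5) / (n_total + 1.0)

def kt_log_loss(bits):
    """Cumulative log loss (nats) of sequential KT predictions on a 0/1 list."""
    ones, loss = 0, 0.0
    for t, b in enumerate(bits):         # t = number of bits seen so far
        p1 = kt_predict(ones, t)
        loss -= math.log(p1 if b else 1.0 - p1)
        ones += b
    return loss
```

Because the mixture is exchangeable, the total loss depends only on the counts of ones and zeros, not on their order.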
We study the close connections between game theory, on-line prediction, and boosting. After a brief review of game theory, we describe an algorithm for learning to play repeated games, based on the on-line prediction methods of Littlestone and Warmuth. The analysis of this algorithm yields a simple proof of von Neumann's famous minmax theorem, as well as a provable method of approximately solving a game. We then show that the on-line prediction model is obtained by applying this game-playing algorithm to an appropriate choice of game, and that boosting is obtained by applying the same algorithm to the "dual" of this game.
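A minimal sketch of playing a repeated game with multiplicative weights. The row player keeps a weight per pure strategy and multiplies it by beta^loss after each round, while the column player best-responds to the current mixed strategy. The loss matrix must have entries in [0, 1]. The choice beta = 1 / (1 + sqrt(2 ln n / T)) is one standard tuning and is an assumption here. The average loss and the average mixed strategy then approach the game's value and an approximately optimal strategy.

```python
import numpy as np

def hedge_vs_best_response(M, T=2000):
    """Row player runs multiplicative weights on loss matrix M (entries in [0, 1]);
    the column player best-responds each round. Returns (average loss, average p)."""
    n = M.shape[0]
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / T))
    w = np.ones(n)
    total_loss, avg_p = 0.0, np.zeros(n)
    for _ in range(T):
        p = w / w.sum()
        j = int(np.argmax(p @ M))      # column that maximises the row player's loss
        total_loss += p @ M[:, j]
        avg_p += p / T
        w *= beta ** M[:, j]           # multiplicative update on observed losses
        w /= w.sum()                   # renormalise to avoid underflow
    return total_loss / T, avg_p
```

For matching pennies (the row player loses 1 on a match), the value is 0.5. The average loss stays at or above 0.5 each round and within the regret term of it overall.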
When compiling for instruction level parallelism (ILP), the integration of optimization phases can lead to an improvement in the quality of the code generated. However, since several different representations of a program are used in the various phases, only partial integration has been achieved to date. We present a program representation that combines resource requirements and availability information with control and data dependence information. The representation enables the integration of several optimizing phases, including transformations, register allocation, and instruction scheduling. The basis of this integration is the simultaneous allocation of different types of resources. We define the representation and show how it is constructed. We then formulate several optimization phases to use the representation to achieve better integration.
Many problems impede the design of multi-agent systems, not the least of which is the passing of information between agents. While others hand-implement communication routes and semantics, we explore a method by which communication can evolve. In the experiments described here, we model agents as connectionist networks. We supply each agent with a number of communications channels, implemented by the addition of both input and output units for each channel. The output units initiate environmental signals whose amplitude decays over distance and is perturbed by environmental noise. An agent cannot receive input from individual agents; rather, its input reflects the summation of all other agents' output signals along each channel. Because we use real-valued activations, the agents communicate using real-valued vectors. Under our evolutionary program, GNARL, the agents coevolve a communication scheme over continuous channels which conveys task-specific information.
As the field of Genetic Programming (GP) matures and its breadth of application increases, the need for parallel implementations becomes absolutely necessary. The transputer-based system presented in (Koza and Andre) is one of the rare parallel implementations. Until today, no implementation has been proposed for parallel GP using a SIMD architecture, except for a data-parallel approach (Tufts), although others have exploited workstation farms and pipelined supercomputers. One reason is certainly the apparent difficulty of dealing with the parallel evaluation of different S-expressions when only a single instruction can be executed at the same time on every processor. The aim of this chapter is to present such an implementation of parallel GP on a SIMD system, where each processor can efficiently evaluate a different S-expression. We have implemented this approach on a MasPar MP computer and present some timing results. To the extent that SIMD machines like the MasPar are available to offer cost-effective cycles for scientific experimentation, this is a useful approach. The idea of simulating a MIMD machine using a SIMD architecture is not new (Hillis and Steele; Littman and Metcalf; Dietz and Cohen). One of the original ideas of the Connection Machine (Hillis and Steele) was that it could simulate other parallel architectures. Indeed, in the extreme, each processor of a SIMD architecture can simulate a universal Turing machine (TM). With a different Turing machine specification stored in each local memory, each processor would simply have its own tape, tape head, state table, and state pointer, and the simulation would be performed by repeating the basic TM operations simultaneously. Of course, such a simulation would be very inefficient and difficult to program, but it would have the advantage of being truly MIMD, where no SIMD processor would be in an idle state until its simulated machine halts. Now let us consider an alternative idea, in which each SIMD processor simulates an individual stored-program computer using a simple instruction set. For each step of the simulation, the SIMD system would sequentially execute each possible instruction on the subset of processors whose next instruction matches it. For a typical assembly language, even with a reduced instruction set, most processors would be idle most of the time. However, if the set of instructions implemented on the virtual processor is very small, this approach can be fruitful. In the case of Genetic Programming, the "instruction set" is composed of the specified set of functions designed for the task. We show that, with a precompilation step, simply by adding a push, a conditional and an unconditional branching, and a stop instruction, we can get an effective MIMD simulation running.
The current trace-driven simulation approach to determining superscalar processor performance is widely used but has some shortcomings. Modern benchmarks generate extremely long traces, resulting in problems with data storage as well as very long simulation run times. More fundamentally, simulation generally does not provide significant insight into the factors that determine performance, or a characterization of their interactions. This paper proposes a theoretical model of superscalar processor performance that addresses these shortcomings. Performance is viewed as an interaction of program parallelism and machine parallelism. Both program and machine parallelisms are decomposed into multiple component functions. Methods for measuring or computing these functions are described. The functions are combined to provide a model of the interaction between program and machine parallelisms and an accurate estimate of the performance. The computed performance, based on this model, is compared to simulated performance for six benchmarks from the SPEC suite on several configurations of the IBM RS instruction set architecture.
Artificial neural networks have been applied to the prediction of splice site locations in human pre-mRNA. A joint prediction scheme, in which the prediction of transition regions between introns and exons regulates a cutoff level for splice site assignment, was able to predict splice site locations with confidence levels far better than previously reported in the literature. The problem of predicting donor and acceptor sites in human genes is hampered by the presence of numerous false positives; in this paper, the distribution of these false splice sites is examined and linked to a possible scenario for the splicing mechanism in vivo. At its reported operating point, the method detects a large fraction of the true donor and acceptor sites while keeping the percentages of false donor site assignments and false acceptor site assignments low. For the large data set used in this study, this means on average one and a half false donor sites per true donor site and six false acceptor sites per true acceptor site. With the joint assignment method, a fifth of the true donor sites and around one fourth of the true acceptor sites could be detected without the accompaniment of any false positive predictions. Such highly confident splice sites could not be isolated with a widely used weight matrix method or with separate splice site networks. A complementary relation between the confidence levels of the coding/non-coding network and the separate splice site networks was observed: many weak splice sites have sharp transitions in the coding/non-coding signal, and many stronger splice sites have ill-defined transitions between coding and non-coding regions.
To coordinate with other agents in its environment, an agent needs models of what the other agents are trying to do. When communication is impossible or expensive, this information must be acquired indirectly via plan recognition. Typical approaches to plan recognition start with a specification of the possible plans the other agents may be following, and develop special techniques for discriminating among the possibilities. Perhaps more desirable would be a uniform procedure for mapping plans to general structures that support inference based on uncertain and incomplete observations. In this paper, we describe a set of methods for converting plans represented in a flexible procedural language into observation models represented as probabilistic belief networks.
DIMACS Technical Report. DIMACS is a partnership of Rutgers University, Princeton University, AT&T Research, Bellcore, and Bell Laboratories. DIMACS is an NSF Science and Technology Center, funded under contract STC, and also receives support from the New Jersey Commission on Science and Technology.
The construction of evolutionary trees is a fundamental problem in biology, and yet methods for reconstructing evolutionary trees are not reliable when it comes to inferring accurate topologies of large, divergent evolutionary trees from sequences of realistic length. We address this problem and present a new polynomial time algorithm for reconstructing evolutionary trees, called the Short Quartets Method, which is consistent and has greater statistical power than other polynomial time methods, such as Neighbor-Joining, the approximation algorithm of Agarwala et al. (and the "Double Pivot" variant of the Agarwala et al. algorithm by Cohen and Farach) for the L∞-nearest tree problem. Our study indicates that our method will produce the correct topology from shorter sequences than can be guaranteed using these other methods.
Artificial Life (ALife) research offers, among other things, a new style of computer simulation for understanding biological systems and processes. But current ALife work does not show enough methodological sophistication to count as good theoretical biology. As a first step towards developing a stronger methodology for ALife, this paper identifies some methodological pitfalls arising from the computer science influence in ALife, suggests some methodological heuristics for ALife as theoretical biology, notes the strengths of ALife methods versus previous research methods in biology, examines some open questions in theoretical biology that may benefit from ALife simulation, and argues that the debate over "Strong ALife" is not relevant to ALife's utility for theoretical biology. Introduction: Simulating our way into the Dark Continent
A complete characterization is given of the closed shift-invariant subspaces of L2(IR^d) which provide a specified approximation order. When such a space is principal (i.e., generated by a single function), this characterization is in terms of the Fourier transform of the generator. As a special case, we obtain the classical Strang-Fix conditions, but without requiring the generating function to decay at infinity. The approximation order of a general closed shift-invariant space is shown to be already realized by a specifiable principal subspace.
In this paper, the problem of learning an appropriate domain-specific bias is addressed. It is shown that this can be achieved by learning many related tasks from the same domain, and a theorem is given bounding the number of tasks that must be learnt. A corollary of the theorem is that, if the tasks are known to possess a common internal representation or preprocessing, then the number of examples required per task for good generalisation when learning n tasks simultaneously scales like O(a + b/n), for constants a and b. Experimental support for the theoretical results is reported.
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of the parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
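A minimal sketch of EM for probabilistic PCA using the closed-form iteration written in terms of the sample covariance S. With M = W^T W + sigma^2 I, the updates are W_new = S W (sigma^2 I + M^{-1} W^T S W)^{-1} and sigma^2_new = tr(S - S W M^{-1} W_new^T) / d. The random initialization, the fixed iteration count, and the function name are assumptions. At convergence the columns of W span the principal subspace, and sigma^2 approaches the mean of the discarded eigenvalues.

```python
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    """EM for probabilistic PCA on data X (N x d) with a q-dimensional latent space."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / N                          # sample covariance (d x d)
    W = rng.standard_normal((d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        M = W.T @ W + sigma2 * np.eye(q)
        Minv = np.linalg.inv(M)
        SW = S @ W
        W_new = SW @ np.linalg.inv(sigma2 * np.eye(q) + Minv @ W.T @ SW)
        sigma2 = np.trace(S - SW @ Minv @ W_new.T) / d   # uses old W and M
        W = W_new
    return W, sigma2
```

W is only identified up to a rotation in latent space, so the test compares the subspace it spans (via the angle to the top eigenvector), not W itself.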
The study of belief change has been an active area in philosophy and AI. In recent years, two special cases of belief change, belief revision and belief update, have been studied in detail. In a companion paper (Friedman and Halpern), we introduce a new framework to model belief change. This framework combines temporal and epistemic modalities with a notion of plausibility, allowing us to examine the change of beliefs over time. In this paper, we show how belief revision and belief update can be captured in our framework. This allows us to compare the assumptions made by each method and to better understand the principles underlying them. In particular, it shows that Katsuno and Mendelzon's notion of belief update (Katsuno and Mendelzon) depends on several strong assumptions that may limit its applicability in artificial intelligence. Finally, our analysis allows us to identify a notion of minimal change that underlies a broad range of belief change operations, including revision and update. Some of this work was done while the authors were at the IBM Almaden Research Center. The first author was also at Stanford while much of this work was done. The support of IBM and Stanford is gratefully acknowledged. The work was also supported in part by the Air Force Office of Scientific Research (AFSC) under a contract and a grant, and by NSF IRI grants. The first author was also supported in part by an IBM Graduate Fellowship and by the Rockwell Science Center. A preliminary version of this paper appears in J. Doyle, E. Sandewall, and P. Torasso (Eds.), Principles of Knowledge Representation and Reasoning: Proc. Fourth International Conference (KR), under the title "A knowledge-based framework for belief change, Part II: revision and update."
A new heuristic approach for minimizing possibly nonlinear and non-differentiable continuous space functions is presented. By means of an extensive testbed, which includes the De Jong functions, it is demonstrated that the new method converges faster and with more certainty than Adaptive Simulated Annealing as well as the Annealed Nelder-Mead approach, both of which have a reputation for being very powerful. The new method requires few control variables, is robust and easy to use, and lends itself very well to parallel computation.
We present results for three new algorithms for setting the step-size parameters, α and λ, of temporal-difference learning methods such as TD(λ). The overall task is that of learning to predict the outcome of an unknown Markov chain based on repeated observations of its state trajectories. The new algorithms select step-size parameters online in such a way as to eliminate the bias normally inherent in temporal-difference methods. We compare our algorithms with conventional Monte Carlo methods. Monte Carlo methods have a natural way of setting the step size for each state: they use a step size of 1/n, where n is the number of times the state has been visited. We seek, and come close to achieving, comparable step-size algorithms for TD(λ). One new algorithm uses a λ = 1/n schedule to achieve the same effect as processing a state backwards with TD, but remains completely incremental. Another algorithm uses a λ at each time equal to the estimated transition probability of the current transition. We present empirical results showing an improvement in convergence rate over Monte Carlo methods and conventional TD(λ). A limitation of our results at present is that they apply only to tasks whose state trajectories do not contain cycles.
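The Monte Carlo step-size rule mentioned above, 1/n for a state visited n times, makes the incremental update V(s) <- V(s) + (outcome - V(s)) / n(s) reproduce exactly the sample mean of the outcomes observed from that state. A minimal every-visit sketch, assuming undiscounted episodes that end in a single scalar outcome. The episode format and names are illustrative.

```python
from collections import defaultdict

def mc_values(episodes):
    """episodes: list of (visited_states, final_outcome) pairs.

    Each visited state's estimate moves toward the outcome with step size
    1/n(s), so V(s) equals the mean outcome over its visits.
    """
    V, n = defaultdict(float), defaultdict(int)
    for states, outcome in episodes:
        for s in states:
            n[s] += 1
            V[s] += (outcome - V[s]) / n[s]
    return dict(V)
```

If a trajectory revisits a state (a cycle), that state is counted once per visit. This matches the restriction to acyclic trajectories noted above.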
The higher order spectra of a signal contain information about the non-Gaussian and non-linear properties of the system that created it. Since the non-linearity of a musical signal usually originates in the excitation signal, while the linear spectral characteristics are attributed to the resonant chambers, we discard the spectral information by looking at the higher order statistical properties of the residual signal, i.e., the estimated input signal obtained by inverse filtering of the sound. In the current paper, we show that the skewness and kurtosis values of the residual can be used to characterize important sound properties belonging to the families of string, woodwind, and brass instrumental timbres. The skewness parameter is shown to be closely related to the bicoherence function calculated from the original signal, and it has a succinct interpretation as a statistical test of whether the signal conforms to a linear non-Gaussian model. The results are compared with Hinich's bispectral tests of Gaussianity and non-linearity of time series, and they exhibit similar classification results. Finally, by regarding the higher order statistics of the signal as a feature vector, a statistical distance measure in cumulant space is suggested.
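A minimal sketch of the residual statistics. It inverse-filters a signal with a least-squares linear predictor and then computes the skewness m3 / m2^1.5 and the excess kurtosis m4 / m2^2 - 3 of the prediction residual. The least-squares LPC and the predictor order are assumptions standing in for whatever inverse filter the paper uses. The test drives an AR(1) "resonator" with Gaussian and with exponential (skewed) excitation.

```python
import numpy as np

def lpc_residual(x, order=10):
    """Inverse-filter x with a least-squares linear predictor of the given order."""
    x = np.asarray(x, float)
    x = x - x.mean()                     # remove the DC offset before prediction
    # column k holds x[t-k-1] for t = order .. len(x)-1
    A = np.column_stack([x[order - k - 1: len(x) - k - 1] for k in range(order)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coeffs           # estimated excitation (residual)

def skewness_kurtosis(e):
    e = np.asarray(e, float)
    e = e - e.mean()
    m2 = np.mean(e ** 2)
    return np.mean(e ** 3) / m2 ** 1.5, np.mean(e ** 4) / m2 ** 2 - 3.0
```

A Gaussian excitation gives residual skewness and excess kurtosis near zero even though the filtered signal is strongly coloured. A skewed excitation survives inverse filtering as a clearly positive skewness.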
In paper discuss two approaches applying memory model Case Retrieval Nets applications distributed processing information required For distinguish two types applications namely (a) case distributed case libraries (b) case distributed cases While solution former straightforward latter requires extension Case Retrieval Nets provides kind partitioning entire net structure This extended model even allows concurrent implementation retrieval process use collaborative agents retrieval Keywords Casebased reasoning case retrieval memory structures distributed processing
Document drafting important problemsolving task professionals wide variety fields typifies design task requiring complex adaptation case reuse This paper proposes framework document reuse based explicit representation illocutionary rhetorical structure underlying documents Explicit representation structure facilitates interpretation previous documents enabling explain construction documents enabling document drafters issue goalbased specifications rapidly retrieve documents similar intentional structure maintenance multigeneration documents
Visualization proven powerful widelyapplicable tool analysis interpretation multivariate data Most visualization algorithms aim find projection data space twodimensional visualization space However complex data sets living highdimensional space unlikely single twodimensional projection reveal interesting structure We therefore introduce hierarchical visualization algorithm allows complete data set visualized top level clusters subclusters data points visualized deeper levels The algorithm based hierarchical mixture latent variable models whose parameters estimated using expectationmaximization algorithm We demonstrate principle approach toy data set apply algorithm visualization synthetic data set dimensions obtained simulation multiphase flows oil pipelines data dimensions derived satellite images A Matlab software implementation algorithm publicly available worldwide web
assumed unless otherwise stated Basically DE generates new parameter vectors adding weighted difference two population vectors third vector If resulting vector yields lower objective function value predetermined population member newly generated vector replaces vector compared next generation otherwise old vector retained This basic principle however extended comes practical variants DE For example existing vector perturbed adding one weighted difference vector In cases also worthwhile mix parameters old vector perturbed one comparing objective function values Several variants DE proven useful described
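The basic DE principle described above (a mutant formed by adding a weighted difference of two population vectors to a third, parameter mixing with the old vector, and replacement only when the objective improves) can be sketched in pure Python as below. The variant shown is the common DE/rand/1/bin scheme; the parameter names `F` and `CR`, their default values, and the function name are assumptions for illustration rather than values taken from the text.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over a box using the DE/rand/1/bin scheme (a sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        next_pop, next_cost = [row[:] for row in pop], cost[:]
        for i in range(pop_size):
            # Three distinct members, all different from the target vector i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Mutation: third vector plus weighted difference of two others.
            mutant = [pop[c][d] + F * (pop[a][d] - pop[b][d]) for d in range(dim)]
            # Crossover: mix mutant and target parameters; index j_rand
            # guarantees at least one parameter comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (d == j_rand or rng.random() < CR) else pop[i][d]
                     for d in range(dim)]
            # Trial vectors are not clipped to the bounds (a simplification).
            trial_cost = f(trial)
            # Selection: the trial replaces the target in the next generation
            # only if it yields a lower objective value.
            if trial_cost < cost[i]:
                next_pop[i], next_cost[i] = trial, trial_cost
        pop, cost = next_pop, next_cost
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best]
```

For example, `differential_evolution(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)` minimizes a three-dimensional sphere function.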
We present novel application ILP problem diterpene structure elucidation C NMR spectra Diterpenes organic compounds low molecular weight based skeleton carbon atoms They significant chemical commercial interest use lead compounds search new pharmaceutical effectors The structure elucidation diterpenes based C NMR spectra usually done manually human experts specialized background knowledge peak patterns chemical structures In process skeletal atoms assigned atom number corresponds proper place skeleton diterpene classified one possible skeleton types We address problem learning classification rules database peak patterns diterpenes known structure Recently propositional learning successfully applied learn classification rules spectra assigned atom numbers As assignment atom numbers difficult process possibly indistinguishable classification process apply ILP ie relational learning problem classifying spectra without assigned atom numbers
Models physical systems differ according computational cost accuracy precision among things Depending problem solving task hand different models appropriate Several investigators recently developed methods automatically selecting among multiple models physical systems Our research novel developing model selection techniques specifically suited computeraided design Our approach based idea artifact performance models computeraided design chosen light design decisions required support We developed technique called Gradient Magnitude Model Selection GMMS embodies principle GMMS operates context hillclimbing search process It selects simplest model meets needs hillclimbing algorithm operates We using domain sailing yacht design testbed research We implemented GMMS used hillclimbing search decide computationally expensive potentialflow program algebraic approximation analyze performance sailing yachts Experimental tests show GMMS makes design process faster would expensive model used design evaluations GMMS achieves performance improvement little sacrifice quality resulting design
We present new algorithm eliminating excess parameters improving network generalization supervised training The method Principal Components Pruning PCP based principal component analysis node activations successive layers network It simple cheap implement effective It requires network retraining involve calculating full Hessian cost function Only weight node activity correlation matrices layer nodes required We demonstrate efficacy method regression problem using polynomial basis functions economic time series prediction problem using twolayer feedforward network
Gradientbased numerical optimization complex engineering designs offers promise rapidly producing better designs However methods generally assume objective function constraint functions continuous smooth defined everywhere Unfortunately realistic simulators tend violate assumptions We present rulebased technique intelligently computing gradients presence pathologies simulators show gradient computation method used part gradientbased numerical optimization system We tested resulting system domain conceptual design supersonic transport aircraft found using rulebased gradients decrease cost design space search one orders magnitude
The first step casebased design systems select initial prototype database previous designs The retrieved prototype modified tailor given goals For particular design goal selection starting point design process dramatic effect quality eventual design overall design time We present technique automatically constructing effective prototypeselection rules Our technique applies standard inductivelearning algorithm C set training data describing particular prototype would best choice goal encountered previous design session We tested technique domain racingyachthull design comparing inductively learned selection rules several competing prototypeselection methods Our results show inductive prototypeselection method leads better final designs design process guided noisy evaluation function inductively learned rules often efficient competing methods Many automated design systems begin retrieving initial prototype library previous designs using given design goal index guide retrieval process The retrieved prototype modified set design modification operators tailor selected design given goals In many cases quality competing designs assessed using domainspecific evaluation functions cases designmodification process often This research benefited numerous discussions members Rutgers CAP project We thank Andrew Gelsey helping crossvalidation code John Keane helping RUVPP Andrew Gelsey Tim Weinrich comments previous draft paper This research supported ARPAfunded NASA grant NAG In context casebased design systems choice initial prototype affect quality final design computational cost obtaining design three reasons First prototype selection may impact quality prototypes lie disjoint search spaces In particular systems design modification operators convert prototype prototype choice initial prototype restrict set possible designs obtained search process A poor choice initial prototype may therefore lead suboptimal final design Second prototype selection may impact quality design 
process guided nonlinear evaluation function unknown global properties Since known method guaranteed find global optimum arbitrary nonlinear function design systems rely iterative local search methods whose results sensitive initial starting point Finally choice prototype may impact time needed carry design modification process two different starting points may yield final design take different amounts time get In design problems evaluating even single design take tremendous amounts time selecting appropriate initial prototype determining factor success failure design process This paper describes application inductive learning form rules selecting appropriate prototype designs The paper structured follows In Section describe inductive method learning prototypeselection rules In Section describe domain racingyachthull design tested prototypeselection methods In Sections describe experiments
This paper describes automatic design methods detecting fraudulent behavior Much design accomplished using series machine learning methods In particular combine data mining constructive induction standard machine learning techniques design methods detecting fraudulent usage cellular telephones based profiling customer behavior Specifically use rule learning program uncover indicators fraudulent behavior large database cellular calls These indicators used create profilers serve features system combines evidence multiple profilers generate highconfidence alarms Experiments indicate automatic approach performs nearly well best handtuned methods detecting fraud
In many cases programs lengths increase known bloat fluff increasing structural complexity artificial evolution We show bloat specific genetic programming suggest inherent search techniques discrete variable length representations using simple static evaluation functions We investigate bloating characteristics three nonpopulation one population based search techniques using novel mutation operator An artificial ant following Santa Fe trail problem solved simulated annealing hill climbing strict hill climbing population based search using two variants new subtree based mutation operator As predicted bloat observed using unbiased mutation absent simulated annealing hill climbers using length neutral mutation however bloat occurs mutations using population We conclude two causes bloat
We present method learning higherorder polynomial functions examples using linear regression feature construction Regression used set training instances produce weight vector linear function feature set If hypothesis imperfect new feature constructed forming product two features effectively predict squared error current hypothesis The algorithm repeated In extension method specific pair features combine selected measuring joint ability predict hypothesis error
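A rough sketch of the regression and feature-construction loop, assuming inputs given as tuples of floats. One simplification is labeled here and in the code: instead of choosing features by how well they predict the squared error, this version tries every pairwise product of existing non-constant features and keeps the one that most reduces the training sum of squared errors after refitting. All function names are hypothetical.

```python
from itertools import combinations_with_replacement


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, y, ridge=1e-9):
    """Least-squares weights from lightly regularized normal equations, plus SSE."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    w = solve(xtx, xty)
    sse = sum((sum(wi * xi for wi, xi in zip(w, r)) - t) ** 2
              for r, t in zip(rows, y))
    return w, sse


def construct_features(x, y, max_new=3, tol=1e-8):
    """Fit a linear model; while it is imperfect, add one product feature.

    Simplification (assumed, not the paper's criterion): each candidate product
    of two non-constant features is scored by the training SSE after refitting.
    """
    rows = [[1.0] + list(xi) for xi in x]          # column 0 is the bias term
    names = ["1"] + [f"x{i + 1}" for i in range(len(x[0]))]
    w, sse = fit(rows, y)
    for _ in range(max_new):
        if sse <= tol:
            break
        best = None
        for i, j in combinations_with_replacement(range(1, len(names)), 2):
            cand = [r + [r[i] * r[j]] for r in rows]
            cw, csse = fit(cand, y)
            if best is None or csse < best[0]:
                best = (csse, i, j, cand, cw)
        sse, i, j, rows, w = best
        names.append(f"({names[i]}*{names[j]})")
    return names, w, sse
```

On a 5 × 5 grid of integer points with target 1 + 3·x1·x2, the linear fit is imperfect, the loop constructs the single feature `(x1*x2)`, and its weight comes out at about 3.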
Non Bayesian experimental design linear models reviewed Steinberg Hunter recent book Pukelsheim Ford Kitsos Titterington reviewed non Bayesian design nonlinear models Bayesian design linear nonlinear models reviewed We argue design problem best considered decision problem best solved maximizing expected utility experiment This paper considers marginal way appropriate theory non Bayesian design
We present methodology enables use classification algorithms regression tasks We implement method system RECLA transforms regression problem classification one uses existent classification system solve new problem The transformation consists mapping continuous variable ordinal variable grouping values appropriate set intervals We use misclassification costs means reflect implicit ordering among ordinal values new variable We describe set alternative discretization methods based experimental results justify need searchbased approach choose best method Our experimental results confirm validity searchbased approach class discretization reveal accuracy benefits adding misclassification costs
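A minimal sketch of the regression-to-classification transformation, assuming equal-frequency intervals (only one of several possible discretization methods), the interval median as each class's numeric representative, and a misclassification cost equal to the absolute difference between representatives, so that costs reflect the ordering of the classes. These choices and the function names are illustrative assumptions, not RECLA's actual interface.

```python
from bisect import bisect_right
from statistics import median


def equal_frequency_cuts(values, k):
    """Cut points splitting the sorted target values into k equal-count intervals."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]


def discretize(values, cuts):
    """Map each continuous value to an ordinal class and compute class medians.

    Assumes every interval is non-empty (median() raises on an empty one).
    """
    classes = [bisect_right(cuts, v) for v in values]
    reps = []
    for c in range(len(cuts) + 1):
        members = [v for v, cls in zip(values, classes) if cls == c]
        reps.append(median(members))
    return classes, reps


def cost_matrix(reps):
    """Cost of predicting class j when the true class is i: |rep_i - rep_j|."""
    return [[abs(a - b) for b in reps] for a in reps]
```

For the values 1 to 9 split into three intervals, the cut points are 4 and 7, the class representatives are 2, 5 and 8, and confusing the two outer classes costs twice as much as confusing neighbours.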
The Bayesian multivariate adaptive regression spline BMARS methodology Denison et al extended cope nonlinear time series financial datasets The nonlinear time series model closely related adaptive spline threshold autoregressive ASTAR method Lewis Stevens financial models thought Bayesian versions generalised simple autoregressive conditional heteroscedastic GARCH ARCH models
Some problems solved multiagent teams In using genetic programming produce teams one faces several design decisions First questions team diversity breeding strategy In one commonly used scheme teams consist clones single individuals individuals breed normal way cloned form teams fitness evaluation In contrast teams could also consist distinct individuals In case one either allow free interbreeding members different teams one restrict interbreeding various ways A second design decision concerns types coordinationfacilitating mechanisms provided individual team members range sensors various sorts complex communication systems This paper examines three breeding strategies clones free restricted three coordination mechanisms none deictic sensing namebased sensing evolving teams agents Serengeti world simple predatorprey environment Among conclusions fact simple form restricted interbreeding outperforms free interbreeding teams distinct individuals fact namebased sensing consistently outperforms deictic sensing
This paper presents Or nm rnm algorithm determining whether set n species perfect phylogeny number characters used describe species r maximum number states character The perfect phylogeny algorithm leads Oek k e k algorithm triangulating kcolored graph e edges
The RISC revolution spurred development processors increasing degrees instruction level parallelism ILP In order realize full potential processors multiple instructions must continuously be issued executed single cycle Consequently instruction scheduling plays crucial role optimization context While early attempts instruction scheduling limited compiletime approaches current trends aimed providing dynamic support hardware In paper present results detailed comparative study performance advantages derived spectrum instruction scheduling approaches limited basicblock schedulers compiler novel aggressive schedulers hardware A significant portion experimental study via simulations devoted understanding performance advantages runtime scheduling Our results indicate effective extracting ILP inherent program trace scheduled wide range machine program parameters Furthermore also show effectiveness enhanced simple basicblock scheduler compiler optimizes presence runtime scheduler target current basicblock schedulers designed take advantage feature We demonstrate fact presenting novel basicblock scheduling algorithm sensitive lookahead hardware target processor In Proceedings Third International Conference High Performance Computing Dec
Posner Raichles Images Mind excellent educational book well written Some flaws scientific publication (a) accuracy linear subtraction method used PET subject scrutiny research finer spatialtemporal resolutions (b) lack accuracy experimental paradigm used EEG complementary studies Images Posner Raichle excellent introduction interdisciplinary research cognitive imaging science Well written illustrated presents concepts manner well suited laymanundergraduate technical nonexpertgraduate student postdoctoral researcher Many people involved interdisciplinary neuroscience research agree P Rs statements page importance recognizing emergent properties brain function assemblies neurons It clear sparse references book intended standalone review broad field There flaws scientific development must expected pioneering venture P R have proposed many cognitive mechanisms deserving study imaging tools yet developed yield better spatialtemporal resolutions
This paper establishes formulas used bound actual treatment effect experimental study treatment assignment random subject compliance imperfect These formulas provide tightest bounds average treatment effect inferred given distribution assignments treatments responses Our results reveal even high rates noncompliance experimental data yield significant sometimes accurate information effect treatment population
Most researchers machine learning built learning systems assumption external entity would work furnishing learning experiences Recently however investigators several subfields machine learning designed systems play active role choosing situations learn Such activity generally called exploration This paper describes exploratory learning projects reported literature attempts extract general account issues involved exploration
We study learnability ReadkSatisfyj RkSj DNF formulas These boolean formulas disjunctive normal form DNF maximum number occurrences variable bounded k number terms satisfied assignment j After motivating investigation class DNF formulas present algorithm unknown RkSj DNF formula learned high probability finds logically equivalent DNF formula using wellstudied protocol equivalence membership queries The algorithm runs polynomial time kj = O(log n / log log n) n number input variables
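To make the class definition concrete, the hypothetical helper below computes the smallest k and j for which a given DNF formula is Read-k Satisfy-j. It encodes literals as signed integers (an assumed convention) and finds j by brute force over all 2^n assignments, so it suits only small n; it illustrates the definition and does not implement the learning algorithm.

```python
from itertools import product


def read_and_satisfy(terms, n):
    """Return the smallest (k, j) such that the DNF is Read-k Satisfy-j.

    Variables are 1..n; each term is a set of literals, +v for x_v and -v for
    the negation of x_v (a hypothetical encoding). k counts occurrences of each
    variable over all terms; j is found by brute force over 2**n assignments.
    """
    counts = [0] * (n + 1)
    for term in terms:
        for lit in term:
            counts[abs(lit)] += 1
    k = max(counts)
    j = 0
    for bits in product((False, True), repeat=n):
        satisfied = sum(
            all(bits[abs(lit) - 1] == (lit > 0) for lit in term) for term in terms
        )
        j = max(j, satisfied)
    return k, j
```

For example, the formula (x1 ∧ x2) ∨ (¬x1 ∧ x3) reads x1 twice, but its two terms can never be satisfied together, so it is Read-2 Satisfy-1.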
Most Bayesian theory optimal experimental design normal linear model developed restrictive assumption variance known In special cases insensitivity specific design criteria specific prior assumptions variance demonstrated general result show way Bayesian optimal designs affected prior information variance lacking This paper stresses important distinction expected utility functions optimality criteria examines number expected utility functions possess interesting properties deserve wider use derives relevant Bayesian optimality criteria normal assumptions This unifying setup useful proving main result paper clarifies issue designing normal linear model unknown variance
Recently software pipelining methods based ILP Integer Linear Programming framework successfully applied derive rateoptimal schedules architectures involving clean pipelines pipelines without structural hazards The problem architectures beyond clean pipelines remains open One challenge unified ILP framework simultaneously represent resource constraints unclean pipelines assignment mapping operations loop pipelines In paper provide framework exactly addition constructs rateoptimal software pipelined schedules
Planning learning multiple levels temporal abstraction key problem artificial intelligence In paper summarize approach problem based mathematical framework Markov decision processes reinforcement learning Current modelbased reinforcement learning based onestep models represent commonsense higherlevel actions going lunch grasping object flying Denver This paper generalizes prior work temporally abstract models Sutton extends prediction setting include actions control planning We introduce general form temporally abstract model multitime model establish suitability planning learning virtue relationship Bellman equations This paper summarizes theoretical framework multitime models illustrates potential advantages The need hierarchical abstract planning fundamental problem AI see eg Sacerdoti Laird et al Korf Kaelbling Dayan Hinton Modelbased reinforcement learning offers possible solution problem integrating planning realtime learning decisionmaking Peng Williams Moore Atkeson Sutton Barto However current modelbased reinforcement learning based onestep models represent commonsense higherlevel actions Modeling actions requires ability handle different interrelated levels temporal abstraction A new approach modeling multiple time scales introduced Sutton based prior work Singh Dayan Sutton Pinette This approach enables models environment different temporal scales intermixed producing temporally abstract models However work concerned predicting environment This paper summarizes extension approach including actions control environment Precup Sutton In particular generalize usual notion gridworld planning task
This paper proposes generalisation capabilities casebased reasoning system evaluated comparison rotelearning algorithm uses simple generalisation strategy Two algorithms defined expressions classification accuracy derived function size training sample A series experiments using artificial natural data sets described learning curve casebased learner compared apparently trivial rotelearning learning algorithms The results show number plausible situations learning curves simple casebased learner majority rotelearner barely distinguished although domain demonstrated favourable performance casebased learner observed This suggests maxim casebased reasoning similar problems similar solutions may useful basis generalisation strategy selected domains
Research robotics programming divided two camps The direct hand programming approach uses explicit model behavioral model subsumption architecture The machine learning community uses neural network andor genetic algorithm We claim hand programming learning complementary The two approaches used together orders magnitude powerful approach taken separately We propose method combine It includes three concepts syntactic constraints restrict search space handmade problem decomposition hand given fitness We use method solve complex problem eightlegged locomotion It needs fewer evaluations compared genetic algorithm used alone
We apply recent results Markov chain theory Hastings Metropolis algorithms either independent symmetric candidate distributions provide necessary sufficient conditions algorithms converge geometric rate prescribed distribution In independence case ℝ^k indicate geometric convergence essentially occurs candidate density bounded multiple symmetric case ℝ show geometric convergence essentially occurs geometric tails We also evaluate recently developed computable bounds rates convergence context examples show theoretical bounds inherently extremely conservative although chain stochastically monotone bounds may well effective
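The two sampler families discussed above can be sketched as follows, assuming a one-dimensional target given by its log-density up to an additive constant. The function names and the example densities are assumptions; the sketch shows only the acceptance rules and says nothing about the convergence-rate bounds themselves.

```python
import math
import random


def rw_metropolis(log_target, x0, step, n, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian candidate."""
    rng = random.Random(seed)
    x, lx = x0, log_target(x0)
    chain = []
    for _ in range(n):
        y = x + rng.gauss(0.0, step)
        ly = log_target(y)
        # Symmetric candidate: accept with probability min(1, pi(y) / pi(x)).
        if rng.random() < math.exp(min(0.0, ly - lx)):
            x, lx = y, ly
        chain.append(x)
    return chain


def independence_metropolis(log_target, log_cand, sample_cand, x0, n, seed=0):
    """Independence sampler: every candidate is drawn from a fixed density q."""
    rng = random.Random(seed)
    x = x0
    wx = log_target(x) - log_cand(x)   # log of pi(x) / q(x)
    chain = []
    for _ in range(n):
        y = sample_cand(rng)
        wy = log_target(y) - log_cand(y)
        # Accept with probability min(1, [pi(y) q(x)] / [pi(x) q(y)]).
        if rng.random() < math.exp(min(0.0, wy - wx)):
            x, wx = y, wy
        chain.append(x)
    return chain
```

As an example of the independence case, a N(0, 2²) candidate for a N(0, 1) target has a density bounded below by a multiple (here 1/2) of the target density, which is the regime the abstract associates with geometric convergence: pass `lambda x: -0.5 * x * x` as the target, `lambda x: -x * x / 8.0` as the candidate log-density, and `lambda r: r.gauss(0.0, 2.0)` as the candidate sampler.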
We consider game sequentially assigning probabilities future data based past observations logarithmic loss We making probabilistic assumptions generation data consider situation player tries minimize loss relative loss hindsight best distribution target class worst sequence data We give bounds minimax regret terms metric entropies target class respect suitable distances distributions
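For the simplest target class, Bernoulli distributions on binary data, the game and the regret can be computed directly. The sketch below assumes the Laplace rule of succession as the player's strategy (a standard choice, not taken from the paper) and measures loss in nats. Its regret against the best Bernoulli parameter in hindsight never exceeds log(n + 1), which the exhaustive check over all length-6 sequences confirms for that case.

```python
import math


def laplace_regret(bits):
    """Log-loss regret (nats) of the Laplace predictor vs. best Bernoulli in hindsight."""
    if not bits:
        return 0.0
    ones = 0
    loss = 0.0
    for t, b in enumerate(bits):
        p_one = (ones + 1) / (t + 2)        # Laplace rule of succession
        loss -= math.log(p_one if b else 1.0 - p_one)
        ones += b
    n, k = len(bits), ones
    # Loss of the maximum-likelihood Bernoulli parameter k/n, chosen in hindsight.
    best = 0.0
    for count, p in ((k, k / n), (n - k, (n - k) / n)):
        if count:
            best -= count * math.log(p)
    return loss - best
```

For the sequence `[1, 1]` the Laplace predictor assigns probability 1/2 · 2/3 = 1/3 while the best parameter (p = 1) assigns probability 1, so the regret is log 3.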
In introduced formal framework constructing ordinal similarity measures suggested might also applied cardinal measures In paper place approach general framework called similarity metrics In framework ordinal similarity metrics comparison returns boolean value combined cardinal metrics returning numeric value indeed metrics returning values types produce new metrics
In paper concerned problem inducing recursive Horn clauses small sets training examples The method iterative bootstrap induction presented In first step system generates simple clauses regarded properties required definition Properties represent generalizations positive examples simulating effect larger number examples Properties used subsequently induce required recursive definitions This paper describes method together series experiments The results support thesis iterative bootstrap induction indeed effective technique could general use ILP
Considerable effort directed recently develop asymptotically minimax methods problems recovering infinitedimensional objects curves densities spectral densities images noisy data A rich complex body work evolved nearly exactly minimax estimators obtained variety interesting problems Unfortunately results often translated practice variety reasons sometimes similarity known methods sometimes computational intractability sometimes lack spatial adaptivity We discuss method curve estimation based n noisy data one translates empirical wavelet coefficients towards origin amount method different methods common use today computationally practical spatially adaptive thus avoids number previous objections minimax estimators At time method nearly minimax wide variety loss functions eg pointwise error global error measured Lp norms pointwise global error estimation derivatives wide range smoothness classes including standard Hölder classes Sobolev classes Bounded Variation This much broader nearoptimality anything previously proposed minimax literature Finally theory underlying method interesting exploits correspondence statistical questions questions optimal recovery informationbased complexity Acknowledgements These results described Oberwolfach meeting Mathematische Stochastik December AMS Annual meeting January This work supported NSF DMS The authors would like thank Paul-Louis Hennequin organized École d'Été de Probabilités de Saint-Flour collaboration began Université de Paris VII Jussieu Université de Paris-Sud Orsay supporting visits DLD IMJ The authors would like thank Ildar Ibragimov Arkady Nemirovskii personal correspondence cited p
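A minimal sketch of the shrinkage idea, assuming an orthonormal Haar transform, a data length that is a power of two, a known noise level σ, and the threshold σ√(2 log n) on the scale of the raw data. Each empirical detail coefficient is moved toward the origin by that amount (soft thresholding), and the coarsest approximation coefficient is left untouched. The choice of wavelet and the function names are illustrative assumptions.

```python
import math


def haar_forward(x):
    """Orthonormal Haar transform; len(x) must be a power of two."""
    details = []
    a = list(x)
    while len(a) > 1:
        half = len(a) // 2
        s = [(a[2 * i] + a[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        d = [(a[2 * i] - a[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details.append(d)          # finest level first
        a = s
    return a, details


def haar_inverse(a, details):
    """Invert haar_forward, rebuilding from the coarsest level outward."""
    a = list(a)
    for d in reversed(details):
        nxt = []
        for s_i, d_i in zip(a, d):
            nxt.append((s_i + d_i) / math.sqrt(2))
            nxt.append((s_i - d_i) / math.sqrt(2))
        a = nxt
    return a


def soft(c, t):
    """Translate c toward the origin by t, setting it to 0 if |c| <= t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_shrink(y, sigma):
    """Denoise y by soft-thresholding all Haar detail coefficients."""
    a, details = haar_forward(y)
    t = sigma * math.sqrt(2 * math.log(len(y)))
    shrunk = [[soft(c, t) for c in d] for d in details]
    return haar_inverse(a, shrunk)
```

Because the transform is orthonormal, white noise of level σ stays white with the same level in the coefficient domain, so a single threshold applies at every scale; a step function aligned with the dyadic grid is captured by a few large coefficients that survive the shrinkage.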
Figure (a) Figure (b) show prior distribution f_CR follows flat prior skewed prior respectively Figure (c) Figure (d) show posterior distribution p(f_CR|D) obtained system run Lipid data using flat prior skewed prior respectively From bounds Balke Pearl follows largesample assumption f_CR|D [Figure: prior (a, b) posterior (c, d) distributions subpopulation f_CR|D specified counterfactual query Would Joe improved taken drug given improve without; (a) flat prior (b) skewed prior] This paper identifies demonstrates new application area networkbased inference techniques management causal analysis clinical experimentation These techniques originally developed medical diagnosis shown capable circumventing one major problems clinical experiments assessment treatment efficacy face imperfect compliance While standard diagnosis involves purely probabilistic inference fully specified networks causal analysis involves partially specified networks links given causal interpretation domain variables unknown The system presented paper provides clinical research community believe first time assumptionfree unbiased assessment average treatment effect We offer system practical tool used whenever full compliance enforced broadly whenever data available insufficient answering queries interest clinical investigator Lipid Research Clinic Program The lipid research clinics coronary primary prevention trial results parts ii Journal American Medical Association January
When Can We Give Causal Interpretation Abstract The assumptions underlying statistical estimation fundamentally different character causal assumptions underlie structural equation models SEM The differences blurred years lack mathematical notation capable distinguishing causal equational relationships Recent advances graphical methods provide formal explication differences destined profound impact SEMs practice philosophy
Incremental Class Learning ICL provides feasible framework development scalable learning systems Instead learning complex problem ICL focuses learning subproblems incrementally one time using results prior learning subsequent learning combining solutions appropriate manner With respect multiclass classification problems ICL approach presented paper summarized follows Initially system focuses one category After learns category tries identify compact subset features nodes hidden layers crucial recognition category The system freezes crucial nodes features fixing incoming weights As result features obliterated subsequent learning These frozen features available subsequent learning serve parts weight structures build recognize categories As categories learned set features gradually stabilizes learning new category requires less effort Eventually learning new category may involve combining existing features appropriate manner The approach promotes sharing learned features among number categories also alleviates wellknown catastrophic interference problem We present results applying ICL approach Handwritten Digit Recognition problem based spatiotemporal representation patterns
Pathoriented scheduling methods trace scheduling hyperblock scheduling use speculation extract instructionlevel parallelism controlintensive programs These methods predict important execution paths current scheduling scope using execution profiling frequency estimation Aggressive speculation applied important execution paths possibly cost degraded performance along paths Therefore speed output code sensitive compilers ability accurately predict important execution paths Prior work area utilized speculative yield function Fisher coupled dependence height distribute instruction priority among execution paths scheduling scope While technique provides stability performance paying attention needs paths directly address problem mismatch compiletime prediction runtime behavior The work presented paper extends speculative yield dependence height heuristic explicitly minimize penalty suffered paths instructions speculated along path Since execution time path determined number cycles spent paths entrance exit scheduling scope heuristic attempts eliminate unnecessary speculation delays paths exit Such control speculation makes performance much less sensitive actual path taken run time The proposed method strong emphasis achieving minimal delay exits Thus name speculative hedge used This paper presents speculative hedge heuristic shows controls overspeculation superblockhyperblock scheduler The stability Copyright IEEE Published Proceedings th Annual International Symposium Microarchitecture December Paris France
A number exact algorithms developed perform probabilistic inference Bayesian belief networks recent years The techniques used algorithms closely related network structures easy understand implement In paper consider problem combinatorial optimization point view state efficient probabilistic inference belief network problem finding optimal factoring given set probability distributions From viewpoint previously developed algorithms seen alternate factoring strategies In paper define combinatorial optimization problem optimal factoring problem discuss application problem belief networks We show optimal factoring provides insight key elements efficient probabilistic inference demonstrate simple easily implemented algorithms excellent performance
Backpropagation learning Rumelhart Hinton Williams useful research tool number undesirable features experimenter decide outside learned We describe number simulations neural networks internally generate teaching input The networks generate teaching input transforming network input connection weights evolved using form genetic algorithm What results innate evolved capacity behave efficiently environment learn behave efficiently The analysis networks evolve learn shows interesting results
To appear Twelfth National Conference Artificial Intelligence AAAI Seattle WA July August Technical Report RA April Abstract Evaluation counterfactual queries eg If A true would C true important fault diagnosis planning determination liability We present formalism uses probabilistic causal networks evaluate ones belief counterfactual consequent C would true antecedent A true The antecedent query interpreted external action forces proposition A true consistent Lewis Miraculous Analysis This formalism offers concrete embodiment closest world approach properly reflects common understanding causal influences deals uncertainties inherent world amenable machine representation
Evaluation counterfactual queries eg If A true would C true important fault diagnosis planning determination liability policy analysis We present method evaluating counterfactuals underlying causal model represented structural models nonlinear generalization simultaneous equations models commonly used econometrics social sciences This new method provides coherent means evaluating policies involving control variables prior enacting policy influenced variables system
Theory refinement task updating domain theory light new cases done automatically expert assistance The problem theory refinement uncertainty reviewed context Bayesian statistics theory belief revision The problem reduced incremental learning task follows learning system initially primed partial theory supplied domain expert thereafter maintains internal representation alternative theories able interrogated domain expert able incrementally refined data Algorithms refinement Bayesian networks presented illustrate meant partial theory alternative theory representation etc The algorithms incremental variant batch learning algorithms literature work well batch incremental mode
The emergence generalist specialist behavior populations neural networks studied Energy extracting ability included property organism In artificial life simulations organisms living environment fitness score interpreted combination organisms behavior ability organism extract energy potential food sources distributed environment The energy extracting ability viewed evolvable trait organisms particular organisms mechanisms extracting energy environment therefore fixed decided researcher Simulations fixed evolvable energy extracting abilities show energy extracting mechanism sensory apparatus behavior organisms may coevolve coadapted The results suggest populations organisms evolve generalists specialists due individual energy extracting abilities
In paper consider problem theory patching given domain theory whose components indicated possibly flawed set labeled training examples domain concept The theory patching problem revise indicated components theory resulting theory correctly classifies training examples Theory patching thus type theory revision revisions made individual components theory Our concern paper determine classes logical domain theories theory patching problem tractable We consider propositional firstorder domain theories show theory patching problem equivalent determining information contained theory stable regardless revisions might performed theory We show determining stability tractable input theory satisfies two conditions revisions theory component monotonic effects classification examples theory components act independently classification examples theory We also show concepts introduced used determine soundness completeness particular theory patching algorithms
This paper studies balance evolutionary design human expertise order best design situated autonomous agents learn specific tasks A genetic algorithm designs control circuits learn simple behaviors given control strategies simple behaviors genetic algorithm designs combinational circuit switches simple behaviors perform navigation task Keywords Genetic Algorithms Computational Design Autonomous Agents Robotics
In paper propose threestage incremental approach development autonomous agents We discuss issues characteristics differentiate reinforcement programs RPs define trainer particular kind RP We present set results obtained running experiments trainer provides guidance AutonoMouse mousesized autonomous robot
In paper carefully formulate Schema Theorem Genetic Programming GP using schema definition accounts variable length nonhomologous nature GPs representation In manner similar early GA research use interpretations GP Schema Theorem obtain GP Building Block definition state classical Building Block Hypothesis BBH GP searches hierarchically combining building blocks We report approach convincing several reasons difficult find support promotion combination building blocks solely rigorous interpretation GP Schema Theorem even support BBH empirically questionable whether building blocks always exist partial solutions consistently average fitness resilience disruption assured also BBH constitutes narrow imprecise account GP search behavior
Although feedforward neural networks well suited function approximation applications networks experience problems learning desired function One problem interference occurs learning one area input space causes unlearning another area Networks less susceptible interference referred spatially local networks To understand properties theoretical framework consisting measure interference measure network localization developed incorporates network weights architecture also learning algorithm Using framework analyze sigmoidal multilayer perceptron MLP networks employ backprop learning algorithm address familiar misconception sigmoidal networks inherently nonlocal demonstrating given sufficiently large number adjustable parameters sigmoidal MLPs made arbitrarily local retaining ability represent continuous function compact domain
University WisconsinMadison Department Computer Sciences Technical Report CSTR Abstract The Iterated Prisoners Dilemma Choice Refusal IPDCR extension Iterated Prisoners Dilemma evolution allows players choose refuse game partners From individual behaviors behavioral population structures emerge In report examine one particular IPDCR environment document social network methods used identify population behaviors found within complex adaptive system In contrast standard homogeneous population nice cooperators also found metastable populations mixed strategies within environment In particular social networks interesting populations evolution examined
A paradigm statistical mechanics financial markets SMFM using nonlinear nonequilibrium algorithms first published L Ingber Mathematical Modelling fit multivariate financial markets using Adaptive Simulated Annealing ASA global optimization algorithm perform maximum likelihood fits Lagrangians defined path integrals multivariate conditional probabilities Canonical momenta thereby derived used technical indicators recursive ASA optimization process tune trading rules These trading rules used outofsample data demonstrate profit SMFM model illustrate markets likely efficient
The close connection reinforcement learning RL algorithms dynamic programming algorithms fueled research RL within machine learning community Yet despite increased theoretical understanding RL algorithms remain applicable simple tasks In paper I use abstract framework afforded connection dynamic programming discuss scaling issues faced RL researchers I focus learning agents learn solve multiple structured RL tasks environment I propose learning abstract environment models abstract actions represent intentions achieving particular state Such models variable temporal resolution models different parts state space abstract actions span different number time steps The operational definitions abstract actions learned incrementally using repeated experience solving RL tasks I prove certain conditions solutions new RL tasks found using simulated experience abstract actions alone
We describe supervised learning algorithm EODG uses mutual information build oblivious decision tree The tree converted Oblivious readOnce Decision Graph OODG merging nodes level tree For domains appropriate decision trees OODGs performance approximately C number nodes OODG much smaller The merging phase converts oblivious decision tree OODG provides new way dealing replication problem new pruning mechanism works top starting root The pruning mechanism well suited finding symmetries aids recovering splits irrelevant features may happen tree construction
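A minimal sketch of the split criterion, assuming discrete features and labels: it computes the empirical mutual information between a candidate test and the class, and picks the next level's feature greedily. Because every node on one level of an oblivious tree tests the same feature, each candidate is scored jointly with the features already chosen for earlier levels. The helper names and this exact greedy scoring are illustrative assumptions; the abstract does not give EODG's precise criterion or stopping rule.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def best_level_feature(rows, labels, chosen, candidates):
    """Hypothetical greedy step: choose the feature for the next tree level.

    Each candidate f is scored by I(Y; X_chosen..., X_f), i.e. the label's mutual
    information with the tuple of all features tested on the levels so far plus f.
    Ties go to the earliest candidate."""
    def score(f):
        keys = [tuple(row[i] for i in chosen) + (row[f],) for row in rows]
        return mutual_information(keys, labels)

    return max(candidates, key=score)
```

For example, with rows `[(0, 0), (0, 1), (1, 0), (1, 1)]` and labels `[0, 0, 1, 1]`, feature 0 determines the label (1 bit) while feature 1 carries no information, so feature 0 is chosen for the first level.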
An approach explicitly formulated blend local global theory investigate oscillatory neocortical firings determine source information processing nature alpha rhythm The basis optimism founded statistical mechanical theory neocortical interactions success numerically detailing properties shorttermmemory STM capacity mesoscopic scales columnar interactions consistent theory deriving similar dispersion relations macroscopic scales electroencephalographic EEG magnetoencephalographic MEG activity Manuscript received March This project supported entirely personal contributions Physical Studies Institute University California San Diego Physical Studies Institute agency account Institute Pure Applied Physical Sciences
We present new results positive negative wellstudied problem learning disjunctive normal form DNF expressions We first prove algorithm due Kushilevitz Mansour used weakly learn DNF using membership queries polynomial time respect uniform distribution inputs This first positive result learning unrestricted DNF expressions polynomial time nontrivial formal model learning It provides sharp contrast results Kharitonov proved AC efficiently learnable model given certain plausible cryptographic assumptions We also present efficient learning algorithms various models readk SATk subclasses DNF For negative results turn attention recently introduced statistical query model learning This model restricted version popular Probably Approximately Correct PAC model practically every class known efficiently learnable PAC model fact learnable statistical query model Here give general characterization complexity statistical query learning terms number uncorrelated functions concept class This distributiondependent quantity yielding upper lower bounds number statistical queries required learning input distribution As corollary obtain DNF expressions decision trees even weakly learnable respect uniform input distribution polynomial time statistical query model This result informationtheoretic therefore rely unproven assumptions It demonstrates simple modification existing algorithms computational learning theory literature learning various restricted forms DNF decision trees passive random examples also several algorithms proposed experimental machine learning communities ID algorithm decision trees variants solve general problem The unifying tool results Fourier analysis finite class boolean functions hypercube This research sponsored part Wright Laboratory Aeronautical Systems Center Air Force Materiel Command USAF Advanced Research Projects Agency ARPA grant number F Support also sponsored National Science Foundation Grant No CC Blum also supported part NSF National Young Investigator grant CCR Views conclusions contained document authors interpreted necessarily representing official policies endorsements either expressed implied Wright Laboratory United States Government NSF
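The Fourier coefficient this analysis relies on is $\hat f(S) = \mathbb{E}_x[f(x)\chi_S(x)]$, with parity character $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$ and the expectation over the uniform distribution on $\{0,1\}^n$. The sketch below computes it by brute-force enumeration, which is only feasible for small $n$; it illustrates the quantity itself, not the Kushilevitz-Mansour procedure, which locates the large coefficients with membership queries instead of enumerating all $2^n$ subsets. The function names are hypothetical.

```python
from itertools import product


def chi(subset, x):
    """Parity character chi_S(x) = (-1)^(sum of x_i for i in S), x in {0,1}^n."""
    return -1 if sum(x[i] for i in subset) % 2 else 1


def fourier_coefficient(f, n, subset):
    """Exact hat{f}(S) = E_x[f(x) chi_S(x)] under the uniform distribution.

    f maps a tuple of n bits to +1 or -1. All 2^n inputs are enumerated."""
    total = 0
    for x in product((0, 1), repeat=n):
        total += f(x) * chi(subset, x)
    return total / 2 ** n


def heavy_coefficients(f, n, threshold):
    """All subsets S with |hat{f}(S)| >= threshold (brute force over 2^n subsets)."""
    heavy = {}
    for mask in range(2 ** n):
        subset = tuple(i for i in range(n) if (mask >> i) & 1)
        c = fourier_coefficient(f, n, subset)
        if abs(c) >= threshold:
            heavy[subset] = c
    return heavy
```

For instance, the two-bit AND written in $\pm 1$ form, `lambda x: -1 if x[0] and x[1] else 1`, has coefficients $0.5$ on $\emptyset$, $\{0\}$ and $\{1\}$ and $-0.5$ on $\{0,1\}$; their squares sum to 1, as Parseval's identity requires.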
Planning Abstract Planning learning multiple levels temporal abstraction key problem artificial intelligence In paper summarize approach problem based mathematical framework Markov decision processes reinforcement learning Current modelbased reinforcement learning based onestep models represent commonsense higherlevel actions going lunch grasping object flying Denver This paper generalizes prior work temporally abstract models Sutton b extends prediction setting include actions control planning We introduce general form temporally abstract model multitime model establish suitability planning learning virtue relationship Bellman equations This paper summarizes theoretical framework multitime models illustrates potential advantages gridworld planning task The need hierarchical abstract planning fundamental problem AI see eg Sacerdoti Laird et al Korf Kaelbling Dayan Hinton Modelbased reinforcement learning offers possible solution problem integrating planning realtime learning decisionmaking Peng Williams Moore Atkeson Sutton Barto press However current modelbased reinforcement learning based onestep models represent commonsense higherlevel actions Modeling actions requires ability handle different interrelated levels temporal abstraction A new approach modeling multiple time scales introduced Sutton b based prior work Singh Dayan b Sutton Pinette This approach enables models environment different temporal scales intermixed producing temporally abstract models However work concerned predicting environment This paper summarizes
In complex changing environments explanation must dynamic goaldriven process This paper discusses evolving system implementing novel model explanation generation GoalDriven Interactive Explanation models explanation goaldriven multistrategy situated process interweaving reasoning action We describe preliminary implementation model gobie system generates explanations internal use support plan generation execution
It shown recently Clarke Ledyaev Sontag Subbotin asymptotically controllable system stabilized means certain type discontinuous feedback The feedback laws constructed work robust respect actuator errors well perturbations system dynamics A drawback however may highly sensitive errors measurement state vector This paper addresses shortcoming shows design dynamic hybrid stabilizing controller preserving robustness external perturbations actuator error also robust respect measurement error This new design relies upon controller incorporates internal model system driven previously constructed feedback
Report SYCON ABSTRACT This note presents explicit proof theorem due Artstein states existence smooth controlLyapunov function implies smooth stabilizability More result extended realanalytic rational cases well The proof uses universal formula given algebraic function Lie derivatives formula originates solution simple Riccati equation
Modulo scheduling efficient technique exploiting instruction level parallelism variety loops resulting high performance code increased register requirements We present set low computational complexity stagescheduling heuristics reduce register requirements given modulo schedule shifting operations multiples II cycles Measurements benchmark suite loops Perfect Club SPEC Livermore Fortran Kernels show best heuristic achieves average decrease register requirements obtained optimal stage scheduler
Modulo scheduling efficient technique exploiting instruction level parallelism variety loops resulting high performance code increased register requirements We present approach schedules loop operations minimum register requirements given modulo reservation table Our method determines optimal register requirements machines finite resources general dependence graphs Measurements benchmark suite loops Perfect Club SPEC Livermore Fortran Kernels show register requirements decrease average applying optimal stage scheduler MRTschedules registerinsensitive modulo scheduler
The design implementation software Ring Array Processor RAP high performance parallel computer involved development three hardware platforms Sun SPARC workstations Heurikon MC boards running VxWorks realtime operating system Texas Instruments TMSC DSPs The RAP runs Sun workstations UNIX VME based system using VxWorks A flexible set tools provided RAP user programmer Primary emphasis placed improving efficiency layered artificial neural network algorithms This done providing library assembly language routines use nodecustom compilation An objectoriented RAP interface C provided allows programmers incorporate RAP computational server UNIX applications For wishing program C command interpreter built provides interactive shellscript style RAP manipulation
Selecting set features optimal given task problem plays important role wide variety contexts including pattern recognition adaptive control machine learning Our experience traditional feature selection algorithms domain machine learning lead appreciation computational efficiency concern brittleness This paper describes alternate approach feature selection uses genetic algorithms primary search component Results presented suggest genetic algorithms used increase robustness feature selection algorithms without significant decrease computational efficiency
In paper address following software pipelining problem given loop machine architecture fixed number processor resources eg function units one construct softwarepipelined schedule runs given architecture maximum possible iteration rate la rateoptimal minimizing number registers The main contributions paper First demonstrate problem described simple mathematical formulation precise optimization objectives periodic linear scheduling framework The mathematical formulation provides clear picture permits one visualize overall solution space rateoptimal schedules different sets constraints Secondly show precise mathematical formulation solution make significant performance difference We evaluated performance method three leading contemporary heuristic methods Huff Slack Scheduling Wang Eisenbeis Jourdan Sus FRLC Gasperoni Schwiegelshohns modified list scheduling Experimental results show method described paper performed significantly better methods
This paper concerns issue best form learning representing using knowledge decision making The proposed answer knowledge learned represented declarative form When needed decision making efficiently transferred procedural form tailored specific decision making situation Such approach combines advantages declarative representation facilitates learning incremental knowledge modification procedural representation facilitates use knowledge decision making This approach also allows one determine decision structures may avoid attributes unavailable difficult measure given situation Experimental investigations system FRD demonstrated decision structures obtained via declarative route often higher predictive accuracy also simpler learned directly facts
Several evolutionary algorithms make use hierarchical representations variable size rather linear strings fixed length Variable complexity structures provides additional representational power may widen application domain evolutionary algorithms The price however search space openended solutions may grow arbitrarily large size In paper study effects structural complexity solutions generalization performance analyzing fitness landscape sigmapi neural networks The analysis suggests smaller networks achieve average better generalization accuracy larger ones thus confirming usefulness Occams razor A simple method implementing Occams razor principle described shown effective improving generalization accuracy without limiting learning capacity
We present interactive algorithms learning regular grammars positive examples membership queries A structurally complete set strings language LG corresponding unknown regular grammar G implicitly specifies lattice version space represents space candidate grammars containing unknown grammar G This lattice searched efficiently using membership queries identify unknown grammar G using implicit representation version space form two sets S G correspond respectively set specific general grammars consistent set positive examples provided queries answered teacher given time We present provably correct incremental version algorithm structurally complete set positive samples necessarily available learner beginning learning The learner constructs lattice grammars based strings provided start performs candidate elimination posing safe membership queries When additional examples become available learner incrementally updates lattice continues candidate elimination Eventually set positive samples provided teacher encompasses structurally complete set unknown grammar algorithm terminates identifying unknown grammar G
We argue based upon numbers representations given length increase representation length inherent using fixed evaluation function discrete variable length representation Two examples analysed including use Prices Theorem Both examples confirm tendency solutions grow size caused fitness based selection
Environments vary time present fundamental problem adaptive systems Although worst case hope effective adaptation forms environmental variability provide adaptive opportunities We consider broad class nonstationary environments combine variable result function invariant utility function demonstrate via simulation adaptive strategy employing evolution learning tolerate much higher rate environmental variation evolutiononly strategy We suggest many cases stability previously assumed constant utility nonstationary environment may fact powerful viewpoint
We recently introduced neural network reactive obstacle avoidance based model classical operant conditioning In article describe success model implemented two real autonomous robots Our results show promise selforganizing neural networks domain intelligent robotics
The paper reports application genetic algorithms probabilistic search algorithms based model organic evolution NPcomplete combinatorial optimization problems In particular subset sum maximum cut minimum tardy task problems considered Except fitness function problemspecific changes genetic algorithm required order achieve results high quality even problem instances size used paper For constrained problems subset sum minimum tardy task constraints taken account incorporating graded penalty term fitness function Even large instances highly multimodal optimization problems iterated application genetic algorithm observed find global optimum within number runs As genetic algorithm samples tiny fraction search space results quite encouraging
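One way to write a graded penalty for subset sum, as a hedged sketch: the abstract does not give the paper's exact fitness function, so the form below (feasible subsets score their total, infeasible ones score minus their overshoot) is an assumption. It is chosen so that every infeasible string ranks below every feasible one, while a near miss still ranks above a gross violation, which gives the search a gradient back toward feasibility. A brute-force check on toy instances confirms the maximizer is the best feasible subset.

```python
from itertools import product


def subset_sum_fitness(bits, weights, capacity):
    """Hypothetical graded-penalty fitness for subset sum (higher is better).

    Feasible subsets (total <= capacity) score their total, which is >= 0.
    Infeasible subsets score -(total - capacity), which is < 0, so the penalty
    grows with the size of the violation and never beats a feasible subset."""
    total = sum(w for b, w in zip(bits, weights) if b)
    if total <= capacity:
        return total
    return -(total - capacity)


def brute_force_best(weights, capacity):
    """Exhaustively find the bit string with the highest fitness (toy sizes only)."""
    return max(product((0, 1), repeat=len(weights)),
               key=lambda bits: subset_sum_fitness(bits, weights, capacity))
```

With weights `[3, 5]` and capacity 7, the overshooting subset `(1, 1)` (total 8) scores -1 and loses to the feasible `(0, 1)` (total 5). A penalty of the form `capacity - (total - capacity)` would have scored it 6 and made an infeasible string the optimum.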
CuPit specialpurpose programming language designed expressing dynamic neural network learning algorithms It provides flexibility generalpurpose languages C C expressive It allows writing much clearer elegant programs particular algorithms change network topology dynamically constructive algorithms pruning algorithms In contrast languages CuPit programs compiled efficient code parallel machines without changes source program thus providing easy start using parallel platforms This article analyzes circumstances CuPit approach useful one presents description language constructs reports performance results CuPit symmetric multiprocessors SMPs It concludes many cases CuPit good basis neural learning algorithm research smallscale parallel machines
Augmenting genetic algorithms local search heuristics promising approach solution combinatorial optimization problems In paper genetic local search approach quadratic assignment problem QAP presented New genetic operators realizing approach described performance tested various QAP instances containing facilitieslocations The results indicate proposed algorithm able arrive high quality solutions relatively short time limit largest publicly known problem instance new best solution could found
The problem programming artificial ant follow Santa Fe trail used example program search space Previously reported genetic programming simulated annealing hill climbing performance shown much better random search Ant problem Analysis program search space terms fixed length schema suggests highly deceptive simplest solutions large building blocks must assembled average fitness In cases show solutions assembled using fixed representation small building blocks average fitness This suggests Ant problem difficult Genetic Algorithms
Machine Learning research making great progress many directions This article summarizes four directions discusses current open problems The four directions improving classification accuracy learning ensembles classifiers b methods scaling supervised learning algorithms c reinforcement learning learning complex stochastic models
Fills algorithm perfect simulation attractive finite state space models unbiased user impatience presented terms stochastic recursive sequences extended two ways Repulsive discrete Markov random fields two coding sets like autoPoisson distribution lattice neighbourhood treated monotone systems particular partial ordering quasimaximal quasiminimal states used Fills algorithm applies directly Combining Fills rejection sampling sandwiching leads version algorithm works general discrete conditionally specified repulsive models Extensions types models briefly discussed
We consider special case reinforcement learning environment described linear system The states environment actions agent perform represented real vectors system dynamic given linear equation stochastic component The problem equivalent socalled linear quadratic regulator problem studied optimal adaptive control literature We propose learning algorithm problem analyze PAC learning framework Unlike algorithms adaptive control literature algorithm actively explores environment learn accurate model system faster We show control law produced algorithm high probability value close optimal policy relative magnitude initial state system The time taken algorithm polynomial dimension n statespace dimension r actionspace ratio nr constant
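For orientation, the sketch below solves the known-model version of this problem in the scalar case: iterating the discrete-time Riccati equation for $x_{t+1} = a x_t + b u_t$ with stage cost $q x_t^2 + r u_t^2$ yields the optimal feedback $u_t = -K x_t$. It is not the paper's learning algorithm, which must explore precisely because $a$ and $b$ are unknown. The scalar restriction, the omission of the stochastic term (zero-mean additive noise does not change the optimal LQR gain) and the function name are assumptions for illustration.

```python
def scalar_lqr(a, b, q, r, iters=1000, tol=1e-12):
    """Solve the scalar discrete-time algebraic Riccati equation by fixed-point iteration.

    Dynamics x_{t+1} = a x_t + b u_t, stage cost q x^2 + r u^2 (q >= 0, r > 0).
    Returns (P, K): the optimal cost-to-go is P x^2 and the optimal control is u = -K x."""
    p = q
    for _ in range(iters):
        # Riccati update: P <- q + a^2 P - (a b P)^2 / (r + b^2 P)
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    k = a * b * p / (r + b * b * p)
    return p, k
```

For $a = b = q = r = 1$ the fixed point satisfies $P^2 - P - 1 = 0$, so $P$ is the golden ratio (about 1.618) and $K = 1/P \approx 0.618$, giving a stable closed loop $a - bK \approx 0.382$.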
The results reported empirically show benefit decision tree size biases function concept distribution First shown concept distribution complexity number internal nodes smallest decision tree consistent example space affects benefit minimum size maximum size decision tree biases Second policy described defines learner given knowledge complexity distribution concepts Third explanations distribution concepts seen practice amenable minimum size decision tree bias given evaluated empirically
In paper describe sound classification method seems applicable broad domain stationary nonmusical sounds machine noises man made nonperiodic sounds The method based matching higher order spectra HOS acoustic signals generalizes earlier results classification sustained musical sounds higher order statistics An efficient decorrelated matched filter implementation presented The results show good sound classification statistics comparison spectral matching methods also discussed
Many todays algorithms Inductive Logic Programming ILP put heavy burden responsibility user declarative bias defined rather lowlevel fashion To address issue developed method generating declarative language bias topdown ILP systems highlevel declarations The key feature approach distinction user level expert level language bias declarations The expert provides abstract metadeclarations user declares relationship metalevel given database obtain lowlevel declarative language bias The suggested languages allow compact abstract specifications declarative language bias topdown ILP systems using schemata We verified several properties translation algorithm generates schemata applied successfully chemical domains As consequence propose use twolevel approach generate declarative language bias
One difficult problems area explanation based learning utility problem learning many rules low utility lead swamping degradation performance This paper introduces two new techniques improving utility learned rules The first technique combine EBL inductive learning techniques learn better set control rules second technique use inductive techniques learn approximate control rules The two techniques synthesized algorithm called approximating abductive explanation based learning AxAEBL AxAEBL shown improve substantially standard EBL several domains
We address problem program discovery defined Genetic Programming By combining hierarchical crossover operator two traditional single point search algorithms Simulated Annealing Stochastic Iterated Hill Climbing solved problems processing fewer candidate solutions greater probability success Genetic Programming We also enhanced Genetic Programming hybridizing simple idea hill climbing individuals fixed interval generations
Most KDD applications consider databases static objects however many databases inherently temporal ie store evolution object passage time Thus regularities dynamics databases discovered current state might depend way previous states To end preprocessing data needed aimed extracting relationships intimately connected temporal nature data make available discovery algorithm The predicate logic language ILP methods together recent advances efficiency makes adequate task
In paper consider continuous time method approximating given distribution $\pi$ using Langevin diffusion $dL_t = dW_t + \frac{1}{2}\nabla \log \pi(L_t)\,dt$ We find conditions diffusion converges exponentially quickly one dimension essentially distributions exponential tails form $\pi(x) \propto \exp\{-\gamma |x|^{\beta}\}$, $0 < \beta < \infty$, exponential convergence occurs if and only if $\beta \ge 1$ We consider conditions discrete approximations diffusion converge We first show even diffusion converges naive discretisations need We consider Metropolisadjusted version algorithm find conditions also converges exponential rate perhaps surprisingly even Metropolised version need converge exponentially fast even diffusion We briefly discuss truncated form algorithm practice avoid difficulties forms
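The Metropolis-adjusted scheme can be sketched in one dimension as follows: each proposal is one Euler step of the Langevin diffusion with step size $h$, and a Metropolis-Hastings accept/reject step then makes $\pi$ exactly invariant. Dropping the accept/reject step gives the naive discretisation whose convergence can fail. The function name, step size and Gaussian test target are illustrative assumptions.

```python
import math
import random


def mala(log_pi, grad_log_pi, x0, h, n_steps, seed=0):
    """Metropolis-adjusted Langevin algorithm for a one-dimensional target pi.

    Proposal: y = x + (h/2) * grad_log_pi(x) + sqrt(h) * Z with Z ~ N(0, 1),
    i.e. one Euler step of dL = dW + (1/2) grad log pi(L) dt.
    Returns (samples, acceptance_rate)."""
    rng = random.Random(seed)

    def log_q(to, frm):
        # Log proposal density of moving frm -> to, up to an additive constant.
        mean = frm + 0.5 * h * grad_log_pi(frm)
        return -((to - mean) ** 2) / (2.0 * h)

    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        y = x + 0.5 * h * grad_log_pi(x) + math.sqrt(h) * rng.gauss(0.0, 1.0)
        log_alpha = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)
        # Checking log_alpha >= 0 first avoids overflow in exp for large ratios.
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps
```

For a standard normal target ($\beta = 2$ in the family above), call `mala(lambda x: -0.5 * x * x, lambda x: -x, 0.0, 0.5, 20000)`; the sample mean and variance should come out near 0 and 1.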
An essential component intelligent agent ability notice encode store utilize information environment Traditional approaches program induction focused evolving functional reactive programs This paper presents MAPMAKER approach automatic generation agents discover information environment encode information later use create simple plans utilizing stored mental models In approach agents multipart computer programs communicate shared memory Both programs representation scheme evolved using genetic programming An illustrative problem gold collection used demonstrate approach one part program makes map world stores memory part uses map find gold The results indicate approach evolve programs store simple representations environments use representations produce simple plans
Reinforcement learning used predict rewards also predict states ie learn model worlds dynamics Models defined different levels temporal abstraction Multitime models models focus predicting happen rather certain event take place Based multitime models define abstract actions enable planning presumably efficient way various levels abstraction
Previous research shown technique called errorcorrecting output coding ECOC dramatically improve classification accuracy supervised learning algorithms learn classify data points one k classes This paper presents investigation ECOC technique works particularly employed decisiontree learning algorithms It shows ECOC method like form voting committee can reduce variance learning algorithm Furthermore unlike methods simply combine multiple runs learning algorithm ECOC correct errors caused bias learning algorithm Experiments show bias correction ability relies nonlocal behavior C
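A minimal sketch of the ECOC mechanics, assuming the exhaustive code construction for $k$ classes (one standard choice; the abstract does not say which code matrices the experiments used). Each column of the code defines a binary relabelling of the training data and one binary learner is trained per column; a test point is assigned to the class whose codeword is nearest in Hamming distance to the vector of predicted bits. With pairwise codeword distance $d$, up to $\lfloor (d-1)/2 \rfloor$ wrong bits are corrected. The binary learners themselves are omitted, and the function names are hypothetical.

```python
def exhaustive_code(k):
    """Exhaustive ECOC matrix: k rows, 2**(k-1) - 1 columns.

    Row 0 is all ones; row i >= 1 alternates runs of 2**(k-i-1) zeros and ones.
    Every pair of rows differs in 2**(k-2) positions."""
    n_cols = 2 ** (k - 1) - 1
    rows = [[1] * n_cols]
    for i in range(1, k):
        run = 2 ** (k - i - 1)
        rows.append([(j // run) % 2 for j in range(n_cols)])
    return rows


def column_labels(y, code, j):
    """Binary targets for the j-th dichotomy: an example of class c gets code[c][j]."""
    return [code[c][j] for c in y]


def ecoc_decode(bits, code):
    """Class whose codeword is nearest in Hamming distance to `bits` (ties: lowest index)."""
    def hamming(row):
        return sum(a != b for a, b in zip(row, bits))

    return min(range(len(code)), key=lambda c: hamming(code[c]))
```

For $k = 4$ the code has 7 columns and pairwise distance 4, so any single wrong binary prediction is corrected: flipping one bit of class 2's codeword still decodes to class 2.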
Technical Report CRGTR May revised Feb Abstract Factor analysis statistical method modeling covariance structure high dimensional data using small number latent variables extended allowing different local factor models different regions input space This results model concurrently performs clustering dimensionality reduction thought reduced dimension mixture Gaussians We present exact ExpectationMaximization algorithm fitting parameters mixture factor analyzers
The position size shape visual receptive field RF primary visual cortical neurons change dynamically response artificial scotoma conditioning cats Pettet Gilbert retinal lesions cats monkeys DarianSmith Gilbert The EXIN learning rules Marshall used model dynamic RF changes The EXIN model compared adaptation model Xing Gerstein LISSOM model Sirosh Miikkulainen Sirosh et al To emphasize role lateral inhibitory learning rules EXIN LISSOM simulations done lateral inhibitory learning During scotoma conditioning EXIN model without feedforward learning produces centrifugal expansion RFs initially inside scotoma region accompanied increased responsiveness without changes spontaneous activation The EXIN model without feedforward learning consistent neurophysiological data adaptation model LISSOM model The comparison EXIN LISSOM models suggests experiments determine role feedforward excitatory lateral inhibitory learning producing dynamic RF changes scotoma conditioning
In paper present bottomup algorithm called MRI induce logic programs examples This method induce programs base clause one recursive clause small number examples MRI based analysis saturations examples It first generates path structure expression stream values processed predicates The concept path structure originally introduced IdestamAlmquist used TIM IdestamAlmquist In paper introduce concepts extension difference path structure Recursive clauses expressed difference path structure extension The paper presents algorithm shows experimental results obtained method
The Bayesian analysis neural networks difficult simple prior weights implies complex prior distribution functions In paper investigate use Gaussian process priors functions permit predictive Bayesian analysis fixed values hyperparameters carried exactly using matrix operations Two methods using optimization averaging via Hybrid Monte Carlo hyperparameters tested number challenging problems produced excellent results
Explanations play key role operationalizationbased anomaly detection techniques In paper show role limited anomaly detection also used guiding automated knowledge base refinement We introduce refinement procedure takes small number refinement rules rather test cases ii explanations constructed attempt reveal cause causes inconsistencies detected verification process returns rule revisions aiming recover consistency KBtheory Inconsistencies caused one anomaly handled time improves efficiency refinement process
This paper sets conceptual framework openended artificial evolution complex behaviour autonomous agents If recurrent dynamical neural networks similar used phenotypes Genetic Algorithm employs variable length genotypes Inman Harveys SAGA capable evolving arbitrary levels behavioural complexity Furthermore simple restrictions encoding scheme governs genotypes develop phenotypes may guaranteed increase fitness requires increase behavioural complexity evolve In order process practicable design alternative however time periods involved must acceptable The final part paper looks general ways encoding scheme may modified speed process Experiments reported different categories scheme tested conclusions offered promising type encoding scheme viable openended Evolutionary Robotics
We recently introduced neural network mobile robot controller NETMORC autonomously learns forward inverse odometry differential drive robot unsupervised learningbydoing cycle After initial learning phase controller move robot arbitrary stationary moving target compensating noise forms disturbance wheel slippage changes robots plant In addition forward odometric map allows robot reach targets absence sensory feedback The controller also able adapt response longterm changes robots plant change radius wheels In article review NETMORC architecture describe simplified algorithmic implementation present new quantitative results NETMORCs performance adaptability noisefree noisy conditions compare NETMORCs performance trajectoryfollowing task performance alternative controller describe preliminary results hardware implementation NETMORC mobile robot ROBUTER
We develop algorithm simulating perfect random samples invariant measure Harris recurrent Markov chain The method uses backward coupling embedded regeneration times works effectively finite chains stochastically monotone chains even continuous spaces paths may sandwiched upper lower processes Examples show naive approaches constructing bounding processes may considerably biased algorithm simplified certain cases make easier run We give explicit analytic bounds backward coupling times stochastically monotone case
This report gives review new exact simulation algorithms using Markov chains The first part covers discrete case We consider two different algorithms Propp Wilsons coupling past CFTP technique Fills rejection sampler The algorithms tested Ising model without external field The second part covers continuous state spaces We present several algorithms developed Murdoch Green based coupling past We discuss applicability methods Bayesian analysis problem surgical failure rates
Specialist generalist behaviors populations artificial neural networks studied A genetic algorithm used simulate evolution processes thereby develop neural network control systems exhibit specialist generalist behaviors according fitness formula With evolvable fitness formulae evaluation measure let free evolve obtain coevolution expressed behavior individual evolvable fitness formula The use evolvable fitness formulae lets us work dynamic fitness landscape opposed work traditionally applies static fitness landscapes The role competition specialization studied letting individuals live social conditions shared environment directly compete We find competition act provide population diversification populations organisms individual evolvable fitness formulae
Using deliberately designed primitive sets investigate relationship contextbased expression mechanisms size height density genetic program trees evolutionary process We show contextual semantics influence composition location flows operative code program In detail analyze dynamics discuss impact findings microlevel descriptions genetic programming
Most traditional prediction techniques deliver mean probability distribution single point For multimodal processes instead predicting mean probability distribution important predict full distribution This article presents new connectionist method predict conditional probability distribution response input The main idea transform problem regression classification problem The conditional probability distribution network perform direct predictions iterated predictions task specific time series problems We compare method fuzzy logic discuss important differences also demonstrate architecture two time series The first benchmark laser series used Santa Fe competition deterministic chaotic system The second time series Markov process exhibits structure two time scales The network produces multimodal predictions series We compare predictions network nearestneighbor predictor find conditional probability network twice likely model
We present stable ILP crossdisciplinary concept straddling machine learning nonmonotonic reasoning Stable models give meaning logic programs containing negative assertions In stable ILP employ stable models represent current state specified possibly negative EDB IDB rules The state serves background knowledge topdown ILP learner We present framework implementation system INDED one realization stable ILP
In recent years considerable effort gone understanding default reasoning Most effort concentrated question entailment ie conclusions warranted knowledgebase defaults Surprisingly works formally examine general role defaults We argue examination role necessary order understand defaults suggest concrete role defaults Defaults simplify decisionmaking process allowing us make fast approximately optimal decisions ignoring certain possible states In order formalize approach examine decision making framework decision theory We use probability utility measure impact possible states decisionmaking process We accept default ignores states small impact according measure We motivate choice measures show resulting formalization defaults satisfies desired properties defaults namely cumulative reasoning Finally compare approach Pooles decisiontheoretic defaults show combined form attractive framework reasoning decisions We make numerous assumptions day car start road blocked heavy traffic pm etc Many assumptions defeasible willing retract given sufficient evidence Humans naturally state defaults draw conclusions default information Hence defaults seem play important part commonsense reasoning To use statements however need formal understanding defaults represent conclusions admit The problem default entailment roughly conclusions draw knowledgebase defaults has attracted great deal attention Many researchers attempt find contextfree patterns default reasoning eg Kraus et al As research shows much done approach We claim however utility approach limited gain better understanding defaults need understand situations willing state default Our main thesis investigation defaults elaborate role behavior reasoning agent This role allow us examine default appropriate terms implications agents overall performance In paper suggest particular role defaults show role allows us provide semantics defaults Of course claim role defaults play In many applications end result reasoning choice actions 
Usually choice optimal much uncertainty state world effects actions allow examination possibilities We suggest one role defaults lies simplifying decisionmaking process stating assumptions reduce space examined possibilities More precisely suggest default license ignore situations knowledge amounts One particular suggestion understood light semantics Pearl In semantics accept default given knowledge probability small This small probability states gives us license ignore Although probability plays important part decisions claim also examine utility actions For example people think highly unlikely die next year also believe accept default assumption context decision whether buy life insurance In context stakes high ignore outcome even though unlikely We suggest license ignore set given based impact decision To paraphrase view accept Bird Fly assuming bird flies get us much trouble To formalize intuitions examine decisionmaking framework decision theory Luce Raiffa Decision theory represents decision problem using several components set possible states probability measure sets utility function assigns action state numerical value To appear IJCAI
Density estimation commonly used test case nonparametric estimation methods We explore asymptotic properties estimators based thresholding empirical wavelet coefficients Minimax rates convergence studied large range Besov function classes B spq range global L p error measures p lt A single wavelet threshold estimator asymptotically minimax within logarithmic terms simultaneously range spaces error measures In particular p gt p form nonlinearity essential since minimax linear estimators suboptimal polynomial powers n A second approach using approximation Gaussian white noise model Mallows metric used Acknowledgements We thank Alexandr Sakhanenko helpful discussions references work Berry Esseen theorems used Section This work supported part NSF DMS The second author would like thank Universite de
used turn approximate A Empirical studies show good results achieved TSL However TSL several drawbacks Training set learners eg backpropagation typically slow may require many passes training set Also guarantee given arbitrary training set system find enough good critical features get reasonable approximation A Moreover number features searched exponential number inputs TSL becomes computationally expensive Finally scarcity interesting positive theoretical results suggests difficulty learning without sufficient priori knowledge The goal learning systems generalize Generalization commonly based set critical features system available Training set learners typically extract critical features random set examples While approach attractive suffers exponential growth number features searched We propose extend endowing system priori knowledge form precepts Advantages augmented system speedup improved generalization greater parsimony This paper presents preceptdriven learning algorithm Its main features include distributed implementation bounded learning execution times ability handle correct incorrect precepts Results simulations realworld data demonstrate promise This paper presents preceptdriven learning PDL PDL intended overcome TSLs weaknesses In PDL training set augmented small set precepts A pair p I O called example A precept example ie entries inputs set special value dontcare An input whose value dontcare said asserted If effect value output The use special value dontcare therefore shorthand A pair containing dontcare inputs represents many examples product sizes input domains dontcare inputs Introduction
Many inductive learning problems expressed classical attributevalue language In order learn generalize learning systems often rely measure similarity current knowledge base new information The attributevalue language defines heterogeneous multidimensional input space attributes nominal others linear Defining similarity proximity two points input spaces nontrivial We discuss two representative homogeneous metrics show examples limited domains We address issues raised design heterogeneous metric inductive learning systems In particular discuss need normalization impact dontcare values We propose heterogeneous metric evaluate empirically simplified version ILA
We study efficient algorithms solving following problem call switching distributions learning problem A sequence S n finite alphabet generated following way The sequence concatenation K runs consecutive subsequence Each run generated independent random draws distribution p p element set distributions fp p N g The learning algorithm given sequence goal find approximations distributions p p N give approximate segmentation sequence constituting runs We give efficient algorithm solving problem show conditions algorithm guaranteed work high probability
For connectionist networks adequate higher level cognitive activities natural language interpretation generalize way appropriate given regularities domain Fodor Pylyshyn identified important pattern regularities domains called systematicity Several attempts made show connectionist networks generalize accordance regularities satisfaction critics To address challenge paper starts establishing implications systematicity connectionist solutions variable binding problem Based work Hadley argue network must generalize information learns one variable binding variable bindings We show temporal synchrony variable binding Shastri Ajjanagadde inherently generalizes way Thereby show temporal synchrony variable binding connectionist architecture accounts systematicity This important step showing connectionism adequate architecture higher level cognition
I describe distance metric called edit distance quantifies syntactic difference two genetic programs In context one specific problem bit multiplexor I use metric analyze amount new material introduced different crossover operators difference among best individuals population difference among best individuals rest population The relationships data run performance imprecise sufficiently interesting encourage investigation use edit distance
Both control data dependencies among primitives impact behavioural consistency subprograms genetic programming solutions Behavioural consistency turn impacts ability genetic programming identify promote appropriate subprograms We present results modelling dependency parameterized problem subprogram exhibits internal external dependency levels change subprogram successively incorporated larger subsolutions We find key difference nonexistent full external dependency longer time solution identification lower likelihood success shown increased difficulty identifying promoting correct subprograms
In paper compare performance serial parallel island model Genetic Algorithm solving Multiprocessor Scheduling Problem We show results using fixed scaled problems using using migration We found addition providing speedup use parallel processing parallel island model GA migration finds better quality solutions serial GA
An important reason continued popularity Artificial Neural Networks ANNs machine learning community gradientdescent backpropagation procedure gives ANNs locally optimal change procedure addition framework understanding ANN learning performance Genetic programming GP also successful evolutionary learning technique provides powerful parameterized primitive constructs Unlike ANNs though GP principled procedure changing parts learned system based current performance This paper introduces Neural Programming connectionist representation evolving programs maintains benefits GP The connectionist model Neural Programming allows regression creditblame procedure evolutionary learning system We describe general method informed feedback mechanism Neural Programming Internal Reinforcement We introduce Internal Reinforcement procedure demonstrate use illustrative experiment
Topdown induction decision trees TDIDT popular machine learning technique Up till mainly used propositional learning seldom relational learning inductive logic programming The main contribution paper introduction logical decision trees make possible use TDIDT inductive logic programming An implementation topdown induction logical decision trees
Probabilistic Neural Networks PNN typically learn quickly many neural network models success variety applications However basic form tend large number hidden nodes One common solution problem keep randomlyselected subset original training data building network This paper presents algorithm called Reduced Probabilistic Neural Network RPNN seeks choose betterthanrandom subset available instances use center points nodes network The algorithm tends retain nonnoisy border points removing nodes instances regions input space highly homogeneous In experiments datasets RPNN better average generalization accuracy two PNN models requiring average less onethird number nodes
In standard neuroevolution population networks evolved task network best solves task found This network fixed used solve future instances problem Networks evolved way handle realtime interaction well It hard evolve solution ahead time cope effectively possible environments might arise future possible ways someone may interact This paper proposes evolving feedforward neural networks online create agents improve performance realtime interaction This approach demonstrated game world neuralnetworkcontrolled individuals play humans Through evolution individuals learn react varying opponents appropriately taking account conflicting goals After initial evaluation offline population allowed evolve online performance improves considerably The population adapts novel situations brought changing strategies opponent game layout also improves performance situations already seen offline training This paper describes implementation online evolution shows practical method exceeds performance offline evolution alone
Networks Methods Results Abstract We devise feedforward Artificial Neural Network ANN procedure predicting utility loads present resulting predictions two test problems given The Great Energy Predictor Shootout The First Building Data Analysis Prediction Competition Key ingredients approach method δ test determining relevant inputs Multilayer Perceptron These methods briefly reviewed together comments alternative schemes like fitting polynomials use recurrent networks
In paper first review main results theory schemata Genetic Programming GP summarise new GP schema theory based new definition schema Then study creation propagation disruption new form schemata real runs standard crossover onepoint crossover selection Finally discuss results light GP schema theorem
Radial basis function RBFs neural networks provide attractive method high dimensional nonparametric estimation use nonlinear control They faster train conventional feedforward networks sigmoidal activation networks backpropagation nets provide model structure better suited adaptive control This article gives brief survey use RBFs introduces new statistical interpretation radial basis functions new method estimating parameters using EM algorithm This new statistical interpretation allows us provide confidence limits predictions made using networks
We review main results obtained theory schemata Genetic Programming GP emphasising strengths weaknesses Then propose new simpler definition concept schema GP closer original concept schema genetic algorithms GAs Along new form crossover onepoint crossover point mutation concept schema used derive improved schema theorem GP describes propagation schemata one generation next We discuss result show schema theorem natural counterpart GP schema theorem
This paper investigates intrinsic limitation worstcase identification LTI systems using data corrupted bounded disturbances unknown plant known belong given model set This done analyzing optimal worstcase asymptotic error achievable performing experiments using bounded inputs estimating plant using identification algorithm First shown topological conditions model set identification algorithm asymptotically optimal input Characterization optimal asymptotic error function inputs also obtained These results hold error metric disturbance norm Second general results applied three specific identification problems identification stable systems norm identification stable rational systems H norm identification unstable rational systems gap metric For problems general characterization optimal asymptotic error used find nearoptimal inputs minimize error
We present connectionist architecture demonstrate learn syntactic parsing corpus parsed text The architecture represent syntactic constituents learn generalizations syntactic constituents thereby addressing sparse data problems previous connectionist architectures We apply Simple Synchrony Networks mapping sequences word tags parse trees After training parsed samples Brown Corpus networks achieve precision recall constituents approaches statistical methods task
Air Traffic Control involved realtime planning aircraft trajectories This heavily constrained optimization problem We concentrate freeroute planning aircraft required fly way points The choice proper representation realworld problem nontrivial We propose two level representation one level evolutionary operators work derived level calculations Furthermore show specific choice fitness function important finding good solutions large problem instances We use hybrid approach sense use knowledge air traffic control using number heuristics We built prototype planning tool resulted flexible tool generating freeroute planning low cost number aircraft
We describe maximumlikelihood parameter estimation problem ExpectationMaximization EM algorithm used solution We first describe abstract form EM algorithm often given literature We develop EM parameter estimation procedure two applications finding parameters mixture Gaussian densities finding parameters hidden Markov model HMM ie BaumWelch algorithm discrete Gaussian mixture observation models We derive update equations fairly explicit detail prove convergence properties We try emphasize intuition rather mathematical rigor
Genetic algorithms used neural networks two main ways optimize network architecture train weights fixed architecture While previous work focuses one two options paper investigates alternative evolutionary approach called Breeder Genetic Programming BGP architecture weights optimized simultaneously The genotype network represented tree whose depth width dynamically adapted particular application specifically defined genetic operators The weights trained nextascent hillclimbing search A new fitness function proposed quantifies principle Occams razor It makes optimal tradeoff error fitting ability parsimony network Simulation results two benchmark problems differing complexity suggest method finds minimal size networks clean data The experiments noisy data show using Occams razor improves generalization performance also accelerates convergence speed evolution Published Complex Systems
SPERT Synthetic PERceptron Testbed fully programmable single chip microprocessor designed efficient execution artificial neural network algorithms The first implementation CMOS technology MHz clock rate prototype system designed occupy double SBus slot within Sun Sparcstation SPERT sustain fi connections per second pattern classification around fi connection updates per second running popular error backpropagation training algorithm This represents speedup around two orders magnitude Sparcstation algorithms interest An earlier system produced group Ring Array Processor RAP used commercial DSP chips Compared RAP multiprocessor similar performance SPERT represents order magnitude reduction cost problems fixedpoint arithmetic satisfactory International Computer Science Institute Center Street Berkeley CA
Genetic Programming method program discovery consisting special kind genetic algorithm capable operating nonlinear chromosomes parse trees representing programs interpreter run programs optimised This paper describes PDGP Parallel Distributed Genetic Programming new form genetic programming suitable development finegrained parallel programs PDGP based graphlike representation parallel programs manipulated crossover mutation operators guarantee syntactic correctness offspring The paper describes operators reports preliminary results obtained paradigm
We define fitness structure genetic programming mapping subprograms program respective fitness values This paper shows various fitness structures problem independent subsolutions relate acquisition subsolutions The rate subsolution acquisition found directly correlated fitness structure whether structure uniform linear exponential An understanding fitness structure provides partial insight complicated relationship fitness function outcome genetic programmings search
It argued memorization events situations episodic memory requires rapid formation neural circuits responsive binding errors binding matches While formation circuits responsive binding matches modeled associative learning mechanisms rapid formation circuits responsive binding errors difficult explain given seemingly paradoxical behavior circuit must formed response occurrence binding ie particular pattern input subsequent formation must fire anymore response occurrence binding ie pattern led formation A plausible account formation circuits offered A computational model described demonstrates transient pattern activity representing event lead rapid formation circuits detecting bindings binding errors result longterm potentiation within structures whose architecture circuitry similar hippocampal formation neural structure known critical episodic memory The model exhibits high memory capacity robust limited amounts diffuse cell loss The model also offers alternate interpretation functional role region CA formation episodic memories predicts nature memory impairment would result damage various regions hippocampal formation
Specialization populations artificial neural networks studied Organisms fixed evolvable fitness formulae placed isolated shared environments emerged behaviors compared An evolvable fitness formula specifies evaluation measure let free evolve obtain coevolution expressed behavior individual evolvable fitness formula In isolated environment generalist behavior emerges organisms fixed fitness formula specialist behavior emerges organisms individual evolvable fitness formulae A population diversification analysis shows almost organisms population isolated environment converge towards behavioral strategy find competition act provide population diversification populations organisms shared environment
CLONES objectoriented library constructing training utilizing layered connectionist networks The CLONES library contains object classes needed write simulator small amount added source code examples included The size experimental ANN programs greatly reduced using objectoriented library time programs easier read write evolve The library includes database network behavior training procedures customized user It designed run efficiently data parallel computers RAP SPERT well uniprocessor workstations While efficiency portability parallel computers primary goals several secondary design goals allow heterogeneous algorithms training procedures interconnected trained together Within constraints attempt maximize variety artificial neural network algorithms supported
Technical Report CSRP August Abstract Genetic Programming method program discovery consisting special kind genetic algorithm capable operating parse trees representing programs interpreter run programs optimised This paper describes Parallel Distributed Genetic Programming PDGP new form genetic programming suitable development parallel programs symbolic neural processing elements combined free natural way PDGP based graphlike representation parallel programs manipulated crossover mutation operators guarantee syntactic correctness offspring The paper describes operators reports results obtained exclusiveor problem
There much interest using optics implement computer interconnection networks However little discussion routing methodologies besides already used electronics In paper neural network routing methodology proposed generate control bits optical multistage interconnection network OMIN Though present optical implementation methodology illustrate control optical interconnection network These OMINs may used communication media shared memory distributed computing systems The routing methodology makes use Artificial Neural Network ANN functions parallel computer generating routes The neural network routing scheme may applied electrical well optical interconnection networks However since ANN implemented using optics routing approach especially appealing optical computing environment The parallel nature ANN computation may make routing scheme faster conventional routing approaches especially OMINs irregular Furthermore neural network routing scheme faulttolerant Results shown generating routes fi stage OMIN
The MultiSpert parallel system straightforward extension Spert workstation accelerator predominantly used speech recognition research ICSI In order deliver high performance Artificial Neural Network training without requiring changes user interfaces existing Quicknet ANN library modified run MultiSpert In report present algorithms used parallelization Quicknet code analyse communication computation requirements The resulting performance model yields better understanding system speedups potential bottlenecks Experimental results actual training runs validate model demonstrate achieved performance levels
In paper explore distributed database allocation problem intractable We also discuss genetic algorithms used successfully solve combinatorial problems Our experimental results show GA far superior greedy heuristic obtaining optimal near optimal fragment placements allocation problem various data sets
stefanwrobelgmdde sasodzeroskigmdde Proc FGML Annual Workshop GI Special Interest Group Machine Learning GI FG ed K Morik J Herrmann Research Report UnivDortmund Abstract The task discovering interesting regularities large sets data data mining knowledge discovery recently met increased interest Machine Learning general Inductive Logic Programming ILP particular However widely accepted definition task concept learning examples ILP definitions data mining task proposed recently In paper examine socalled nonmonotonic semantics definitions show nonmonotonicity incidental property data mining learning task task makes perfect sense without assumption We therefore introduce define generalized definition data mining task called ILP description learning problem discuss properties relation traditional concept learning prediction learning problem Since characterization entirely level models definition applies independently chosen hypothesis language
Optoelectronic reconfigurable interconnection networks limited significant control latency used large multiprocessor systems This latency time required analyze current traffic reconfigure network establish required paths The goal latency hiding minimize effect control overhead In paper introduce technique performs latency hiding learning patterns communication traffic using information anticipate need communication paths Hence network provides required communication paths request path made In study communication patterns memory accesses parallel program used input time delay neural network TDNN perform online training prediction These predicted communication patterns used interconnection network controller provides routes memory requests Based experiments neural network able learn highly repetitive communication patterns thus able predict allocation communication paths resulting reduction communication latency
Technical Report UMIACSTR CSTR Institute Advanced Computer Studies University Maryland College Park MD Abstract Shared memory multiprocessors require reconfigurable interconnection networks INs scalability These INs reconfigured IN control unit However INs often plagued undesirable reconfiguration time primarily due control latency amount time delay control unit takes decide desired new IN configuration To reduce control latency trainable prediction unit PU devised added IN controller The PUs job anticipate reduce control configuration time major component control latency Three different online prediction techniques tested learn predict repetitive memory access patterns three typical parallel processing applications D relaxation algorithm matrix multiply Fast Fourier Transform The predictions used routing control algorithm reduce control latency configuring IN provide needed memory access paths requested Three prediction techniques used tested Markov predictor linear predictor time delay neural network TDNN predictor As expected different predictors performed best different applications however TDNN produced best overall results
In paper explore distributed file task placement problem intractable We also discuss genetic algorithms used successfully solve combinatorial problems Our experimental results show GA far superior greedy heuristic obtaining optimal near optimal file task placements problem various data sets
In paper show posterior distribution feedforward neural networks asymptotically consistent This paper extends earlier results universal approximation properties neural networks Bayesian setting The proof consistency embeds problem density estimation problem uses bounds bracketing entropy show posterior consistent Hellinger neighborhoods It relates result back regression setting We show consistency setting number hidden nodes growing sample size case number hidden nodes treated parameter Thus provide theoretical justification using neural networks nonparametric regression Bayesian framework
This paper describes interactive planning system developed inside Intelligent Decision Support System aimed supporting operator planning initial attack forest fires The planning architecture rests integration casebased reasoning techniques constraint reasoning techniques exploited mainly performing temporal reasoning temporal metric information Temporal reasoning plays central role supporting interactive functions provided user performing two basic steps planning process plan adaptation resource scheduling A first prototype integrated situation assessment resource allocation manager subsystem currently tested
PrePruning PostPruning two standard methods dealing noise concept learning PrePruning methods efficient PostPruning methods typically accurate much slower generate overly specific concept description first We experimented variety pruning methods including two new methods try combine integrate pre postpruning order achieve accuracy efficiency This verified test series chess position classification task
Pruning effective method dealing noise Machine Learning Recently pruning algorithms particular Reduced Error Pruning also attracted interest field Inductive Logic Programming However shown methods inefficient time wasted generating clauses explain noisy examples subsequently pruning clauses We introduce new method searches good theories topdown fashion get better starting point pruning algorithm Experiments show approach significantly lower complexity task without losing predictive accuracy
Traditional databases commonly support efficient query update procedures operate time sublinear size database Our goal paper take first step toward dynamic reasoning probabilistic databases comparable efficiency We propose dynamic data structure supports efficient algorithms updating querying singly connected Bayesian networks In conventional algorithm new evidence absorbed time O queries processed time ON N size network We propose algorithm preprocessing phase allows us answer queries time Olog N expense Olog N time per evidence absorption The usefulness sublinear processing time manifests applications requiring near realtime response large probabilistic databases We briefly discuss potential application dynamic probabilistic reasoning computational biology
network Often however application need information every node network need exact probabilities We present localized partial evaluation LPE propagation algorithm computes interval bounds marginal probability specified query node examining subset nodes entire network Conceptually LPE ignores parts network far away queried node much impact value LPE anytime property able produce better solutions tighter intervals given time consider network
We describe integrated problem solving architecture named INBANCA Bayesian networks casebased reasoning CBR work cooperatively multiagent planning tasks This includes twoteam dynamic tasks paper concentrates simulated soccer example Bayesian networks used characterize action selection whereas casebased approach used determine implement actions This paper two contributions First survey integrations casebased Bayesian approaches perspective popular CBR task decomposition framework thus explaining types integrations attempted This allows us explain unique aspects proposed integration Second demonstrate Bayesian nets used provide environmental context thus feature selection information casebased reasoner
Machine learning systems often represent concepts rules sets attributevalue pairs Many learning algorithms generalize specialize concept representations removing adding pairs Thus concepts created general specific relationships This paper presents algorithms connect concepts network based general specific relationships Since concept access related concepts quickly resulting structure allows increased efficiency learning reasoning The time complexity one set learning models improves On log n Olog n n number nodes using general specific structure
This paper analyzes convergence properties canonical genetic algorithm CGA mutation crossover proportional reproduction applied static optimization problems It proved means homogeneous finite Markov chain analysis CGA never converge global optimum regardless initialization crossover operator objective function But variants CGAs always maintain best solution population either selection shown converge global optimum due irreducibility property underlying original nonconvergent CGA These results discussed respect schema theorem
Many techniques developed learning rules relationships automatically diverse data sets simplify often tedious errorprone process acquiring knowledge empirical data While techniques plausible theoretically wellfounded perform well less artificial test data sets depend ability make sense realworld data This paper describes project applying range machine learning strategies problems agriculture horticulture We briefly survey techniques emerging machine learning research describe software workbench experimenting variety techniques realworld data sets describe case study dairy herd management culling rules inferred mediumsized database herd information
Decisiontheoretic preferences specify relative desirability possible outcomes alternative plans In order express general patterns preference holding domain require language refer directly preferences classes outcomes well individuals We present basic concepts theory meaning generic comparatives facilitate incremental capture exploitation automated reasoning systems Our semantics lifts comparisons individuals comparisons classes things equal means contextual equivalences equivalence relations among individuals vary context application We discuss implications theory representing preference information
The Baldwin Effect first proposed late nineteenth century suggests course evolutionary change influenced individually learned behavior The existence effect still hotly debated topic In paper clear evidence presented learningbased plasticity phenotypic level produce directed changes genotypic level This research confirms earlier experimental work done others notably Hinton Nowlan Further amount plasticity learned behavior shown crucial size Baldwin Effect either little much effect disappears significantly reduced Finally learnable traits case made many generations become easier population whole learn traits ie phenotypic plasticity traits increase In gradual transition genetically driven population one driven learning importance Baldwin Effect decreases
This article presents new line research investigating online learning mechanisms autonomous intelligent agents We discuss casebased method dynamic selection modification behavior assemblages navigational system The casebased reasoning module designed addition traditional reactive control system provides flexible performance novel environments without extensive highlevel reasoning would otherwise slow system The method implemented ACBARR A CaseBAsed Reactive Robotic system evaluated empirical simulation system several different environments including box canyon environments known problematic reactive control systems general Technical Report GITCC College Computing Georgia Institute Technology Atlanta Georgia
SG Specific General network supervised inductive learning examples uses ideas neural networks symbolic inductive learning gain benefits methods The network built many simple nodes learn important features input space monitor ability features predict output values The network avoids exponential nature number features creating specific features example expanding features making general Expansion feature terminates encounters another feature contradicting outputs Empirical evaluation model realworld data shown network provides good generalization performance Convergence accomplished within small number training passes The network provides benefits automatically allocating deleting nodes without requiring user adjustment parameters The network learns incrementally operates parallel fashion This paper describes network architecture supervised learning combines techniques used neural networks symbolic machine learning gain advantages approaches In supervised learning network given training set containing examples Each example gives input pattern along corresponding output network produce presented input The task network converge representation contains information given training set generalize information network respond well inputs trained One approach generalization look important features input space A feature subset network inputs along associated values A feature matched values network inputs part feature equal values inputs given feature Inputs part feature value A feature predicts output high probability important feature The number inputs contained feature order feature determines generality feature A feature inputs general feature feature many inputs specific feature It impractical monitor possible input features number features exponential number inputs This paper proposes SG Specific General network creates specific input features generalizes features One way SG generalizes combining similar specific features If two features similar close input space Combining 
two features dropping inputs common features creates new feature encompasses original features The new feature general matches points input space defined example This section presents overview model later sections provide detail system The network made many simple nodes Each node contains input feature monitors During training node gathers statistics giving discrete conditional probability possible output value given input feature
We provide analytical expressions governing changes bias variance lookup table estimators provided various Monte Carlo temporal difference value estimation algorithms offline updates trials absorbing Markov reward processes We used expressions develop software serves analysis tool given complete description Markov reward process rapidly yields exact meansquareerror curve curve one would get averaging together sample meansquareerror curves infinite number learning trials given problem We use analysis tool illustrate classes meansquareerror curve behavior variety example reward processes show although various temporal difference algorithms quite sensitive choice stepsize eligibilitytrace parameters values parameters make similarly competent generally good
We examine inductive inference complex grammar specifically consider task training model classify natural language sentences grammatical ungrammatical thereby exhibiting kind discriminatory power provided Principles Parameters linguistic framework GovernmentandBinding theory We investigate following models feedforward neural networks FrasconiGoriSoda BackTsoi locally recurrent networks Elman Narendra Parthasarathy Williams Zipser recurrent networks Euclidean editdistance nearestneighbors simulated annealing decision trees The feedforward neural networks nonneural network machine learning models included primarily comparison We address question How neural network distributed nature gradient descent based iterative calculations possess linguistic capability traditionally handled symbolic computation recursive processes Initial simulations models partially successful using large temporal window input Models trained fashion learn grammar significant degree Attempts training recurrent networks small temporal input windows failed implemented several techniques aimed improving convergence gradient descent training algorithms We discuss theory present empirical study variety models learning algorithms highlights behaviour present attempting learn simpler grammar
A parallel version proposed fundamental theorem serial unconstrained optimization The parallel theorem allows k parallel processors use simultaneously different algorithm descent Newton quasiNewton conjugate gradient algorithm Each processor perform one many steps serial algorithm portion gradient objective function assigned independently processors Eventually synchronization step performed differentiable convex functions consists taking strong convex combination k points found k processors For nonconvex well convex differentiable functions best point found k processors taken better point The fundamental result establish accumulation point parallel algorithm stationary nonconvex case global solution convex case Computational testing Thinking Machines CM multiprocessor indicate speedup order number processors employed
Sensors represent crucial link evolutionary forces shaping species relationship environment individuals cognitive abilities behave learn We report experiments using new class latent energy environments LEE models define environments carefully controlled complexity allow us state bounds random optimal behaviors independent strategies achieving behaviors Using LEEs analytic basis defining environments use neural networks NNets model individuals steady state genetic algorithm model evolutionary process shaping NNets particular sensors Our experiments consider two types contact ambient sensors variants NNets allowed learn learn via error correction internal prediction via reinforcement learning We find predictive learning even using larger repertoire sophisticated ambient sensors provides advantage NNets unable learn However reinforcement learning using small number crude contact sensors provide significant advantage Our analysis results points tradeoff genetic robustness sensors informativeness learning system
This brief annotated bibliography I wanted make available attendees Machine Learning tutorial AI Statistics Workshop These slides available WWW pages slides Please contact questions Please also note date listed recently updated While I plan make occasional updates file bound outdated quickly Also I apologize lack figures time project limited slides compensate Finally bibliography definition This book date Both Pat Langley Tom Mitchell process writing textbooks subject still waiting Until I suggest looking Readings recent ML conference proceedings International European There also introductory papers subject though I havent gotten around putting yet However Pat Langley Dennis Kibler written good paper ML empirical science Pat written several editorials use ML author Langley incomplete Ive left many references may use
A Bayesian approach multivariate adaptive regression spline MARS fitting Friedman proposed This takes form probability distribution space possible MARS models explored using reversible jump Markov chain Monte Carlo methods Green The generated sample MARS models produced shown good predictive power averaged allows easy interpretation relative importance predictors overall fit
resent allowed sequences resolution steps initial theory There however many characterizations allowed sequences resolution steps expressed set resolvents One approach problem presented system merlin based earlier technique learning finitestate automata represent allowed sequences resolution steps merlin extends previous technique three ways negative examples considered addition positive examples ii new strategy performing generalization used iii technique converting learned automaton logic program included Results experiments presented merlin outperforms system using old strategy performing generalization traditional covering technique The latter result explained limited expressiveness hypotheses produced covering also fact covering needs produce correct base clauses recursive definition
We discuss ideas producing perfect simulations based coupling past finite state space models naturally extend multivariate distributions infinite uncountable state spaces autogamma autoPoisson autonegativebinomial models using Gibbs sampling combination sandwiching methods originally introduced perfect simulation point processes
The main result paper establishes equivalence null asymptotic controllability nonlinear finitedimensional control systems existence continuous controlLyapunov functions clfs defined means generalized derivatives In manner one obtains complete characterization asymptotic controllability applying principle far wider class systems Artsteins Theorem relates closedloop feedback stabilization existence smooth clfs The proof relies viability theory optimal control techniques Introduction In paper study systems general form
We apply recent results minimax risk density estimation related problem pattern classification The notion loss seek minimize information theoretic measure well predict classification future examples given classification previously seen examples We give asymptotic characterization minimax risk terms metric entropy properties class distributions might generating examples We use results characterize minimax risk special case noisy twovalued classification problems terms Assouad density
Genetic algorithms GAs extensively used different domains means global optimization simple yet reliable manner They much better chance getting global optima gradient based methods usually converge local sub optima However GAs tendency getting moderately close optima small number iterations To get close optima GA needs large number iterations Whereas gradient based optimizers usually get close local optima relatively small number iterations In paper describe new crossover operator designed endow GA gradientlike abilities without actually computing gradients without sacrificing global optimality The operator works using guidance members GA population select direction exploration Empirical results two engineering design domains across binary floating point representations demonstrate operator significantly improve steady state error GA optimizer
Neural networks trained balancing poles attached cart fixed track For one variant single pole system pole angle cart position variables supplied inputs network must learn compute velocities All problems solved using fixed architecture using new version cellular encoding evolves application specific architecture realvalued weights The learning times generalization capabilities compared neural networks developed using methods After post processing simplification topologies produced cellular encoding simple could analyzed Architectures hidden units produced single pole two pole problem velocity information supplied input Moreover linear solutions display good generalization For control problems cellular encoding automatically generate architectures whose complexity structure reflect features problem solve
A recent result Jun Lius shown compute explicitly eigenvalues eigenvectors Markov chain derived special case Hastings sampling algorithm known independence Metropolis sampler In note show extend result obtain exact nstep transition probabilities n This done first chain finite state space extended general discrete continuous state space The paper concludes implications diagnostic tests convergence Markov chain samplers
It well known searchspace reformulation improve speed reliability numerical optimization engineering design We argue best choice reformulation depends design goal present technique automatically constructing rules map design goal reformulation chosen space possible reformulations We tested technique domain racingyachthull design reformulation corresponds incorporating constraints search space We applied standard inductivelearning algorithm C set training data describing constraints active optimal design goal encountered previous design session We used rules choose appropriate reformulation set test cases Our experimental results show using reformulations improves speed reliability design optimization outperforming competing methods approaching best performance possible
apply reduction two core children total sum matching weights becomes On comparison spine node critical node apply reduction core child spine node total sum matching weights becomes On With regards O comparisons two critical nodes sum exceed O n total weight Thus since total On edges involved matchings time On reduce total sum matching weights O n Theorem Let M IR IR monotone function bounding time complexity UWBM Moreover let M satisfy M x x fx constant f x Ox f monotone constants b b x b fxy b f xf Then p time On M n Proof We spend On polylog n timeUWBMk n matchings So Theorem CompCoreTrees computed time On polylog n timeUWBMk n Applying Theo get Corollary MAST computable time On log n WHE Day Computational complexity inferring phylogenies dissimilarity matrices Bulletin Mathematical Biology M Farach S Kannan T Warnow A robust model finding optimal evolutionary trees Algorithmica In press See also STOC
We report results vowel stop consonant recognition tokens extracted TIMIT database Our current system differs others similar tasks use specific time normalization techniques We use detailed biologically motivated input representation speech tokens Lyons cochlear model implemented Slaney This detailed high dimensional representation known cochleagram classified either backpropagation hybrid supervisedunsupervised neural network classifier The hybrid network composed biologically motivated unsupervised network supervised backpropagation network This approach produces results comparable obtained others without addition time normalization
This paper presents new application ExclusiveSumOfProducts ESOP minimizer EXORCISMMV Machine Learning particularly Pattern Theory An analysis various logic synthesis programs conducted Wright Laboratory machine learning applications Creating robust efficient Boolean minimizer machine learning would minimize decomposed function cardinality DFC measure functions would help solve practical problems application areas interest Pattern Theory Group especially problems require strongly unspecified multiplevaluedinput functions large number variables For many functions complexity minimization EXORCISMMV better Espresso For small functions worse Curtislike Decomposer However EXORCISM much faster run problems variables significant DFC improvements also found We analyze cases EXORCISM worse Espresso propose new improvements strongly unspecified functions
Two selforganising controller networks presented study The Clustered Controller Network CCN uses spatial clustering approach select controllers instant In gated controller network ModelsController Network MCN performance model attached controller used achieve controller selection An algorithm automatically construct architecture networks described It makes two schemes selforganising Different examples control nonlinear systems considered order illustrate behaviour ICCN IMCN It makes clear schemes performing much better single adaptive controller The two main advantages ICCN IMCN concern possibilities use controller building block network architecture apply ICCN modelling purpose However ICCN appears serious problems cope nonlinear systems single variable implying nonlinear behaviour The IMCN suffer trouble This high sensitivity clustering space order main drawback limiting use ICCN therefore makes IMCN much suitable approach control wide range nonlinear systems
This paper offers perspective features pattern finding general This perspective based robust complexity measure called Decomposed Function Cardinality A function decomposition algorithm minimizing complexity measure finding associated features outlined Results experiments algorithm also summarized
We investigate problem estimating proportion vector maximizes likelihood given sample mixture given densities We adapt framework developed supervised learning give simple derivations many standard iterative algorithms like gradient projection EM In framework distance new old proportion vectors used penalty term The square distance leads gradient projection update relative entropy new update call exponentiated gradient update EG Curiously second order Taylor expansion relative entropy used arrive update EM gives usual EM update Experimentally EM update EG update gt outperform EM algorithm variants We also prove polynomial bound rate convergence EG algorithm
This paper compares direct reinforcement learning explicit model modelbased reinforcement learning simple task pendulum swing We find task modelbased approaches support reinforcement learning smaller amounts training data efficient handling changing goals
This article compares traditional fixed problem representation style genetic algorithm GA new floating representation building blocks problem fixed specific locations individuals population In addition effects noncoding segments representations studied Noncoding segments computational model noncoding DNA floating building blocks mimic location independence genes The fact structures prevalent natural genetic systems suggests may provide advantages evolutionary process Our results show significant difference GAs solve problem fixed floating representations GAs able maintain diverse population floating representation The combination noncoding segments floating building blocks appears encourage GA take advantage parallel search recombination abilities
We extend Hoeffding bounds develop superior probabilistic performance guarantees accurate classifiers The original Hoeffding bounds classifier accuracy depend accuracy parameter Since accuracy known priori parameter value gives weakest bounds used We present method loosely bounds accuracy using old method uses loose bound improved parameter value tighter bounds We show use bounds practice generalize bounds individual classifiers form uniform bounds multiple classifiers
Evolving Cooperative Groups Preliminary Results Abstract Multiagent systems require coordination sources distinct expertise perform complex tasks effectively In paper use coevolutionary approach using genetic algorithms evolve multiple individuals effectively cooperate solve common problem We concurrently run GA individual group In paper experiment room painting domain requires cooperation two agents We used two mechanisms evaluating individual one population pair randomly members population b pair members population shared memory containing best pairs found far Both approaches successful generating optimal behavior patterns However preliminary results exhibit slight edge shared memory approach
Coevolution refers simultaneous evolution two genetically distinct populations coupled fitness landscapes In paper consider competitive coevolution fitness individual host population based direct competition individuals parasite population Competitive coevolution applied three gamelearning problems TicTacToe TTT Nim small version Go Two new techniques competitive coevolution explored Competitive fitness sharing changes way fitness measured shared sampling alters way parasites chosen testing hosts Experiments using TTT Nim show substantial improvement performance methods used Preliminary results using coevolution discovery cellular automata rules playing Go presented
We review use global local methods estimating function mapping R R n samples function containing noise The relationship methods examined empirical comparison performed using multilayer perceptron MLP global neural network model single nearestneighbour model linear local approximation LA model following commonly used datasets MackeyGlass chaotic time series Sunspot time series British English Vowel data TIMIT speech phonemes building energy prediction data sonar dataset We find simple local approximation models often outperform MLP No criterion classificationprediction size training set dimensionality training set etc used distinguish whether MLP local approximation method superior However find consider histograms kNN density estimates training datasets choose best performing method priori selecting local approximation spread density histogram large choosing MLP otherwise This result correlates hypothesis global MLP model less appropriate characteristics function approximated varies throughout input space We discuss results smoothness assumption often made function approximation biasvariance dilemma
We present implementation Kohonen SelfOrganizing Feature Maps SpertII vector microprocessor system The implementation supports arbitrary neural map topologies arbitrary neighborhood functions For small networks used realworld tasks single SpertII board measured run Kohonen net classification million connections per second MCPS On speech coding benchmark task SpertII performs online Kohonen net training million connection updates per second MCUPS This represents almost factor improvement compared previously reported implementations The asymptotic peak speed system MCPS MCUPS
Despite fact complex visual scenes contain multiple overlapping objects people perform object recognition ease accuracy One operation facilitates recognition early segmentation process features objects grouped labeled according object belong Current computational systems perform operation based predefined grouping heuristics We describe system called MAGIC learns group features based set presegmented examples In many cases MAGIC discovers grouping heuristics similar previously proposed also capability finding nonintuitive structural regularities images Grouping performed relaxation network attempts dynamically bind related features Features transmit complexvalued signal amplitude phase one another binding thus represented phase locking related features MAGICs training procedure generalization recurrent back propagation complexvalued units
The simple Bayesian classifier SBC commonly thought assume attributes independent given class apparently contradicted surprisingly good performance exhibits many domains contain clear attribute dependences No explanation proposed far In paper show SBC fact assume attribute independence optimal even assumption violated wide margin The key finding lies distinction classification probability estimation correct classification achieved even probability estimates used contain large errors We show previouslyassumed region optimality SBC secondorder infinitesimal fraction actual one This followed derivation several necessary several sufficient conditions optimality SBC For example SBC optimal learning arbitrary conjunctions disjunctions even though violate independence assumption The paper also reports empirical evidence SBCs competitive performance domains containing substantial degrees attribute dependence
We propose method use Inductive Logic Programming give heuristic functions searching goals solve problems The method takes solutions problem history search set background knowledge problem In large class problems problem described set states set operators solved finding series operators A solution series operators brings initial state final state transformed positive negative examples relation betterchoice describes operator better others state We also give way use betterchoice relation heuristic function The method use logic program background knowledge induce heuristics induced heuristics high readability The paper inspects method applying puzzle
We describe development monitoring system uses sensor observation data discrete events construct dynamically probabilistic model world This model Bayesian network incorporating temporal aspects call Dynamic Belief Network used reason uncertainty causes consequences events monitored The basic dynamic construction network datadriven However model construction process combines sensor data events externally provided information agents behaviour knowledge already contained within model control size complexity network This means network structure within time interval amount history detail maintained vary time We illustrate system example domain monitoring robot vehicles people restricted dynamic environment using lightbeam sensor data In addition presenting generic network structure monitoring domains describe use complex network structures address two specific monitoring problems sensor validation Data Association Problem
We evaluate power decision tables hypothesis space supervised learning algorithms Decision tables one simplest hypothesis spaces possible usually easy understand Experimental results show artificial realworld domains containing discrete features IDTM algorithm inducing decision tables sometimes outperform stateoftheart algorithms C Surprisingly performance quite good datasets continuous features indicating many datasets used machine learning either require features features values We also describe incremental method performing crossvalidation applicable incremental learning algorithms including IDTM Using incremental crossvalidation possible crossvalidate given dataset IDTM time linear number instances number features number label values The time incremental crossvalidation independent number folds chosen hence leaveoneout crossvalidation tenfold crossvalidation take time
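The incremental cross-validation idea can be sketched for a decision table whose cells store class counts. Holding an instance out amounts to decrementing its cell, predicting, and incrementing the cell again. Leave-one-out therefore costs time linear in the number of instances, whatever the number of folds.

This is an illustration under assumptions, not IDTM itself:

- the table predicts the majority class of the matching cell;
- when the matching cell is empty, it falls back to the overall majority class;
- the function name is hypothetical.

```python
from collections import Counter, defaultdict

def loo_accuracy_decision_table(rows, labels, features):
    """Leave-one-out accuracy of a majority-class decision table, computed in linear time."""
    keys = [tuple(row[f] for f in features) for row in rows]
    table = defaultdict(Counter)      # cell key -> class counts
    overall = Counter(labels)         # class counts for the fallback prediction
    for key, y in zip(keys, labels):
        table[key][y] += 1
    correct = 0
    for key, y in zip(keys, labels):
        cell = table[key]
        cell[y] -= 1                  # remove the held-out instance ...
        overall[y] -= 1
        source = cell if sum(cell.values()) > 0 else overall
        prediction = max(source, key=source.get)
        correct += prediction == y
        cell[y] += 1                  # ... and put it back
        overall[y] += 1
    return correct / len(rows)
```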
In wrapper approach feature subset selection search optimal set features made using induction algorithm black box The estimated future performance algorithm heuristic guiding search Statistical methods feature subset selection including forward selection backward elimination stepwise variants viewed simple hillclimbing techniques space feature subsets We utilize bestfirst search find good feature subset discuss overfitting problems may associated searching many feature subsets We introduce compound operators dynamically change topology search space better utilize information available evaluation feature subsets We show compound operators unify previous approaches deal relevant irrelevant features The improved feature subset selection yields significant improvements realworld datasets using ID NaiveBayes induction algorithms
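A wrapper-style best-first search over feature subsets can be sketched as follows. Several parts are assumptions of this sketch:

- `evaluate` stands in for the induction algorithm's estimated accuracy, for example cross-validation accuracy;
- subsets are only expanded forward, by adding one feature at a time;
- `max_stale` stops the search after that many expansions without improvement.

The compound operators described in the abstract are not included.

```python
import heapq
from itertools import count

def best_first_feature_search(n_features, evaluate, max_stale=5):
    """Best-first forward search over feature subsets; evaluate is treated as a black box."""
    tie = count()                          # tie-breaker so the heap never compares sets
    start = frozenset()
    best_subset, best_score = start, evaluate(start)
    open_heap = [(-best_score, next(tie), start)]
    closed = {start}
    stale = 0
    while open_heap and stale < max_stale:
        _, _, subset = heapq.heappop(open_heap)   # expand the most promising subset
        improved = False
        for f in range(n_features):
            if f in subset:
                continue
            child = subset | {f}
            if child in closed:
                continue
            closed.add(child)
            score = evaluate(child)
            heapq.heappush(open_heap, (-score, next(tie), child))
            if score > best_score:
                best_subset, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return best_subset, best_score
```

Each distinct subset is evaluated exactly once. Since every evaluation runs the induction algorithm, the number of evaluations dominates the cost, and repeated evaluation of many subsets is where the overfitting risk mentioned in the abstract comes from.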
We constructed inexpensive videobased motorized tracking system learns track head It uses real time graphical user inputs auxiliary infrared detector supervisory signals train convolutional neural network The inputs neural network consist normalized luminance chrominance images motion information frame differences Subsampled images also used provide scale invariance During online training phase neural network rapidly adjusts input weights depending upon reliability different channels surrounding environment This quick adaptation allows system robustly track head even objects moving within cluttered background
In paper consider complexity number combinatorial problems namely Intervalizing Colored Graphs DNA physical mapping Triangulating Colored Graphs perfect phylogeny Directed Modified Colored Cutwidth Feasible Register Assignment Module Allocation graphs bounded treewidth Each problems characteristic uniform upper bound tree path width graphs yesinstances For problems exceptions feasible register assignment module allocation vertex edge coloring given part input Our main results parameterized variant considered problems hard complexity classes W Z We also show Intervalizing Colored Graphs Triangulating Colored Graphs
It wellknown certain learning methods eg perceptron learning algorithm acquire complete parity mappings But often overlooked stateoftheart learning methods C backpropagation generalise incomplete parity mappings The failure methods generalise parity mappings may sometimes dismissed grounds impossible generalise mappings parity problems mathematical constructs little realworld learning However paper argues dismissal unwarranted It shows parity mappings hard learn statistically neutral statistical neutrality property expect encounter frequently realworld contexts It also shows generalization failure parity mappings occurs even large minimally incomplete mappings used training purposes ie claims impossibility generalization particularly suspect
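The claim that generalisation fails even on minimally incomplete parity mappings is easy to check for a simple learner. The sketch below swaps in a majority-vote nearest-neighbour rule, not C4.5 or backpropagation, and uses hypothetical function names.

With one pattern held out, every nearest training pattern differs from it in exactly one bit. Each of those patterns therefore has the opposite parity, so leave-one-out accuracy is 0.

```python
from itertools import product

def parity(bits):
    return sum(bits) % 2

def loo_nearest_neighbour_parity(n):
    """Leave-one-out accuracy of a majority-vote nearest-neighbour rule on complete n-bit parity."""
    patterns = list(product((0, 1), repeat=n))
    correct = 0
    for held_out in patterns:
        train = [p for p in patterns if p != held_out]
        dists = [sum(a != b for a, b in zip(p, held_out)) for p in train]  # Hamming distance
        d_min = min(dists)
        votes = [parity(p) for p, d in zip(train, dists) if d == d_min]
        prediction = int(sum(votes) * 2 > len(votes))  # majority vote; ties -> 0
        correct += prediction == parity(held_out)
    return correct / len(patterns)
```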
There several stochastic methods used solving NPhard optimization problems approximatively Examples algorithms include order increasing computational complexity stochastic greedy search methods simulated annealing genetic algorithms We investigate methods likely give best performance practice respect computational effort requires We study problem empirically selecting set stochastic algorithms varying computational complexity experimentally evaluating method goodness results achieved improves increasing computational time For evaluation use graph optimization problem closely related several realworld practical problems To get wider perspective goodness achieved results stochastic methods also compared specialcase greedy heuristics This investigation suggests although genetic algorithms provide good results simpler stochastic algorithms achieve similar performance quickly
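Simulated annealing, one of the methods compared, can be sketched over bit strings. The study's graph optimisation problem is not reproduced here. The sketch assumes:

- single-bit-flip moves;
- a geometric cooling schedule;
- a caller-supplied score to maximise.

The parameter defaults are illustrative, not taken from the paper.

```python
import math
import random

def simulated_annealing(score, n_bits, n_steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Maximize score over bit strings with single-bit flips and geometric cooling."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    fx = score(x)
    best, f_best = list(x), fx
    for step in range(n_steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        i = rng.randrange(n_bits)
        x[i] ^= 1                              # propose: flip one bit
        fy = score(x)
        if fy >= fx or rng.random() < math.exp((fy - fx) / t):
            fx = fy                            # accept (always if not worse)
            if fx > f_best:
                best, f_best = list(x), fx
        else:
            x[i] ^= 1                          # reject: undo the flip
    return best, f_best
```

For example, `simulated_annealing(sum, 30)` maximises the number of ones in a 30-bit string. Replacing the acceptance test with `fy >= fx` alone turns the same loop into a stochastic hill climber, which is one way to compare methods at equal computational effort.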
There two generations Gibbs sampling methods semiparametric models involving Dirichlet process The first generation suffered severe drawback namely locations clusters groups parameters could essentially become fixed moving rarely Two strategies proposed create second generation Gibbs samplers integration appending second stage Gibbs sampler wherein cluster locations moved We show strategies easily implemented sequential importance sampler first strategy dramatically improves results As case Gibbs sampling strategies applicable much wider class models They shown provide uniform importance sampling weights lead additional RaoBlackwellization estimators Steve MacEachern Associate Professor Department Statistics Ohio State University Merlise Clyde Assistant Professor Institute Statistics Decision Sciences Duke University Jun Liu Assistant Professor Department Statistics Stanford University The work second author supported part National Science Foundation grants DMS DMS last author National Science Foundation grants DMS DMS Terman Fellowship
We show uniform prior hypothesis functions training error early stopping fixed training error training error minimum results increase expected generalization error We also show regularization methods equivalent early stopping certain nonuniform prior early stopping solutions
We present unified framework convergence analysis generalized subgradienttype algorithms presence perturbations One principal novel features analysis perturbations need tend zero limit It established iterates algorithms attracted certain sense stationary set problem depends magnitude perturbations Characterization attraction sets given general nonsmooth nonconvex case The results strengthened convex weakly sharp strongly convex problems Our analysis extends unifies previously known results convergence stability properties gradient subgradient methods including incremental parallel heavy ball modifications The first author supported part CNPq grant Research second author supported part International Science Foundation Grant NBY International Science Foundation Russian Government Grant NBY Russian Foundation Fundamental Research Grant N Instituto de Matematica Pura e Aplicada Estrada Dona Castorina Jardim Botanico Rio de Janeiro RJ CEP Brazil Email solodovimpabr Operations Research Department Faculty Computational Mathematics Cybernetics Moscow State University Moscow Russia
Dr McCalleys research partially supported grants National Science Foundation Pacific Gas Electric Company Dr Honavars research partially supported grants National Science Foundation John Deere Foundation This paper appear Proceedings th Annual North American Power Symposium Oct Laramie Wyoming
Coevolutionary learning involves embedding adaptive learning agents fitness environment dynamically responds progress potential solution many technological chicken egg problems heart several recent surprising successes Sims artificial robot Tesauros backgammon player We recently solved two spirals problem difficult neural network benchmark classification problem using genetic programming primitives set Koza Instead using absolute fitness use relative fitness Angeline Pollack based competition coverage data set As population reproduces fitness function driving selection changes subproblem niches opened rather crowded The solutions found method symbiotic structure suggests holding niches open crossover better able discover modular building blocks
We show two cooperating robots learn exactly stronglyconnected directed graph n indistinguishable nodes expected time polynomial n We introduce new type homing sequence two robots helps robots recognize certain previouslyseen nodes We present algorithm robots learn graph homing sequence simultaneously actively wandering graph Unlike previous learning results using homing sequences algorithm require teacher provide counterexamples Furthermore algorithm use efficiently additional information available distinguishes nodes We also present algorithm robots learn taking random walks The rate random walk graph converges stationary distribution characterized conductance graph Our randomwalk algorithm learns expected time polynomial n inverse conductance efficient homingsequence algorithm highconductance graphs
We introduce new model learning membership queries queries near boundary target concept may receive incorrect dont care responses In partial compensation assume distribution examples zero probability mass boundary region The motivation behind model reason incorrect dont care response examples extremely rare practice Thus matter learner classifies We present several positive results new model We show learn intersection two halfspaces membership queries near boundary may answered incorrectly Our algorithm extension algorithm Baum learns intersections two homogeneous halfspaces PACwithmembershipqueries model We also describe algorithms learning several subclasses monotone DNF formulas
To understand interspike interval ISI variability displayed visual cortical neurons Softky Koch critical examine dynamics neuronal integration well variability synaptic input current Most previous models focused latter factor We match simple integrateandfire model experimentally measured integrative properties cortical regular spiking cells McCormick et al After setting RC parameters postspike voltage reset set match experimental measurements neuronal gain obtained vitro plots firing frequency vs injected current Examination resulting model leads intuitive picture neuronal integration unifies seemingly contradictory p N arguments hold spiking regular memory last spike becomes negligible spike threshold crossing caused input variance around steady state spiking Poisson In integrateandfire neurons matched cortical cell physiology steady state behavior predominant ISIs highly variable physiological firing rates wide range inhibitory excitatory inputs
This paper describes argumentation system cooperative design applications Web The system provides experts involved procedures means expressing weighing individual arguments preferences order argue selection certain choice It supports defeasible qualitative reasoning presence illstructured information Argumentation performed set discourse acts call variety procedures propagation information corresponding discussion graph The paper also reports integration Case Based Reasoning techniques used resolve current design issues considering previous similar situations specification similarity measures various argumentation items aim estimate variations among opinions designers involved cooperative design
This paper describes new efficient algorithms learning deterministic finite automata Our approach primarily distinguished two features adoption averagecase setting model typical labeling finite automaton retaining worstcase model underlying graph automaton along learning model learner provided means experiment machine rather must learn solely observing automatons output behavior random input sequence The main contribution paper presenting first efficient algorithms learning nontrivial classes automata entirely passive learning model We adopt online learning model learner asked predict output next state given next symbol random input sequence goal learner make prediction mistakes possible Assuming learner means resetting target machine fixed start state first present efficient algorithm makes expected polynomial number mistakes model Next show first algorithm used subroutine second algorithm also makes polynomial number mistakes even absence reset Along way prove number combinatorial results randomly labeled automata We also show labeling states bits input sequence need truly random merely semirandom Finally discuss extension results model automata used represent distributions binary strings
This paper presents comparison Genetic Programming GP Simulated Annealing SA Stochastic Iterated Hill Climbing SIHC based suite program discovery problems previously tackled GP All three search algorithms employ hierarchical variable length representation programs brought recent prominence GP paradigm We feel intuitively obvious mutationbased adaptive search handle program discovery yet date GP problem tried SA SIHC also work
Given Markov chain sampling scheme standard empirical estimator make best use data We show construct better estimators We restrict attention nearest neighbor random fields Gibbs samplers deterministic sweep approach applies sampler uses reversible variableatatime updating deterministic sweep The structure transition distribution sampler exploited construct empirical estimators combined standard empirical estimator reduce asymptotic variance The extra computational cost negligible When random field spatially homogeneous symmetrizations estimator lead variance reduction The performance estimators evaluated simulation study Ising model
In order learning improve adaptiveness animals behavior thus direct evolution way Baldwin suggested learning mechanism must incorporate innate evaluation animals actions influence reproductive fitness For example many circumstances damage animal otherwise reduce fitness painful tend avoided We refer mechanism animal evaluates fitness consequences actions motivation system argue system must evolve along behaviors evaluates We describe simulations evolution populations agents instantiating number different architectures generating action learning worlds differing complexity We find cases members populations evolve motivation systems accurate enough direct learning increase fitness actions agents perform Furthermore motivation systems tend incorporate systematic distortions representations worlds inhabit distortions increase adaptiveness behavior generated
The relation orthography phonology language traditionally modelled handcrafted rule sets Machinelearning ML approaches offer means gather knowledge automatically Problems arise training material sparse Generalising sparse data wellknown problem many ML algorithms We present experiments connectionist instancebased decisiontree learning algorithms applied small corpus Scottish Gaelic instancebased learning ibig algorithm yields best generalisation performance algorithms tested perform tolerably well Given availability lexicon even sparse ML valuable efficient tool automatic phonetic transcription written text
High performance compilers increasingly rely accurate modeling machine resources efficiently exploit instruction level parallelism application In paper propose reduced machine description results faster detection resource contentions preserving scheduling constraints present original machine description The proposed approach reduces machine description automated errorfree efficient fashion Moreover fully supports schedulers backtrack process operations arbitrary order Reduced descriptions DEC Alpha MIPS RR Cydra result times faster detection resource contentions require memory storage used original machine descriptions
We study problem estimating log spectrum stationary Gaussian time series thresholding empirical wavelet coefficients We propose use thresholds jn depending sample size n wavelet basis resolution level j At fine resolution levels j propose The purpose thresholding level make reconstructed logspectrum nearly noisefree possible In addition pleasant visual point view noisefree character leads attractive theoretical properties wide range smoothness assumptions Previous proposals set much smaller thresholds enjoy properties jn ff j log n
Data mining algorithms including machine learning statistical analysis pattern recognition techniques greatly improve understanding data warehouses becoming widespread In paper focus classification algorithms review need multiple classification algorithms We describe system called MLC designed help choose appropriate classification algorithm given dataset making easy compare utility different algorithms specific dataset interest MLC provides workbench comparisons also provides library C classes aid development new algorithms especially hybrid algorithms multistrategy algorithms Such algorithms generally hard code scratch We discuss design issues interfaces programs visualization resulting classifiers
Reinforcement learning methods applied control problems objective optimizing value function time They used train single neural networks learn solutions whole tasks Jacobs Jordan shown set expert networks combined via gating network quickly learn tasks decomposed Even decomposition learned Inspired Boyans work modular neural networks learning temporaldifference methods modify reinforcement learning algorithm called QLearning train modular neural network solve control problem The resulting algorithm demonstrated classical polebalancing problem The advantage method makes possible deal complex dynamic control problem effectively using task decomposition competitive learning
This report replicates extends results reported Naval Air Warfare Center NAWC personnel automatic classification sonar images They used novel casebased reasoning systems empirical studies obtain comparative analyses using standard classification algorithms Therefore quality NAWC results unknown We replicated NAWC studies also tested several classifiers ie casebased otherwise machine learning literature These comparisons ramifications detailed paper Next investigated Fala Walkers two suggestions future work ie combining similarity functions alternative case representation Finally describe several ways incorporate additional domainspecific knowledge applying casebased classifiers similar tasks
In casebased reasoning systems case adaptation process traditionally controlled static libraries handcoded adaptation rules This paper proposes method learning adaptation knowledge form adaptation strategies type developed handcoded Kass Adaptation strategies differ standard adaptation rules encode general memory search procedures finding information needed case adaptation paper focuses issues involved learning memory search procedures form basis new adaptation strategies It proposes method starts small library abstract adaptation rules uses introspective reasoning systems memory organization generate memory search plans needed apply rules The search plans packaged original abstract rules form new adaptation strategies future use This process allows CBR system learn domain storing results case adaptation also learn apply cases memory effectively
In Artificial Intelligence Psychology Education growing body research supports view learning goaldirected process Psychological experiments show people different goals process information differently studies education show goals strong effects students learn functional arguments machine learning support necessity goalbased focusing learner effort At Fourteenth Annual Conference Cognitive Science Society symposium brought together researchers AI psychology education discuss goaldriven learning This article presents fundamental points illuminated symposium placing context open questions current research directions goaldriven learning Appears AI Magazine
We present new method inspired bootstrap whose goal determine quality reliability neural network predictor Our method leads robust forecasting along large amount statistical information forecast performance exploit We exhibit method context multivariate time series prediction financial data New York Stock Exchange It turns variation due different resamplings ie splits training crossvalidation test sets significantly larger variation due different network conditions architecture initial weights Furthermore method allows us forecast probability distribution opposed traditional case single value time step We demonstrate strictly heldout test set includes stock market crash We also compare performance class neural networks identically bootstrapped linear models
We present new method obtaining local error bars ie estimates confidence predicted value depend input We approach problem nonlinear regression maximum likelihood framework We demonstrate technique first computer generated data locally varying normally distributed target noise We apply laser data Santa Fe Time Series Competition Finally extend technique estimate error bars iterated predictions apply exact competition task gives best performance date
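In a maximum likelihood framing, a model that outputs both a mean and a variance for each input can be trained by minimising the Gaussian negative log-likelihood. The sketch below shows that loss, assuming the model outputs are already available as lists. The function name is hypothetical.

For a single point with residual r, the loss is smallest at variance r squared. This is what pulls the variance output toward the local squared error.

```python
import math

def gaussian_nll(targets, means, variances):
    """Negative log-likelihood of targets under independent Gaussians with per-input mean and variance."""
    total = 0.0
    for y, mu, var in zip(targets, means, variances):
        if var <= 0:
            raise ValueError("variances must be positive")
        total += 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)
    return total
```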
Pinsker gave precise asymptotic evaluation minimax mean squared error estimation signal Gaussian noise signal known priori lie compact ellipsoid Hilbert space This Minimax Bayes method applied variety global nonparametric estimation settings parameter spaces far ellipsoidal For example leads theory exact asymptotic minimax estimation norm balls Besov Triebel spaces using simple coordinatewise estimators wavelet bases This paper outlines features method common several applications In particular derive new results exact asymptotic minimax risk weak p balls R n n also class local estimators Triebel scale By nature method reveals structure asymptotically least favorable distributions Thus may simulate least favorable sample paths We illustrate estimation signal Gaussian white noise norm balls certain Besov spaces In wavelet bases p lt least favorable priors sparse resulting sample paths strikingly different observed Pinskers ellipsoidal setting p Acknowledgements I grateful many conversations David Donoho Carl Taswell referee helpful comments This work supported part NSF grants DMS NIH PHS grant GM
A proper choice proposal distribution MCMC methods eg MetropolisHastings algorithm well known crucial factor convergence algorithm In paper introduce adaptive Metropolis Algorithm AM Gaussian proposal distribution updated along process using full information cumulated far Due adaptive nature process AM algorithm nonMarkovian establish correct ergodic properties We also include results numerical tests indicate AM algorithm competes well traditional MetropolisHastings algorithms demonstrate AM provides easy use algorithm practical computation Mathematics Subject Classification C U Keywords adaptive MCMC comparison convergence ergodicity Markov Chain
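The adaptation rule can be sketched in one dimension. Following the usual description of the AM algorithm:

- the proposal is Gaussian and centred at the current state;
- during an initial period `t0` the proposal variance is fixed;
- after that it is s_d times the sample variance of the chain so far, plus a small eps, with s_d = 2.4²/d and d = 1 here.

The parameter defaults and the function name are assumptions. A multivariate version would track a running covariance matrix instead of a running variance.

```python
import math
import random

def adaptive_metropolis_1d(log_target, x0, n_iter, t0=100, init_var=1.0, eps=1e-6, seed=0):
    """One-dimensional adaptive Metropolis: the proposal variance is learned from the chain history."""
    rng = random.Random(seed)
    d = 1
    s_d = 2.4 ** 2 / d
    x, log_px = x0, log_target(x0)
    chain = [x]
    mean, m2 = x, 0.0                  # running mean and sum of squared deviations (Welford)
    for t in range(1, n_iter):
        if t <= t0:
            var = init_var             # fixed proposal variance during the initial period
        else:
            var = s_d * (m2 / max(t - 1, 1) + eps)  # scaled sample variance of the t states so far
        y = rng.gauss(x, math.sqrt(var))
        log_py = log_target(y)
        # Metropolis acceptance for a symmetric proposal.
        if rng.random() < math.exp(min(0.0, log_py - log_px)):
            x, log_px = y, log_py
        chain.append(x)
        n = len(chain)
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return chain
```

The proposal depends on the whole history, so the chain is not Markovian. That is why the abstract stresses establishing ergodicity separately rather than relying on standard Metropolis-Hastings theory.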
We previously shown regularization principles lead approximation schemes equivalent networks one layer hidden units called Regularization Networks In particular discussed standard smoothness functionals lead subclass regularization networks wellknown Radial Basis Functions approximation schemes In paper show regularization networks encompass much broader range approximation schemes including many popular general additive models neural networks In particular introduce new classes smoothness functionals lead different classes basis functions Additive splines well tensor product splines obtained appropriate classes smoothness functionals Furthermore extension leads Radial Basis Functions RBF Hyper Basis Functions HBF also leads additive models ridge approximation models containing special cases Breimans hinge functions forms Projection Pursuit Regression We propose use term Generalized Regularization Networks broad class approximation schemes follow extension regularization In probabilistic interpretation regularization different classes basis functions correspond different classes prior probabilities approximating function spaces therefore different types smoothness assumptions In final part paper show relation activation functions Gaussian sigmoidal type considering simple case kernel G(x) = |x| In summary different multilayer networks one hidden layer collectively call Generalized Regularization Networks correspond different classes priors associated smoothness functionals classical regularization principle Three broad classes a Radial Basis Functions generalize Hyper Basis Functions b tensor product splines c additive splines generalize schemes type ridge approximation hinge functions onehiddenlayer perceptrons This paper describes research done within Center Biological Computational Learning Department Brain Cognitive Sciences Artificial Intelligence Laboratory This research sponsored grants Office Naval Research contracts NJ NJ grant National Science Foundation contract ASC includes funds DARPA provided HPCC program grant National Institutes Health contract NIH SRR Additional support provided North Atlantic Treaty Organization ATR Audio Visual Perception Research Laboratories Mitsubishi Electric Corporation Sumitomo Metal Industries Siemens AG Support AI Laboratorys artificial intelligence research provided ONR contract NJ Tomaso Poggio supported Uncas Helen Whitaker Chair Whitaker College Massachusetts Institute Technology Copyright Massachusetts Institute Technology
In many cases programs lengths increase known bloat fluff increasing structural complexity artificial evolution We show bloat specific genetic programming suggest inherent search techniques discrete variable length representations using simple static evaluation functions We investigate bloating characteristics three nonpopulation one population based search techniques using novel mutation operator An artificial ant following Santa Fe trail problem solved simulated annealing hill climbing strict hill climbing population based search using two variants new subtree based mutation operator As predicted bloat observed using unbiased mutation absent simulated annealing hill climbers using length neutral mutation however bloat occurs mutations using population We conclude two causes bloat search operators length bias tend sample bigger trees competition within populations favours longer programs usually reproduce accurately
We propose probabilistic casespace metric case matching case adaptation tasks Central approach probability propagation algorithm adopted Bayesian reasoning systems allows casebased reasoning system perform theoretically sound probabilistic reasoning The probability propagation mechanism actually offers uniform solution case matching case adaptation problems We also show algorithm implemented connectionist network efficient massively parallel case retrieval inherent property system We argue using kind approach difficult problem case indexing completely avoided Pp Topics CaseBased Reasoning edited Stefan Wess KlausDieter Althoff Michael M Richter Volume Lecture
This paper describes simple means genetic search towards optimal neural network architectures improved convergence speed quality final result This result theoretically explained Baldwin effect implemented learning process network alone also changing network architecture part learning procedure This seen combination two different techniques helping improving simple genetic search
Technical Report No Revised March University Washington Department Statistics Seattle Washington Abstract During past years several nonparametric alternatives Cox proportional hazards model appeared literature These methods extend techniques well known regression analysis analysis censored survival data In paper discuss methods based partition trees polynomial splines analyze two datasets using Survival Trees HARE compare strengths weaknesses two methods One strengths HARE model fitting procedure implicit check proportionality underlying hazards model It also provides explicit model conditional hazards function makes convenient obtain graphical summaries On hand treebased methods automatically partition dataset groups cases similar survival history Results obtained survival trees HARE often complementary Trees splines survival analysis provide data analyst two useful tools analyzing survival data
This paper second series two problem estimating function probability distribution finite set samples distribution In first paper Bayes estimator function probability distribution introduced optimal properties Bayes estimator discussed Bayes frequencycounts estimators Shannon entropy derived graphically contrasted In current paper analysis first paper extended derivation Bayes estimators several functions interest statistics information theory These functions powers mutual information chisquared tests independence variance covariance average Finding Bayes estimators several functions requires extensions analytical techniques developed first paper extensions form main body paper This paper extends analysis ways well example enlarging class potential priors beyond uniform prior assumed first paper In particular use entropic Dirichlet priors considered
Many lowerlevel areas mammalian visual system organized retinotopically maps preserve certain degree topography retina A unit part retinotopic map normally responds selectively stimulation welldelimited part visual field referred receptive field RF Receptive fields probably prominent ubiquitous computational mechanism employed biological information processing systems This paper surveys possible computational reasons behind ubiquity RFs discussing examples RFbased solutions problems vision spatial acuity sensory coding object recognition Weizmann Institute CSTR appear Vision R J Watt ed MIT Press
We address problem computing largest fraction missing information EM algorithm worst linear function data augmentation These largest eigenvalue associated eigenvector Jacobian EM operator maximum likelihood estimate important assessing convergence iterative simulation An estimate largest fraction missing information available EM iterates often adequate since figures accuracy needed In instances EM iteration also gives estimate worst linear function We show power method eigencomputation used compute efficient accurate estimates quantities Unlike eigenvalue decomposition power method computes largest eigenvalue eigenvector matrix take advantage good eigenvector estimate initial value terminated figures accuracy obtained Moreover matrix products needed power method computed extrapolation obviating need form Jacobian EM operator We give results simulation studies multivariate normal data showing approach becomes efficient data dimension increases methods use finitedifference approximation Jacobian generalpurpose alternative available Funded National Institutes Health Small Business Innovation Research Grant RCA Office Naval Research contracts N N We indebted Tim Hesterberg Jim Schimert Doug Clarkson Anne Greenbaum Adrian Raftery comments discussion helped advance research improve paper
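The power method needs only matrix-vector products, which is what lets the approach avoid forming the Jacobian. In the sketch below, a `matvec` callable is an assumed stand-in for the product with the EM operator's Jacobian. In the paper that product is obtained by extrapolation from EM iterates, which is not shown here.

The eigenvalue estimate is the Rayleigh quotient of the normalised iterate. Iteration stops once that estimate changes by less than `tol`. The function name is hypothetical.

```python
import math

def power_method(matvec, v0, tol=1e-10, max_iter=1000):
    """Largest-magnitude eigenvalue and eigenvector using only matrix-vector products."""
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]          # start from a unit vector (a good guess speeds convergence)
    lam = 0.0
    for _ in range(max_iter):
        w = matvec(v)
        lam_new = sum(a * b for a, b in zip(v, w))   # Rayleigh quotient, since |v| = 1
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        if abs(lam_new - lam) < tol:    # stop once the estimate has settled
            return lam_new, v
        lam = lam_new
    return lam, v
```

Convergence speed depends on the ratio of the two largest eigenvalues. A good initial eigenvector estimate, such as the direction of recent EM steps, shortens the iteration, consistent with the abstract's remark about exploiting a good starting value.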
We describe directed acyclic graphical model contains hierarchy linear units mechanism dynamically selecting appropriate subset units model observation The nonlinear selection mechanism hierarchy binary units gates output one linear units There connections linear units binary units generative model viewed logistic belief net Neal selects skeleton linear model among available linear units We show Gibbs sampling used learn parameters linear binary units even sampling brief Markov chain far equilibrium
We describe simple reduction problem PAClearning multipleinstance examples PAClearning onesided random classification noise Thus concept classes learnable onesided noise includes concepts learnable usual sided random noise model plus others parity function learnable multipleinstance examples We also describe efficient somewhat technically involved reduction StatisticalQuery model results polynomialtime algorithm learning axisparallel rectangles sample complexity Od r saving roughly factor r results Auer et al
The absence powerful control structures processes synchronize coordinate switch choose among regulate direct modulate interactions combine distinct yet interdependent modules large connectionist networks CN probably one important reasons networks yet succeeded handling difficult tasks eg complex object recognition description complex problemsolving planning In paper examine CN built large numbers relatively simple neuronlike units given ability handle problems typical multicomputer networks artificial intelligence programs along types programs always handled using extremely elaborate precisely worked central control coordination synchronization switching etc We point several mechanisms central control unbrainlike sort CN already built albeit hidden often overlooked ways We examine kinds control mechanisms found computers programs fetal development cellular function immune system evolution social organizations especially brains might use CN Particularly intriguing suggestions found pacemakers oscillators local sources brains complex partial synchronies diffuse global effects slow electrical waves neurohormones developmental program guides fetal development communication coordination within among living cells working immune system evolutionary processes operate large populations organisms great variety partially competing partially cooperating controls found small groups organizations larger societies All systems rich control typically control emerges complex interactions many local diffuse sources We explore several different kinds plausible control mechanisms might incorporated CN assess potential benefits respect cost
In drug activity prediction handwritten character recognition features extracted describe training example depend pose location orientation etc example In handwritten character recognition one best techniques addressing problem tangent distance method Simard LeCun Denker Jain et al b introduce new technique dynamic reposing that also addresses problem Dynamic reposing iteratively learns neural network reposes examples effort maximize predicted output values New models trained new poses computed models poses converge This paper compares dynamic reposing tangent distance method task predicting biological activity musk compounds In fold crossvalidation
We analyse behaviour Propose Revise architecture VT elevator design problem show problem solving method solve possible cases covered available domain knowledge We investigate problem show limitation caused restricted search regime employed method competence method improved acquiring additional domain knowledge We therefore propose alternative design problem solver integrates casebased reasoning heuristic search techniques overcomes competencerelated limitations exhibited Propose Revise architecture maintaining level efficiency We describe four algorithms casebased design exploit general properties parametric design tasks application specific heuristic knowledge
Genetic algorithms related evolutionary techniques offer promising approach automatically exploring design space neural architectures artificial intelligence cognitive modeling Central process evolutionary design neural architectures EDNA choice representation scheme used encode neural architecture form gene string genotype decode genotype corresponding neural architecture phenotype The representation scheme used constrains class neural architectures representable evolvable system also determines efficiency timespace complexity evolutionary design procedure whole This paper identifies discusses set properties used characterize different representations used EDNA design select representations necessary properties particular classes applications
load balancing even irregular neural networks The idea achieve goals lies programming model CuPit programs objectcentered connections nodes graph neural network objects Algorithms based parallel local computations nodes connections communication along connections plus broadcast reduction operations This report describes design considerations resulting language definition discusses detail tutorial example program
When reasoner explains surprising events internal use key motivation explaining perform learning facilitate achievement goals Human explainers use range strategies build explanations including internal reasoning external information search goalbased considerations profound effect choices pursue explanations However standard AI models explanation rely goalneutral use single fixed strategy generally backwards chaining to build explanations This paper argues explanation modeled goaldriven learning process gathering transforming information discusses issues involved developing active multistrategy process goaldriven explanation
RFLISSOM selforganizing model laterally connected orientation maps primary visual cortex used study psychological phenomenon known tilt aftereffect The selforganizing processes responsible longterm development map lateral connections shown result tilt aftereffects short time scales adult The model allows observing large numbers neurons connections simultaneously making possible relate higherlevel phenomena lowlevel events difficult experimentally The results give computational support idea direct tilt aftereffects arise adaptive lateral interactions feature detectors long surmised They also suggest indirect effects could result conservation synaptic resources process The model thus provides unified computational explanation selforganization direct indirect tilt aftereffects primary visual cortex
A factor graph bipartite graph expresses global function several variables factors product local functions Factor graphs subsume many graphical models including Bayesian networks Markov random fields Tanner graphs We describe general algorithm computing marginals global function distributed messagepassing corresponding factor graph A wide variety algorithms developed artificial intelligence statistics signal processing digital communications communities derived specific instances general algorithm including Pearls belief propagation belief revision algorithms fast Fourier transform Viterbi algorithm forwardbackward algorithm iterative turbo decoding algorithm
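As an illustration of the message-passing idea, here is a minimal sketch of the forward-backward algorithm, which the abstract lists as one special case, on a chain-shaped factor graph. The table layout (`unary`, `pairwise`) is an assumption made for this example, and a brute-force marginalizer is included only to check the messages.

```python
from itertools import product

def chain_marginals(unary, pairwise):
    """Sum-product message passing on a chain factor graph.

    unary[t][s]: local factor f_t(x_t = s)
    pairwise[t][s][u]: factor g_t(x_t = s, x_{t+1} = u), for t = 0..n-2
    Returns the normalized marginal of every variable.
    """
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]  # message arriving at x_t from the left
    bwd = [[1.0] * k for _ in range(n)]  # message arriving at x_t from the right
    for t in range(1, n):
        fwd[t] = [sum(fwd[t - 1][s] * unary[t - 1][s] * pairwise[t - 1][s][u]
                      for s in range(k)) for u in range(k)]
    for t in range(n - 2, -1, -1):
        bwd[t] = [sum(pairwise[t][s][u] * unary[t + 1][u] * bwd[t + 1][u]
                      for u in range(k)) for s in range(k)]
    marginals = []
    for t in range(n):
        belief = [fwd[t][s] * unary[t][s] * bwd[t][s] for s in range(k)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals

def brute_force_marginals(unary, pairwise):
    """Reference answer: sum the global product over all configurations."""
    n, k = len(unary), len(unary[0])
    sums = [[0.0] * k for _ in range(n)]
    for xs in product(range(k), repeat=n):
        w = 1.0
        for t in range(n):
            w *= unary[t][xs[t]]
        for t in range(n - 1):
            w *= pairwise[t][xs[t]][xs[t + 1]]
        for t in range(n):
            sums[t][xs[t]] += w
    return [[v / sum(m) for v in m] for m in sums]
```

On a tree-structured graph such as a chain, each message is computed once per edge and direction, which is why the distributed scheme is linear in the chain length while the brute-force check is exponential.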
Genetic programming automatic programming technique evolves computer programs solve approximately solve problems This paper presents two examples genetic programming creates computer program controlling robot robot moves specified destination point minimal time In first approach genetic programming evolves computer program composed ordinary
This paper argues task matching casebased reasoning often improved comparing new cases portions precedents An example presented illustrates combining portions multiple precedents permit new cases resolved would indeterminate new cases could compared entire precedents A system uses portions precedents legal analysis domain Texas workers compensation law GREBE described examples GREBEs analysis combine reasoning steps multiple precedents presented
In designing autonomous agents deal competently issues involving time space tradeoff made guaranteed responsetime reactions one hand flexibility expressiveness We propose model action probabilistic reasoning decision analytic evaluation use layered control architecture Our model well suited tasks require reasoning interaction behaviors events fixed temporal horizon Decisions continuously reevaluated problem plans becoming obsolete new information becomes available In paper particularly interested tradeoffs required guarantee fixed response time reasoning nondeterministic causeandeffect relationships By exploiting approximate decision making processes able trade accuracy predictions speed decision making order improve expected performance dynamic situations
We propose examine method approximate dynamic programming Markov decision processes based structured problem representations We assume MDP represented using dynamic Bayesian network construct value functions using decision trees function representation The size representation kept within acceptable limits pruning value trees leaves represent possible ranges values thus approximating value functions produced optimization We propose method detecting convergence prove errors bounds resulting approximately optimal value functions policies describe preliminary experimental results
The majority commercial computers today register machines von Neumann type We developed method evolve Turingcomplete programs register machine The described implementation enables use program constructs arithmetic operators large indexed memory automatic decomposition subfunctions subroutines ADFs conditional constructs ie ifthenelse jumps loop structures recursion protected functions string list functions Any Cfunction compiled linked function set system The use register machine language allows us work lowest level binary machine code without interpreting steps In von Neumann machine programs data reside memory genetic operators thus directly manipulate binary machine code memory The genetic operators written Clanguage modify individuals binary representation The result execution speed enhancement times compared interpreting Clanguage implementation times compared LISP implementation The use binary machine code demands compact coding one byte per node individual The resulting evolved programs disassembled Cmodules incorporated conventional software development environment The low memory requirements significant speed enhancement technique could use applying genetic programming new application areas platforms research domains
This paper considers importance exploration gameplaying programs learn playing opponents The central question whether learning program play move offers best chance winning present game play move best chance providing useful information future games An approach addressing question developed using probability theory implemented two different learning methods Initial experiments game Go suggest program takes exploration account learn better knowledgeable opponent program
Technical Report Computer Sciences Department University Wisconsin Madison Nov ABSTRACT This article describes approach combining symbolic connectionist approaches machine learning A threestage framework presented research several groups reviewed respect framework The first stage involves insertion symbolic knowledge neural networks second addresses refinement prior knowledge neural representation third concerns extraction refined symbolic knowledge Experimental results open research issues discussed A shorter version paper appear Machine Learning
A distributed neural network model called SPEC processing sentences recursive relative clauses described The model based separating tasks segmenting input word sequence clauses forming caserole representations keeping track recursive embeddings different modules The system needs trained basic sentence constructs generalizes new instances familiar relative clause structures novel structures well SPEC exhibits plausible memory degradation depth center embeddings increases memory primed earlier constituents performance aided semantic constraints constituents The ability process structure largely due central executive network monitors controls execution entire system This way contrast earlier subsymbolic systems parsing modeled controlled highlevel process rather one based automatic reflex responses
In paper propose memorybased Qlearning algorithm called predictive Qrouting PQrouting adaptive traffic control We attempt address two problems encountered Qrouting Boyan Littman namely inability finetune routing policies low network load inability learn new optimal policies decreasing load conditions Unlike memorybased reinforcement learning algorithms memory used keep past experiences increase learning speed PQrouting keeps best experiences learned reuses predicting traffic trend The effectiveness PQrouting verified various network topologies traffic conditions Simulation results show PQrouting superior
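For context, the baseline Q-routing update that PQ-routing builds on can be sketched as below. This is the standard Boyan and Littman style rule, not the predictive PQ-routing variant; the nested-dictionary layout and the function name are hypothetical choices for this example.

```python
def q_routing_update(q_x, dest, neighbor, queue_time, transit_time, q_neighbor, lr=0.5):
    """One Q-routing update at node x after sending a packet for dest to neighbor.

    q_x[dest][y]: x's estimated delivery time to dest when forwarding via y.
    q_neighbor[dest][z]: the neighbor's own estimates for dest via each z.
    The target is time spent queued at x, plus transmission time, plus the
    neighbor's best remaining estimate.
    """
    target = queue_time + transit_time + min(q_neighbor[dest].values())
    old = q_x[dest][neighbor]
    q_x[dest][neighbor] = old + lr * (target - old)
    return q_x[dest][neighbor]

def best_next_hop(q_x, dest):
    """Greedy routing policy: forward via the neighbor with the lowest estimate."""
    return min(q_x[dest], key=q_x[dest].get)
```

Because only the best estimate is backed up, a link that was once judged slow may stay unused after load drops; that is one way to read the "inability to learn new optimal policies under decreasing load" problem named in the abstract.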
Several researchers demonstrated neural networks trained compensate nonlinear signal distortion eg digital satellite communications systems These networks however require original signal distorted version known Therefore trained offline adapt changing channel characteristics In paper novel dual reinforcement learning approach proposed adapt online system performing Assuming channel characteristics directions two predistorters end communication channel coadapt using output predistorter determine reinforcement Using common Volterra Series model simulate channel system shown successfully learn compensate distortions significantly higher might expected actual channel
Most connectionist modeling assumes noisefree inputs This assumption often violated This paper introduces idea clearning simultaneously cleaning data learning underlying structure The cleaning step viewed topdown processing model modifies data learning step viewed bottomup processing data modifies model Clearning used conjunction standard pruning This paper discusses statistical foundation clearning gives interpretation terms mechanical model describes obtain point predictions conditional densities output shows resulting model used discover properties data otherwise accessible signaltonoise ratio inputs This paper uses clearning predict foreign exchange rates noisy time series problem wellknown benchmark performances On outofsample test period clearning obtains annualized return investment significantly better otherwise identical network The final ultrasparse network remaining nonzero inputtohidden weights initial weights inputs hidden units robust overfitting This small network also lends interpretation
We discuss Bayesian formalism gives rise type wavelet threshold estimation nonparametric regression A prior distribution imposed wavelet coefficients unknown response function designed capture sparseness wavelet expansion common applications For prior specified posterior median yields thresholding procedure Our prior model underlying function adjusted give functions falling specific Besov space We establish relation hyperparameters prior model parameters Besov spaces within realizations prior fall Such relation gives insight meaning Besov space parameters Moreover established relation makes possible principle incorporate prior knowledge functions regularity properties prior model wavelet coefficients However prior knowledge functions regularity properties might hard elicit mind propose standard choice prior hyperparameters works well examples Several simulated examples used illustrate method comparisons made thresholding methods We also present application data set collected anaesthesiological study
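To show why a posterior median produces a thresholding rule, here is a minimal sketch for a single coefficient. It assumes a spike-and-slab prior, beta ~ pi N(0, tau^2) + (1 - pi) delta_0, and Gaussian noise d ~ N(beta, sigma^2); these choices and the parameter names are assumptions for illustration and may differ from the prior used in the paper. The posterior is a point mass at zero plus a Gaussian, so its median is exactly zero whenever the point mass straddles the 50% level.

```python
from statistics import NormalDist

def posterior_median(d, pi, tau, sigma):
    """Posterior median of beta given one observed coefficient d.

    Assumed model: beta ~ pi*N(0, tau^2) + (1 - pi)*delta_0, d | beta ~ N(beta, sigma^2).
    Small |d| are set exactly to zero; large |d| are shrunk toward zero.
    """
    v = sigma ** 2 + tau ** 2
    slab = pi * NormalDist(0.0, v ** 0.5).pdf(d)
    spike = (1 - pi) * NormalDist(0.0, sigma).pdf(d)
    w = slab / (slab + spike)            # posterior probability that beta != 0
    cond = NormalDist(d * tau ** 2 / v, sigma * tau / v ** 0.5)  # beta | d, beta != 0
    below_zero = w * cond.cdf(0.0)       # posterior mass strictly below zero
    if below_zero >= 0.5:                # median lies in the negative tail
        return cond.inv_cdf(0.5 / w)
    if below_zero + (1 - w) >= 0.5:      # the point mass at zero holds the median
        return 0.0
    return cond.inv_cdf((w - 0.5) / w)   # median lies in the positive tail
```

The zero branch is what turns a Bayesian estimate into a genuine thresholding procedure, rather than the smooth shrinkage a posterior mean would give.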
The Perfect Phylogeny Problem classical problem computational evolutionary biology set speciestaxa described set qualitative characters In recent years problem shown NPComplete general different fixed parameter versions solved polynomial time In particular Agarwala FernandezBaca developed O r nk k algorithm perfect phylogeny problem n species defined k rstate characters Since commonly character data drawn alignments molecular sequences k length sequences thus large hundreds thousands Thus imperative develop algorithms run efficiently large values k In paper make additional observations structure problem produce algorithm problem runs time O r k n We also show possible efficiently build structure implicitly represents set perfect phylogenies randomly sample set
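The paper treats characters with up to r states. For orientation only, the simplest special case, binary characters, has a classical pairwise test: an unrooted perfect phylogeny exists if and only if no pair of characters shows all four state combinations (the "four gamete" condition). The sketch below implements that binary test; it is not the algorithm of the paper.

```python
from itertools import combinations

def has_binary_perfect_phylogeny(matrix):
    """Four-gamete test for binary characters.

    matrix[s][c] in {0, 1} is the state of character c in species s.
    Returns True iff every pair of characters is compatible.
    """
    n_chars = len(matrix[0])
    for a, b in combinations(range(n_chars), 2):
        gametes = {(row[a], row[b]) for row in matrix}
        if len(gametes) == 4:  # all of 00, 01, 10, 11 occur: incompatible pair
            return False
    return True
```

This check costs O(n k^2) for n species and k characters, which hints at why the dependence on k matters when characters come from long sequence alignments.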
Belief networks probabilistic networks neural networks two forms network representations used development intelligent systems field artificial intelligence Belief networks provide concise representation general probability distributions set random variables facilitate exact calculation impact evidence propositions interest Neural networks represent parameterized algebraic combinations nonlinear activation functions found widespread use models real neural systems function approximators amenability simple training algorithms Furthermore simple local nature neural network training algorithms provides certain biological plausibility allows massively parallel implementation In paper show similar local learning algorithms derived belief networks learning algorithms operate using information directly available normal inferential processes networks This removes main obstacle preventing belief networks competing neural networks abovementioned tasks The precise local probabilistic interpretation belief networks also allows partially wholly constructed humans allows results learning easily understood allows contribute rational decisionmaking welldefined way
We present new parallel algorithm learning Bayesian inference networks data Our learning algorithm exploits properties MDLbased score metric distributed asynchronous adaptive search technique called nagging Nagging intrinsically fault tolerant dynamic load balancing features scales well We demonstrate viability effectiveness scalability approach empirically several experiments using order machines More specifically show distributed algorithm provide optimal solutions larger problems well good solutions Bayesian networks variables
In article investigate relationship two popular algorithms EM algorithm Gibbs sampler We show approximate rate convergence Gibbs sampler Gaussian approximation equal corresponding EM type algorithm This helps implementing either algorithms improvement strategies one algorithm directly transported In particular running EM algorithm know approximately many iterations needed convergence Gibbs sampler We also obtain result conditions EM algorithm used finding maximum likelihood estimates slower converge corresponding Gibbs sampler Bayesian inference uses proper prior distributions We illustrate results number realistic examples based generalized linear mixed models
Underwater mammal sound classification demonstrated using novel application wavelet timefrequency decomposition feature extraction using BCM unsupervised network Different feature extraction methods different wavelet representations studied The system achieves outstanding classification performance even tested mammal sounds recorded different locations used training The improved results suggest nonlinear feature extraction wavelet representations outperforms different linear choices basis functions
Multiclass learning problems involve finding definition unknown function fx whose range discrete set containing k > 2 values ie k classes The definition acquired studying large collections training examples form hx fx Existing approaches problem include direct application multiclass algorithms decisiontree algorithms ID CART b application binary concept learning algorithms learn individual binary functions k classes c application binary concept learning algorithms distributed output codes employed Sejnowski Rosenberg NETtalk system This paper compares three approaches new technique BCH errorcorrecting codes employed distributed output representation We show output representations improve performance ID NETtalk task backpropagation isolatedletter speechrecognition task These results demonstrate errorcorrecting output codes provide generalpurpose method improving performance inductive learning programs multiclass problems
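The decoding step of an error-correcting output code can be sketched as follows: each bit position is learned by a separate binary classifier, and the predicted bit vector is assigned to the class whose codeword is nearest in Hamming distance. The 7-bit codewords below form an exhaustive code for four classes, chosen for illustration; they are not BCH codes and not taken from the paper.

```python
# Hypothetical 4-class code, 7 bits per codeword; every pair of codewords
# differs in 4 positions, so any single wrong bit is corrected.
CODE = {
    "class1": (1, 1, 1, 1, 1, 1, 1),
    "class2": (0, 0, 0, 0, 1, 1, 1),
    "class3": (0, 0, 1, 1, 0, 0, 1),
    "class4": (0, 1, 0, 1, 0, 1, 0),
}

def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits, code=CODE):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code, key=lambda c: hamming(code[c], bits))
```

With minimum distance 4, one binary learner can err on an example without changing the decoded class, which is the source of the robustness the abstract reports.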
In paper give completeness theorem inductive inference rule inverse entailment proposed Muggleton Our main result hypothesis clause H derived example E background theory B inverse entailment iff H subsumes E relative B Plotkins sense The theory B clausal theory example E clause neither tautology implied B The derived hypothesis H clause always definite In order prove result give declarative semantics arbitrary consistent clausal theories show SBresolution originally introduced Plotkin complete procedural semantics The completeness shown extension completeness theorem SLDresolution We also show every hypothesis H derived saturant generalization proposed Rouveirol must subsume E wrt B Buntines sense Moreover show saturant generalization obtained inverse entailment giving restriction usage
We present algorithm arc reversal Bayesian networks treestructured conditional probability tables consider advantages especially simulation dynamic probabilistic networks In particular method allows one produce CPTs nodes involved reversal exploit regularities conditional distributions We argue approach alleviates overhead associated arc reversal plays important role evidence integration used restrict sampling variables DPNs We also provide algorithm detects dynamic irrelevance state variables forward simulation This algorithm exploits structured CPTs reversed network determine timeindependent fashion conditions variable need sampled
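Arc reversal itself is an application of Bayes' rule. The minimal sketch below reverses the arc X to Y for two discrete nodes with no other parents, using flat tables; the paper's contribution is doing this with tree-structured CPTs that exploit regularities, which this sketch does not attempt.

```python
def reverse_arc(p_x, p_y_given_x):
    """Reverse X -> Y into Y -> X for two discrete nodes with no other parents.

    p_x[x] = P(X = x); p_y_given_x[x][y] = P(Y = y | X = x).
    Returns (p_y, p_x_given_y) with p_x_given_y[y][x] = P(X = x | Y = y).
    """
    xs = range(len(p_x))
    ys = range(len(p_y_given_x[0]))
    # New marginal for Y: sum out X
    p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in xs) for y in ys]
    # New CPT for X given Y: Bayes' rule
    p_x_given_y = [[p_x[x] * p_y_given_x[x][y] / p_y[y] for x in xs] for y in ys]
    return p_y, p_x_given_y
```

Once the arc points from an observed variable to an unobserved one, evidence can be conditioned on directly before sampling, which is how reversal helps evidence integration in simulation.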
A novel approach learning first order logic formulae positive negative examples presented Whereas present inductive logic programming systems employ examples true false ground facts clauses view examples interpretations true false target theory This viewpoint allows reconcile inductive logic programming paradigm classical attribute value learning sense latter special case former Because property able adapt AQ CN type algorithms order enable learning full first order formulae However whereas classical learning techniques concentrated concept representations disjunctive normal form use clausal representation corresponds conjunctive normal form conjunct forms constraint positive examples This representation duality reverses also role positive negative examples heuristics algorithm The resulting theory incorporated system named ICL Inductive Constraint Logic
The multiple instance problem arises tasks training examples ambiguous single example object may many alternative feature vectors instances describe yet one feature vectors may responsible observed classification object This paper describes compares three kinds algorithms learn axisparallel rectangles solve multipleinstance problem Algorithms ignore multiple instance problem perform poorly An algorithm directly confronts multiple instance problem attempting identify feature vectors responsible observed classifications performs best giving correct predictions muskodor prediction task The paper also illustrates use artificial data debug compare algorithms
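The multiple-instance labeling rule can be sketched directly: a bag (one object with several feature vectors) is positive if at least one of its instances falls inside the axis-parallel rectangle. The second function is a deliberately naive learner, a bounding box around every instance of every positive bag; it ignores which instances are responsible, which is the kind of approach the abstract reports performing poorly. Both function names are hypothetical.

```python
def inside(rect, x):
    """rect is a list of (low, high) bounds, one per feature."""
    return all(lo <= v <= hi for (lo, hi), v in zip(rect, x))

def classify_bag(rect, bag):
    """Multiple-instance rule: positive iff any instance lies in the rectangle."""
    return any(inside(rect, inst) for inst in bag)

def covering_rect(positive_bags):
    """Naive baseline: smallest box containing all instances of positive bags."""
    pts = [inst for bag in positive_bags for inst in bag]
    return [(min(p[i] for p in pts), max(p[i] for p in pts)) for i in range(len(pts[0]))]
```

The baseline box grows to cover irrelevant poses as well, so it tends to accept negative bags; algorithms that first pick the responsible instance in each positive bag avoid this.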
A new class data structures called bumptrees described These structures useful efficiently implementing number neural network related operations An empirical comparison radial basis functions presented robot arm mapping learning task Applications density estimation classification constraint representation learning also outlined
Model learning combined dynamic programming shown effective learning control continuous state dynamic systems The simplest method assumes learned model correct applies dynamic programming many approximators provide uncertainty estimates fit How exploited This paper addresses case system must prevented catastrophic failures learning We propose new algorithm adapted dual control literature use Bayesian locally weighted regression models stochastic dynamic programming A common reinforcement learning assumption aggressive exploration encouraged This paper addresses converse case system rein exploration The algorithm illustrated dimensional simulated control problem
Handling multiclass problems real numbers important practical applications machine learning KDD problems While attributevalue learners address problems rule ILP systems The ILP systems handle real numbers mostly trying real values applicable thus running efficiency overfitting problems This paper discusses recent extensions ICL address problems ICL stands Inductive Constraint Logic ILP system learns first order logic formulae positive negative examples The main characteristic ICL view examples These seen interpretations true false clausal target theory CNF We first argue ICL used learning theory disjunctive normal form DNF With mind possible solution handling two classes given based ideas CN Finally show tackle problems continuous values adapting discretization techniques attribute value learners
The paper presents learning controller capable increasing insertion speed consecutive pegintohole operations without increasing contact force level Our aim find better relationship measured forces controlled velocity without using complicated human generated model We followed connectionist approach Two learning phases distinguished First learning controller trained initialised supervised way suboptimal task frame controller Then reinforcement learning phase follows The controller consists two networks policy network exploration network Online robotic exploration plays crucial role obtaining better policy Optionally architecture extended third network reinforcement network The learning controller implemented CADbased contact force simulator In contrast related work experiments simulated D degrees freedom Performance pegintohole task measured insertion time averagemaximum force level The fact better performance obtained way demonstrates importance modelfree learning techniques repetitive robotic assembly tasks The paper presents approach simulation results Keywords robotic assembly pegintohole artificial neural networks reinforcement learning
Indirect experiments studies randomized control replaced randomized encouragement subjects encouraged rather forced receive treatment programs The purpose paper bring attention experimental researchers simple mathematical results enable us assess indirect experiments strength causal influences operate among variables interest The results reveal despite laxity encouraging instrument indirect experimentation yield significant sometimes accurate information impact program population whole well particular individuals participated program
In paper problem system identification H investigated case given frequency response data necessarily uniformly spaced grid frequencies A large class robustly convergent identification algorithms derived A particular algorithm examined explicit worst case error bounds H norm derived discretetime continuoustime systems Examples provided illustrate application algorithms
Advances VLSI technology enable chips billion transistors within next decade Unfortunately centralizedresource architectures modern microprocessors illsuited exploit advances Achieving high level parallelism reasonable clock speed requires distributing processor resources trend already visible dualregisterfile architecture Alpha A Raw microprocessor takes extreme position space distributing resources instruction streams register files memory ports ALUs pipelined twodimensional interconnect exposing fully compiler Compilation instructionlevel parallelism ILP distributedresource machines requires spatial instruction scheduling traditional temporal instruction scheduling This paper describes techniques used Raw compiler handle issues Preliminary results SUIFbased compiler sequential programs written C Fortran indicate Raw approach exploiting ILP achieve speedups scalable number processors applications parallelism The Raw architecture attempts provide performance least comparable provided scaling existing architecture achieve orders magnitude improvement performance applications large amount parallelism This paper offers positive results direction
The use previously learned knowledge learning shown reduce number examples required good generalization increase robustness noise examples In reviewing various means using learned knowledge domain guide learning domain two underlying classes discerned Methods use previous knowledge initialize learner initialization bias use previous knowledge constrain learner search bias We show methods fact exploit domain knowledge differently complement This shown presenting combined approach initializes constrains learner This combined approach seen outperform individual methods conditions accurate previously learned domain knowledge available irrelevant features domain representation
We consider recurrent analog neural nets output gate subject Gaussian noise common noise distribution nonzero large set We show many regular languages recognized networks type give precise characterization languages recognized This result implies severe constraints possibilities constructing recurrent analog neural nets robust realistic types analog noise On hand present method constructing feedforward analog neural nets robust regard analog noise type
This paper describes Rapture system revising probabilistic rule bases converts symbolic rules connectionist network trained via connectionist techniques It uses modified version backpropagation refine certainty factors rule base uses IDs informationgain heuristic Quinlan add new rules Work currently way finding improved techniques modifying network architectures include adding hidden units using UPSTART algorithm Frean A case made via comparison fully connected connectionist techniques keeping rule base close original possible adding new input units needed
This paper tackles supervised induction distance examples described Horn clauses constrained clauses In opposition syntaxdriven approaches approach discriminationdriven proceeds defining small set complex discriminant hypotheses These hypotheses serve new concepts used redescribe initial examples Further redescription embedded space natural integers distance examples thus naturally follows
Probability theory represents manipulates uncertainties tell us behave For need utility theory assigns values usefulness different states decision theory concerns optimal rational decisions There many methods probability modeling learning utility decision models We use reinforcement learning find optimal sequence questions diagnosis situation maintaining high accuracy Automated diagnosis heartdisease domain used demonstrate temporaldifference learning improve diagnosis On Cleveland heartdisease database results better reported previous methods
The simple Bayesian classifier SBC sometimes called NaiveBayes built based conditional independence model attribute given class The model previously shown surprisingly robust obvious violations independence assumption yielding accurate classification models even clear conditional dependencies The SBC serve excellent tool initial exploratory data analysis coupled visualizer makes structure comprehensible We describe visual representation SBC model successfully implemented We describe requirements visualization design decisions made satisfy
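The simple Bayesian classifier described above scores each class c by P(c) multiplied by the product of P(x_i | c) over the attributes, treating attributes as independent given the class. The following is a minimal sketch of that underlying model only, not of the visualizer. It assumes categorical attributes and Laplace smoothing with a hypothetical parameter `alpha`. The helper names `train_naive_bayes` and `predict` are illustrative.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class)][value] -> count
    value_counts = defaultdict(Counter)
    # domains[attribute_index] -> set of values seen for that attribute
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row, alpha=1.0):
    """Return the class with the highest log posterior (Laplace smoothing alpha)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        # log P(c) + sum_i log P(x_i | c): attributes independent given c
        score = math.log(n_c / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            score += math.log((count + alpha) / (n_c + alpha * len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Each class score is a sum of per-attribute log terms. This additive structure is what makes the model easy to display attribute by attribute.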
This paper describes recent work development computer architectures efficient execution artificial neural network algorithms Our earlier system Ring Array Processor RAP multiprocessor based commercial DSPs lowlatency ring interconnection scheme We used RAP simulate variable precision arithmetic guide us design higher performance neurocomputers based custom VLSI The RAP system played critical role study enabling us experiment much larger networks would otherwise possible Our study shows backpropagation training algorithms require moderate precision Specifically b weight values b output values sufficient achieve training classification results comparable b floating point Although results gathered frame classification continuous speech expect extend many connectionist calculations We used results part design programmable single chip microprocessor SPERT The reduced precision arithmetic permits use multiple units per processor Also reduced precision operands make efficient use valuable processormemory bandwidth For moderateprecision fixedpoint arithmetic applications SPERT represents order magnitude reduction cost systems based DSP chips
RK Belew J McInerney N Schraudolph Evolving networks using genetic algorithm connectionist learning Artificial Life II SFI Studies Science Complexity CG Langton C Taylor JD Farmer S Rasmussen Eds vol AddisonWesley M McInerney AP Dhawan Use genetic algorithms back propagation training feedforward neural networks IEEE International Conference Neural Networks vol pp FZ Brill DE Brown WN Martin Fast genetic selection features neural network classifiers IEEE Transactions Neural Networks vol pp F Dellaert J Vandewalle Automatic design cellular neural networks means genetic algorithms finding feature detector The Third IEEE International Workshop Cellular Neural Networks Their Applications IEEE New Jersey pp DE Moriarty R Miikkulainen Efficient reinforcement learning symbiotic evolution Machine Learning vol pp L Davis Handbook Genetic Algorithms Van Nostrand Reinhold New York D Whitley The GENITOR algorithm selective pressure Proceedings Third International Conference Genetic Algorithms JD Schaffer Ed Morgan Kaufmann San Mateo CA pp D van Camp T Plate GE Hinton The Xerion Neural Network Simulator Documentation Department Computer Science University Toronto Toronto
Research machine learning design concentrated use development techniques solve simple welldefined problems Invariably effort important early stages development field scale address real design problems since existing techniques based simplifying assumptions hold real design In particular address dependence context multiple often conflicting interests constitutive design This paper analyzes present situation criticizes number prevailing views Subsequently paper offers alternative approach whose goal advance use machine learning design practice The approach partially integrated modeling system called ndim The use machine learning ndim presented open research issues outlined
The standard PPR algorithm Friedman Stuetzle estimates smooth functions $f_j$ using supersmoother nonparametric scatterplot smoother Friedmans algorithm constructs model $M_{\max}$ linear combinations prunes back simpler model size $M \le M_{\max}$ $M$ $M_{\max}$ specified user This paper discusses alternative algorithm smooth functions estimated using smoothing splines The direction coefficients $\alpha_j$ amount smoothing direction number terms $M \le M_{\max}$ determined optimize single generalized crossvalidation measure
In paper suggest mechanism improves significantly performance topdown inductive logic programming ILP learning system This improvement achieved cost giving system extra information difficult formulate This information appears form algorithm sketch incomplete somewhat vague representation computation related particular example We describe sketches admissible give details learning algorithm exploits information contained sketch The experiments carried implemented system SKIL demonstrated usefulness method potential future applications
In paper concerned problem inducing recursive Horn clauses small sets training examples The method iterative bootstrap induction presented In first step system generates simple clauses regarded properties required definition Properties represent generalizations positive examples simulating effect larger number examples Properties used subsequently induce required recursive definitions This paper describes method together series experiments The results support thesis iterative bootstrap induction indeed effective technique could general use ILP
This paper aims examine use genetic algorithms optimize subsystems cellular neural network architectures The application hand character recognition aim evolve optimal feature detector order aid conventional classifier network generalize across different fonts To end performance function genetic encoding feature detector presented An experiment described optimal feature detector indeed found genetic algorithm We interested application cellular neural networks computer vision Genetic algorithms GAs serve optimize design cellular neural networks Although design global architecture system could still done human insight propose specific submodules system best optimized using one optimization method GAs good candidate fulfill optimization role well suited problems objective function complex function many parameters The specific problem want investigate one character recognition More specifically would like use GA find optimal feature detectors used recognition digits
This article exposes problems commonly used technique splitting available data training validation test sets held fixed warns drawing strong conclusions static splits shows potential pitfalls ignoring variability across splits Using bootstrap resampling method compare uncertainty solution stemming data splitting neural network specific uncertainties parameter initialization choice number hidden units etc We present two results data New York Stock Exchange First variation due different resamplings significantly larger variation due different network conditions This result implies important overinterpret model ensemble models estimated one specific split data Second split neural network solution early stopping close linear model significant nonlinearities extracted
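The resampling idea can be sketched as follows. For brevity, a one-variable least-squares fit stands in for the neural network; this substitution is an assumption. `bootstrap_slopes` is a hypothetical helper. Each bootstrap resample refits the model, and the spread of the fitted slope across resamples is the split-induced variability the abstract warns about.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_resamples=200, seed=0):
    """Refit the model on bootstrap resamples (with replacement) of the (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: slope undefined, skip it
            continue
        slopes.append(ols_slope(bx, by))
    return slopes
```

Comparing `statistics.stdev(bootstrap_slopes(...))` with the spread caused by other choices gives the kind of comparison the abstract reports. In the original study, those other choices are initialization and number of hidden units.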
Validation used detect overfitting starts supervised training neural network training stopped convergence avoid overfitting early stopping The exact criterion used validationbased early stopping however usually chosen adhoc fashion training stopped interactively This trick describes select stopping criterion systematic fashion trick either speeding learning procedures improving generalization whichever important particular situation An empirical investigation multilayer perceptrons shows exists tradeoff training time generalization From given mix training runs using different problems different network architectures I conclude slower stopping criteria allow small improvements generalization average cost much training time factor longer average
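One systematic stopping criterion of the kind discussed above can be sketched as follows. As an assumption, it uses a generalization-loss rule, GL(t) = 100 · (E_va(t)/E_opt(t) − 1). Here E_opt(t) is the lowest validation error seen so far, and training stops once GL(t) exceeds a threshold `alpha`. The function names are hypothetical. A larger `alpha` gives a slower criterion that trains longer.

```python
def generalization_loss_stop(val_errors, alpha=5.0):
    """Return the epoch index at which training would stop, or None.

    GL(t) = 100 * (E_va(t) / E_opt(t) - 1), where E_opt(t) is the lowest
    validation error seen up to epoch t; stop at the first t with GL(t) > alpha.
    """
    best = float("inf")
    for t, err in enumerate(val_errors):
        best = min(best, err)
        gl = 100.0 * (err / best - 1.0)
        if gl > alpha:
            return t
    return None


def best_epoch(val_errors, stop_at):
    """Epoch with the lowest validation error among those seen before stopping."""
    seen = val_errors if stop_at is None else val_errors[: stop_at + 1]
    return min(range(len(seen)), key=seen.__getitem__)
```

Consider validation errors `[1.0, 0.8, 0.7, 0.72, 0.74, 0.9]`:

- `alpha=5` stops at epoch 4.
- `alpha=10` waits until epoch 5.

In both cases the weights from epoch 2 would be restored, so here the slower criterion costs an extra epoch without improving the selected model.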
We introduce new formal model learning algorithm must combine collection potentially poor statistically independent hypothesis functions order approximate unknown target function arbitrarily well Our motivation includes question make optimal use multiple independent runs mediocre learning algorithm well settings many hypotheses obtained distributed population identical learning agents
Markov chain Monte Carlo MCMC methods make possible use flexible Bayesian models would otherwise computationally infeasible In recent years great variety applications described literature Applied statisticians new methods may several questions concerns however How much effort expertise needed design use Markov chain sampler How much confidence one answers MCMC produces How use MCMC affect rest modelbuilding process At Joint Statistical Meetings August panel experienced MCMC users discussed issues well various tricks trade This paper edited recreation discussion Its purpose offer advice guidance novice users MCMC notsonovice users well Topics include building confidence simulation results methods speeding assessing convergence estimating standard errors identification models good MCMC algorithms exist current state software development
Neural Network Machine Learning Laboratory Computer Science Department Brigham Young University Provo UT USA Email martinezcsbyuedu WWW httpaxoncsbyuedu Abstract Many neural network models must trained finding set realvalued weights yield high accuracy training set Other learning models require weights input attributes yield high leaveoneout classification accuracy order avoid problems associated irrelevant attributes high dimensionality In addition variety general problems set real values must found maximize evaluation function This paper presents algorithm schemata search realvalued weight space find set weights real values yield high values given evaluation function The algorithm called RealValued Schemata Search RVSS uses BRACE statistical technique Moore Lee determine narrow search space This paper details RVSS approach gives initial empirical results
A large body nonparametric statistical literature devoted density estimation Overviews given Silverman Izenman This paper addresses problem univariate density estimation novel way Our approach falls class called projection estimators introduced Cencov The orthonormal basis used basis compactly supported wavelets Daubechies family Kerkyacharian Picard Donoho et al Delyon Juditsky among others applied wavelets density estimation The local nature wavelet functions makes wavelet estimator superior projection estimators use classical orthonormal bases Fourier Hermite etc Instead estimating unknown density directly estimate square root density enables us control positiveness L norm density estimate However approach one needs preestimator density calculate sample wavelet coefficients We describe VISUSTOP datadriven procedure determining maximum number levels wavelet density estimator Coefficients selected levels thresholded make estimator parsimonious
Intermediate higher vision processes require selection subset available sensory information processing Usually selection implemented form spatially circumscribed region visual field socalled focus attention scans visual scene dependent input attentional state subject We present model control focus attention primates based saliency map This mechanism expected model functionality biological vision also essential understanding complex scenes machine vision
Abstract The problem hypothesis testing examined historical Bayesian points view case sampling underlying joint probability distribution hypotheses tested independence dependence underlying distribution Exact results Bayesian method provided Asymptotic Bayesian results historical method quantities compared historical method quantities interpreted terms clearly defined Bayesian quantities The asymptotic Bayesian test relies upon statistic predominantly mutual information Problems hypothesis testing arise ubiquitously situations observed data produced unknown process question asked From process observed data arise Historically hypothesis testing problem approached point view sampling whereby several fixed hypotheses tested given measures test quality found directly likelihood ie amounts sampling likelihood To specific hypothesis set possible parameter vectors parameter vector completely specifying sampling distribution A simple hypothesis hypothesis set contains one parameter vector A composite hypothesis occurs nonempty hypothesis set single parameter vector Generally test procedure chooses true hypothesis gives largest test value although notion procedure specific may refer method choosing hypothesis given test values Since interest quantify quality test level significance generated level probability chosen hypothesis test procedure incorrect hypothesis choice made The significance generated using sampling distribution likelihood For simple hypotheses level significance found using single parameter value hypothesis When test applied case composite hypothesis size test found given supremum probability ranging parameter vectors hypothesis set chosen
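The mutual-information statistic mentioned above can be computed from a two-way contingency table as follows. This is a minimal sketch with hypothetical function names, and it is not the paper's exact Bayesian test. For comparison, it also returns the classical likelihood-ratio (G) statistic for independence. G equals 2N times the empirical mutual information in nats. Under independence, G is asymptotically chi-square with (r−1)(c−1) degrees of freedom.

```python
import math


def mutual_information(table):
    """Empirical mutual information (in nats) of a two-way contingency table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij:  # cells with zero count contribute 0 (0 * log 0 := 0)
                mi += (n_ij / n) * math.log(n_ij * n / (row_tot[i] * col_tot[j]))
    return mi


def g_statistic(table):
    """Classical likelihood-ratio statistic for independence: G = 2 * N * I."""
    n = sum(sum(row) for row in table)
    return 2.0 * n * mutual_information(table)
```

A table whose cells factor exactly into row and column margins, such as `[[10, 20], [30, 60]]`, gives zero mutual information. A perfectly diagonal table `[[50, 0], [0, 50]]` gives log 2.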
This literature review discusses different methods general rubric learning Bayesian networks data includes overlapping work general probabilistic networks Connections drawn statistical neural network uncertainty communities different methodological communities Bayesian description length classical statistics Basic concepts learning Bayesian networks introduced methods reviewed Methods discussed learning parameters probabilistic network learning structure learning hidden variables The presentation avoids formal definitions theorems plentiful literature instead illustrates key concepts simplified examples
Recent work supervised learning shown surprisingly simple Bayesian classifier strong assumptions independence among features called naive Bayes competitive state art classifiers C4.5 This fact raises question whether classifier less restrictive assumptions perform even better In paper examine evaluate approaches inducing classifiers data based recent results theory learning Bayesian networks Bayesian networks factored representations probability distributions generalize naive Bayes classifier explicitly represent statements independence Among approaches single method call Tree Augmented Naive Bayes TAN outperforms naive Bayes yet time maintains computational simplicity search involved robustness characteristic naive Bayes We experimentally tested approaches using benchmark problems UC Irvine repository compared C4.5 naive Bayes wrapperbased feature selection methods
In recent years flurry works learning probabilistic belief networks Current state art methods shown successful two learning scenarios learning network structure parameters complete data learning parameters fixed network incomplete datathat presence missing values hidden variables However method yet demonstrated effectively learn network structure incomplete data In paper propose new method learning network structure incomplete data This method based extension ExpectationMaximization EM algorithm model selection problems performs search best structure inside EM procedure We prove convergence algorithm adapt learning belief networks We describe learn networks two scenarios data contains missing values presence hidden variables We provide experimental results show effectiveness procedure scenarios
This paper defines class problems involving combinations induction cost optimisation A framework presented systematically describes problems involve construction decision trees rules optimising accuracy well measurement misclassification costs It present new algorithms shows framework used configure greedy algorithms constructing trees rules The framework covers number existing algorithms Moreover framework also used define algorithm configurations new functionalities expressed evaluation functions
We introduce new framework study reasoning The Learning order Reason approach developed views learning integral part inference process suggests learning reasoning studied together The Learning Reason framework combines interfaces world used known learning models reasoning task performance criterion suitable In framework intelligent agent given access favorite learning interface also given grace period interact interface construct representation KB world W The reasoning performance measured period agent presented queries $\alpha$ query language relevant world answer whether W implies $\alpha$ The approach meant overcome main computational difficulties traditional treatment reasoning stem separation world Since agent interacts world constructing knowledge representation choose representation useful task hand Moreover make explicit dependence reasoning performance environment agent interacts We show previous results learning theory reasoning fit framework illustrate usefulness Learning Reason approach exhibiting new results possible traditional setting First give Learning Reason algorithms classes propositional languages efficient reasoning algorithms represented traditional formulabased knowledge base Second exhibit Learning Reason algorithm class propositional languages known learnable traditional sense An earlier version paper appears Proceedings National Conference Artificial Intelligence AAAI Roni Khardon supported ARO grant DAALG NSF grant CCR Dan Roth supported NSF grant CCR DARPA AFOSRFJ Author present Addresses Roni Khardon Division Applied Sciences Harvard University Cambridge MA email ronidasharvardedu Dan Roth Department Computer Science University Illinois UrbanaChampaign W Springfield Ave Urbana Illinois email danrcsuiucedu
Many AI problems formalized reduce evaluating probability propositional expression true In paper show problem computationally intractable even surprisingly restricted cases even settle approximation probability We consider various methods used approximate reasoning computing degree belief Bayesian belief networks well reasoning techniques constraint satisfaction knowledge compilation use approximation avoid computational difficulties reduce modelcounting problems propositional domain We prove counting satisfying assignments propositional languages intractable even Horn monotone formulae even size clauses number occurrences variables extremely limited This contrasted case deductive reasoning Horn theories theories binary clauses distinguished existence linear time satisfiability algorithms What even surprising show even approximating number satisfying assignments ie approximating approximate reasoning intractable restricted theories We also identify restricted classes propositional formulae efficient algorithms counting satisfying assignments given Preliminary version paper appeared Proceedings th International Joint Conference Artificial Intelligence IJCAI Supported NSF grants CCR CCR DARPA AFOSRFJ
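Counting satisfying assignments can be made concrete with a deliberately naive counter. The sketch below uses a hypothetical `count_models` helper and DIMACS-style signed-integer literals, which is an assumed encoding. It enumerates all 2^n assignments. Under a uniform distribution over assignments, the probability that the formula is true is the count divided by 2^n. The exponential running time illustrates the hardness result but does not prove it.

```python
from itertools import product


def count_models(clauses, n_vars):
    """Count satisfying assignments of a CNF formula by brute force.

    Clauses use DIMACS-style literals: k means x_k, -k means NOT x_k (1-based).
    Runs in O(2**n_vars * formula size) time; exact counting is intractable in
    general, even for restricted classes such as monotone 2-CNF.
    """
    count = 0
    for bits in product((False, True), repeat=n_vars):
        # a clause is satisfied if any of its literals agrees with the assignment
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count
```

Two small cases:

- The monotone formula (x1 ∨ x2) ∧ (x2 ∨ x3) has 5 of 8 satisfying assignments, so its probability of being true is 5/8.
- The Horn clause (¬x1 ∨ x2) has 3 of 4.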
Recently extended Kalman filter EKF based training demonstrated effective neural network training However conjunction pruning methods weight decay optimal brain damage OBD yet studied In paper elucidate method EKF training propose pruning method based results obtained EKF training These combined training pruning method applied time series prediction problem
We describe recent extensions framework automatic generation musicmaking programs We previously used genetic programming techniques produce musicmaking programs satisfy userprovided critical criteria In paper describe new work use connectionist techniques automatically induce musical structure corpus We show resulting neural networks used critics drive genetic programming system We argue framework potentially support induction recapitulation deep structural features music We present initial results produced using neural hybrid symbolicneural critics discuss directions future work
Much effort devoted understanding learning reasoning artificial intelligence However models attempt integrate two complementary processes Rather vast body research machine learning often focusing inductive learning examples quite isolated work reasoning artificial intelligence Though two processes may different much interrelated The ability reason domain knowledge often based rules domain must learned somehow And ability reason often used acquire new knowledge learn This paper introduces Incremental Learning Algorithm ILA attempts combine inductive learning prior knowledge reasoning ILA many important characteristics useful combination including incremental selforganizing learning nonuniform learning inherent nonmonotonicity extensional intensional capabilities low order polynomial complexity The paper describes ILA gives simulation results several applications discusses characteristics detail
This paper appeared Machine Learning Abstract This paper demonstrates nature opposition training affects learning play twoperson perfect information board games It considers different kinds competitive training impact trainer error appropriate metrics posttraining performance measurement ways metrics applied The results suggest teaching program leading repeatedly restricted paths albeit high quality ones overly narrow preparation variations appear realworld experience The results also demonstrate variety introduced training random choice unreliable preparation program directs training may overlook important situations The results argue broad variety training experience play many levels This variety may either inherent game introduced deliberately training Lesson practice training blend expert guidance knowledgebased selfdirected elaboration shown particularly effective learning competition
The negative effect naturally significant complex domain The graph simple domain crosses line earlier complex domain That means learning starts useful weight greater simple domain complex domain As relax optimality requirement g n f c n l w h W macro usage complex domain becomes advantageous The purpose research described paper identify parameters effects deductive learning perform experiments systematically order understand nature effects The goal paper demonstrate methodology performing parametric experimental study deductive learning The example include study two parameters point satisficingoptimizing scale used search carried problem solving time learning time We showed A looks optimal solutions benefit macro learning strategy comes closer bestfirst satisficing search utility macros increases We also demonstrated deductive learners learn offline solving training problems sensitive type search used learning We showed general optimizing search best learning It generates macros increase quality solutions regardless search method used problem solving It also improves efficiency problem solvers require high level optimality The drawback using optimizing search increase learning resources spent We aware fact results described surprising The goal parametric study necessarily find exciting results obtain results sometimes even previously known controlled experimental environment The work described part research plan We currently process extensive experimentation parameters described also others We also intend test validity conclusions reached study repeating tests several commonly known search problems We hope systematic experimentation help research community better understand process deductive learning serve demonstration experimental methodology used machine learning research
We examine number techniques representing actions stochastic effects using Bayesian networks influence diagrams We compare techniques according ease specification size representation required complete specification dynamics particular system paying particular attention role persistence relationships We precisely characterize two components frame problem Bayes nets stochastic actions propose several ways deal problems compare solutions Reiters solution frame problem situation calculus The result set techniques permit ease specification compact representation probabilistic system dynamics comparable size timbre Reiters representation ie explicit frame axioms
Given function $f: F^n \to F$ mapping nvariate inputs finite field $F$ consider task reconstructing list nvariate degree polynomials agree $f$ tiny nonnegligible fraction $\delta$ input space We give randomized algorithm solving task accesses $f$ black box runs time polynomial djF j For special case solve problem def jF j gt In case running time algorithm bounded polynomial n exponential Our algorithm generalizes previously known algorithm due Goldreich Levin solves
Many problems correspond classical control task determining appropriate control action take given sequence observations One standard approach learning control rules called behavior cloning involves watching perfect operator operate plant trying emulate behavior In experimental learning approach contrast learner first guesses initial operationtoaction policy tries If policy performs suboptimally learner modify produce new policy recur This paper discusses relative effectiveness two approaches especially presence perceptual aliasing showing particular experimental learner often learn effectively cloning one
We present connectionist method representing images explicitly addresses hierarchical nature It blends data neuroscience wholeobject viewpoint sensitive cells inferotemporal cortex attentional basisfield modulation V ideas hierarchical descriptions based microfeatures The resulting model makes critical use bottomup topdown pathways analysis synthesis We illustrate model simple example representing information faces
This paper discusses role culture evolution cognitive systems We define culture information transmitted individuals generations nongenetic means Experiments presented use genetic programming systems include special mechanisms cultural transmission information These systems evolve computer programs perform cognitive tasks including mathematical function mapping action selection virtual world The data show presence culturesupporting mechanisms clear beneficial impact evolvability correct programs The implications results may cognitive science briefly discussed
Numerical design optimization algorithms highly sensitive particular formulation optimization problems given The formulation search space objective function constraints generally large impact duration optimization process well quality resulting design Furthermore best formulation vary one application domain another one problem another within given application domain Unfortunately design engineer may know best formulation advance attempting set run design optimization process In order attack problem developed software environment supports interactive formulation testing reformulation design optimization strategies Our system represents optimization strategies terms secondorder dataflow graphs Reformulations strategies implemented transformations dataflow graphs The system permits user interactively generate search space design optimization strategies experimentally evaluate performance test problems order find strategy suitable application domain The system implemented domain independent fashion tested domain racing yacht design
This paper presents basic results ideas dynamic programming relate directly concerns planning AI These form theoretical basis incremental planning methods used integrated architecture Dyna These incremental planning methods based continually updating evaluation function situationaction mapping reactive system Actions generated reactive system thus involve minimal delay incremental planning process guarantees actions evaluation function eventually optimalno matter extensive search required These methods well suited stochastic tasks tasks complete accurate model available For tasks large implement situationaction mapping table supervisedlearning methods must used capabilities remain significant limitation approach
This paper reports project document retrieval industrial setting The objective provide tool helps finding documents related given query answers Frequently Asked Questions databases A CBR approach used develop running prototypical system currently practical evaluation
We investigate query complexity exact learning membership proper equivalence query model We give complete characterization concept classes learnable polynomial number polynomial sized queries model We give applications characterization including results learning natural subclass DNF formulas learning membership queries alone Query complexity previously used prove lower bounds time complexity exact learning We show new relationship query complexity time complexity exact learning If honest class exactly properly learnable polynomial query complexity learnable polynomial time P NP In particular show honest class exactly polynomialquery learnable learnable using oracle p
This paper presents case study evaluating casebased system It describes evaluation Anapron system pronounces names combination rulebased casebased reasoning Three sets experiments run Anapron set exploratory measurements profile systems operation comparison Anapron namepronunciation systems set studies modified various parts system isolate contribution Lessons learned experiments CBR evaluation methodology CBR theory discussed
Consider given value function states Markov decision problem might result applying reinforcement learning algorithm Unless value function equals corresponding optimal value function states discrepancy natural call Bellman residual value function specifies state obtained onestep lookahead along seemingly best action state using given value function evaluate succeeding states This paper derives tight bound far optimal discounted return greedy policy based given value function function maximum norm magnitude Bellman residual A corresponding result also obtained value functions defined stateaction pairs used Qlearning One significant application results problems function approximator used learn value function training approximator based trying minimize Bellman residual across states stateaction pairs When control based use resulting value function result provides link well objectives function approximator training met quality resulting control
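The quantities involved can be shown with a small numeric sketch. It assumes a tabular MDP encoded as nested lists, which is a hypothetical format. `greedy_policy_and_residual` performs the one-step lookahead. `value_iteration` and `evaluate_policy` compute V* and the greedy policy's return, so the actual loss can be compared with the residual. The check against 2ε/(1−γ) uses a simple, easily derived bound, not the tighter bound derived in the paper.

```python
def greedy_policy_and_residual(P, R, V, gamma):
    """P[s][a] -> list of (prob, next_state); R[s][a] -> immediate reward.

    Returns the greedy policy w.r.t. V and the max-norm Bellman residual
    max_s |max_a Q_V(s, a) - V(s)|, with Q_V from one-step lookahead.
    """
    policy, residual = [], 0.0
    for s in range(len(V)):
        q = [R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
             for a in range(len(R[s]))]
        best_a = max(range(len(q)), key=q.__getitem__)
        policy.append(best_a)
        residual = max(residual, abs(q[best_a] - V[s]))
    return policy, residual


def value_iteration(P, R, gamma, sweeps=2000):
    """Approximate the optimal value function V* by repeated Bellman backups."""
    V = [0.0] * len(R)
    for _ in range(sweeps):
        V = [max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                 for a in range(len(R[s])))
             for s in range(len(R))]
    return V


def evaluate_policy(P, R, policy, gamma, sweeps=2000):
    """Discounted return of a fixed policy by iterative policy evaluation."""
    V = [0.0] * len(policy)
    for _ in range(sweeps):
        V = [R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
             for s, a in enumerate(policy)]
    return V
```

Take a two-state chain where staying in state 1 pays 1 per step and γ = 0.9. The guess V = [5, 0] has residual ε = 4.5, and its greedy policy never collects reward. The resulting loss is 10, well inside 2ε/(1−γ) = 90.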
To measure quality set vector quantization points means measuring distance random point quantization required Common metrics Hamming Euclidean metrics mathematically simple inappropriate comparing natural signals speech images In paper shown environment functions input space X induces canonical distortion measure CDM X The depiction canonical justified shown optimizing reconstruction error X respect CDM gives rise optimal piecewise constant approximations functions environment The CDM calculated closed form several different function classes An algorithm training neural networks implement CDM presented along encouraging experimental results
Theory revision systems typically use set theorytotheory transformations f k g hillclimb given initial theory new theory whose empirical accuracy given set labeled training instances fc j g local maximum At heart process evaluator compares accuracy current theory KB neighbors f k KBg goal determining neighbor highest accuracy The obvious wrapper evaluator simply evaluates individual neighbor theory KB k k KB instance c j As expensive evaluate single theory single instance great many training instances huge number neighbors approach prohibitively slow We present alternative system employs smarter evaluator quickly computes accuracy transformed theory k KB looking inside KB reasoning effects k transformation We compare performance naive wrapper system realworld theories obtained fielded expert system find runs times faster attaining accuracy This paper also discusses source power Keywords theory revision efficient algorithm hillclimbing system We gratefully acknowledge many helpful comments report George Drastal Chandra Mouleeswaran Geoff Towell
This paper describes a wavelet method for estimating density and hazard rate functions from randomly right-censored data. We adopt a nonparametric approach, in that we do not assume that the density or hazard rate has any specific parametric form. The method is based on dividing the time axis into a dyadic number of intervals and then counting the number of events within each interval. The number of events and the survival function of the observations are then separately smoothed over time via linear wavelet smoothers, and the hazard rate function estimators are obtained by taking the ratio. We prove that the estimators possess pointwise and global mean square consistency, obtain the best possible asymptotic MISE convergence rate, and are also asymptotically normally distributed. We also describe simulation experiments which show that the estimators are reasonably reliable in practice. The method is illustrated with two real examples. The first uses survival time data for patients with liver metastases from a colorectal primary tumour without other distant metastases. The second is concerned with times of unemployment for women, where the wavelet estimate, through its flexibility, provides a new and interesting interpretation.
Experience plays an important role in the development of human expertise. One computational model of how experience affects expertise is provided by research on case-based reasoning, which examines how stored cases encapsulating traces of specific prior problem-solving episodes can be retrieved and reapplied to facilitate new problem-solving. Much progress has been made in methods for accessing relevant cases, and case-based reasoning is receiving wide acceptance both as a technology for developing intelligent systems and as a cognitive model of a human reasoning process. However, one important aspect of case-based reasoning remains poorly understood: the process by which retrieved cases are adapted to fit new situations. The difficulty of encoding effective adaptation rules by hand is widely recognized as a serious impediment to the development of fully autonomous case-based reasoning systems. Consequently, an important question is how case-based reasoning systems might learn to improve their expertise at case adaptation. We present a framework for acquiring this expertise by using a combination of general adaptation rules, introspective reasoning, and case-based reasoning about the case adaptation task itself.
There is strong evidence that face processing in the brain is localized. The double dissociation between prosopagnosia, a face recognition deficit occurring after brain damage, and visual object agnosia, a difficulty in recognizing other kinds of complex objects, indicates that face and non-face object recognition may be served by partially independent neural mechanisms. In this chapter, we use computational models to show how the face processing specialization apparently underlying prosopagnosia and visual object agnosia could be attributed to (1) a relatively simple competitive selection mechanism that, during development, devotes neural resources to the tasks they are best at performing, (2) the developing infant's need to perform subordinate classification (identification) of faces early on, and (3) the infant's low visual acuity at birth. We draw on de Schonen and Mancini's arguments that factors like these could bias the visual system to develop a specialized face processor, and on Jacobs and Kosslyn's experiments in the mixtures of experts (ME) modeling paradigm. On that basis, we provide a preliminary computational demonstration of how this theory accounts for the double dissociation between face and object processing. We present two feed-forward computational models of visual processing. In both models, the selection mechanism is a gating network that mediates a competition between modules attempting to classify input stimuli. In Model I, when the modules are simple unbiased classifiers, the competition is sufficient to achieve enough specialization that damaging one module impairs the model's face recognition more than its object recognition, and damaging the other module impairs the model's object recognition more than its face recognition. In Model II, however, we bias the modules by providing one with low spatial frequency information and the other with high spatial frequency information. In this case, when the model's task is subordinate classification of faces and superordinate classification of objects, the low spatial frequency network shows an even stronger specialization for faces. No other combination of tasks and inputs shows this strong specialization. We take these results as support for the idea that something resembling a face processing "module" could arise as a natural consequence of the infant's developmental environment without being innately specified.
In this paper we indicate some possible applications of ILP and similar techniques in the knowledge discovery field, and then discuss several methods for adapting and linking ILP systems to relational database systems. The proposed methods range from "pure ILP" to techniques based on, or originating from, ILP. We show that it is both easy and advantageous to adapt ILP systems in this way.
Most exact algorithms for general POMDPs use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" approach for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient algorithm for solving POMDPs.
We improve error bounds based on VC analysis for classes with sets of similar classifiers. We apply the new error bounds to separating planes and artificial neural networks. Key words: machine learning, learning theory, generalization, Vapnik-Chervonenkis, separating planes, neural networks.
The higher-order structure of genes and other features of biological sequences can be described by means of formal grammars. These grammars can then be used by general-purpose parsers to detect and assemble such structures by means of syntactic pattern recognition. We describe a grammar and parser for eukaryotic protein-encoding genes which, by some measures, is as effective as current connectionist and combinatorial algorithms in predicting gene structures for sequence database entries. Parameters of the grammar rules are optimized for several different species, and mixing experiments are performed to determine the degree of species specificity and the relative importance of compositional, signal-based, and syntactic components in gene prediction.
The double dissociation between prosopagnosia, a face recognition deficit occurring after brain damage, and visual object agnosia, a difficulty in recognizing other kinds of complex objects, indicates that face and non-face object recognition may be served by partially independent mechanisms in the brain. Such a dissociation could be the result of a competitive learning mechanism that, during development, devotes neural resources to the tasks they are best at performing. Studies of normal adult performance on face and object recognition tasks seem to indicate that face recognition is primarily configural, involving the low spatial frequency information present in a stimulus over relatively large distances, whereas object recognition is primarily featural, involving analysis of the object's parts using local, high spatial frequency information. In a feed-forward computational model of visual processing, two modules compete to classify input stimuli. When one module receives low spatial frequency information and the other receives high spatial frequency information, the low-frequency module shows a strong specialization for face recognition in a combined face identification/object classification task. The series of experiments shows that the fine discrimination necessary for distinguishing members of a visually homogeneous class such as faces relies heavily on the low spatial frequencies present in a stimulus.
We present a novel classification and regression method that combines exploratory projection pursuit (unsupervised training) with projection pursuit regression (supervised training) to yield a new family of cost/complexity penalty terms. Some improved generalization properties are demonstrated on real-world problems.
In this paper we present an objective function formulation of the BCM theory of visual cortical plasticity. This formulation permits us to demonstrate the connection between the unsupervised BCM learning procedure and various statistical methods, in particular Projection Pursuit. It provides a general method for stability analysis of the fixed points of the theory, and enables us to analyze the behavior and evolution of the network under various visual rearing conditions. It also allows comparison with many existing unsupervised methods. This model has been shown to be successful in various applications, such as phoneme and 3D object recognition. We thus have the striking and possibly highly significant result that a biological neuron is performing a sophisticated statistical procedure.
A system for automatic face recognition is presented. It consists of several steps: automatic detection of the eyes and mouth is followed by a spatial normalization of the images. The classification of the normalized images is carried out by a hybrid (supervised and unsupervised) neural network. Two methods for reducing overfitting, a common problem in high-dimensional classification schemes, are presented, and the superiority of their combination is demonstrated.
Radial Basis Function (RBF) neural networks offer an attractive equation form for use in model-based control because they can approximate highly nonlinear plants and yet are well suited to linear adaptive control. We show how interpreting RBFs as mixtures of Gaussians allows the application of many statistical tools, including the EM algorithm for parameter estimation. The resulting EM-RBF models give uncertainty estimates and warn when they are extrapolating beyond the region where training data are available.
CABINS is a framework for modeling an optimization task in ill-structured domains. In such domains, neither systems nor human experts possess an exact model for guiding optimization, and the user's model of optimality is subjective and situation-dependent. CABINS optimizes a solution through iterative revision using case-based reasoning. In CABINS, task structure analysis is adopted for creating an initial model of the optimization task. Generic vocabularies found in the analysis are specialized into case feature descriptions for application problems. Extensive experimentation on job shop scheduling problems has shown that CABINS can operationalize and improve the model through the accumulation of cases.
Five related factors are identified that enable single-compartment Hodgkin-Huxley model neurons to convert random synaptic input into irregular spike trains similar to those seen in in vivo cortical recordings. We suggest that cortical neurons may operate in a narrow parameter regime where synaptic and intrinsic conductances are balanced to reflect, through spike timing, detailed correlations in the inputs. The reference for this paper is Technical Report INC, February, Institute for Neural Computation, UCSD, San Diego, CA.
The application of genetic algorithms to neural network optimization (GANN) has produced an active field of research. This paper proposes a classification of the encoding strategies and also gives a critical analysis of them. The idea of evolving artificial neural networks (NN) with genetic algorithms (GA) is based on a powerful metaphor: the evolution of the human brain. This mechanism has developed the highest form of intelligence known, starting from scratch. The metaphor has inspired a great deal of research activity that can be traced back to the late … (see, for instance, …). An increasing number of research reports, journal papers, and theses have been published on the topic, generating a continuously growing field. Researchers have developed a variety of different techniques, of increasing complexity, to encode neural networks for the GA. This young field is driven mostly by small, independent research groups that scarcely cooperate with each other. This paper attempts to analyse the structure of the work already performed and to point out the shortcomings of the approaches and of the current state of development.
We propose an object recognition scheme based on a method for feature extraction from gray-level images. The method corresponds to a recent statistical theory, called projection pursuit, and is derived from a biologically motivated feature-extracting neuron. To evaluate the performance of this method, we use a set of detailed psychophysical 3D object recognition experiments (Bulthoff and Edelman).
Wavelet shrinkage, the method proposed in the seminal work of Donoho and Johnstone, is a disarmingly simple and efficient way of denoising data. Shrinkage of wavelet coefficients has been proposed under several optimality criteria, the most notable being the asymptotic minimax and cross-validation criteria. In this paper, a wavelet shrinkage method is proposed that imposes natural properties of Bayesian models on the data. The performance of the methods is tested on the standard Donoho-Johnstone test functions. Key words and phrases: wavelets, discrete wavelet transform, thresholding, Bayes model. AMS Subject Classification: A, G.
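The paper contributes a Bayesian shrinkage rule. For orientation, here is the classical Donoho-Johnstone baseline it builds on: transform the data, soft-threshold the detail coefficients at the universal threshold σ√(2 log n), and invert. This is not the paper's Bayesian rule. Three simplifying assumptions keep the sketch short: the Haar basis, a known noise level σ, and a synthetic step-function test signal.

```python
import math
import random
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: Sequence[float]) -> Tuple[List[float], List[List[float]]]:
    """Orthonormal Haar DWT; len(signal) must be a power of two.

    Returns the final approximation and the detail levels, coarsest first.
    """
    approx = list(signal)
    details: List[List[float]] = []
    while len(approx) > 1:
        pairs = range(len(approx) // 2)
        details.insert(0, [(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_inverse(approx: Sequence[float], details: List[List[float]]) -> List[float]:
    """Invert haar_forward, rebuilding from the coarsest level upward."""
    current = list(approx)
    for level in details:
        nxt: List[float] = []
        for a, d in zip(current, level):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        current = nxt
    return current


def soft_threshold(c: float, lam: float) -> float:
    """Shrink c toward zero by lam; values inside [-lam, lam] become 0."""
    return math.copysign(max(abs(c) - lam, 0.0), c)


def denoise(noisy: Sequence[float], sigma: float) -> List[float]:
    """Soft-threshold every detail coefficient at sigma * sqrt(2 log n)."""
    approx, details = haar_forward(noisy)
    lam = sigma * math.sqrt(2.0 * math.log(len(noisy)))
    shrunk = [[soft_threshold(c, lam) for c in level] for level in details]
    return haar_inverse(approx, shrunk)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(1)
clean = [0.0] * 32 + [4.0] * 32
noisy = [c + rng.gauss(0.0, 0.5) for c in clean]
denoised = denoise(noisy, sigma=0.5)
print(f"noisy MSE {mse(noisy, clean):.3f}, denoised MSE {mse(denoised, clean):.3f}")
```

Soft thresholding zeroes almost every noise-only coefficient. It also shrinks the large coefficients that carry the jump by λ, and this bias is one of the issues Bayesian shrinkage rules try to address.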
This paper introduces the idea of clearning: simultaneously cleaning the data and learning the underlying structure. The cleaning step can be viewed as top-down processing, where the model modifies the data, and the learning step can be viewed as bottom-up processing, where the data modifies the model. After discussing the statistical foundation of the proposed method from a maximum likelihood perspective, we apply clearning to a notoriously hard problem where benchmark performances are well known: the prediction of foreign exchange rates. On the difficult test period, clearning in conjunction with pruning yields an annualized return of …% out-of-sample, significantly better than an otherwise identical network trained without cleaning. The network started with … inputs and … hidden units and ended up with … nonzero weights between … inputs and … hidden units. The resulting ultra-sparse final architectures obtained with clearning and pruning are immune to overfitting, even on very noisy problems, since the cleaned data allow for a simpler model. Apart from the competitive performance, clearning gives insight into the data. We show how to estimate the overall signal-to-noise ratio of each input variable. We also show that the error estimates for each pattern can be used to detect and remove outliers, and to replace missing or corrupted data with cleaned values. Clearning can be used in any nonlinear regression or classification problem.
A large class of machine-learning problems in natural language require the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of very high dimensionality and that their target concepts depend on only a small subset of the features in the space. Under such conditions, multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good theoretical properties. In the work reported here, we present an algorithm combining variants of Winnow and weighted-majority voting, and apply it to a problem in the aforementioned class: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting "casual" for "causal". We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a statistics-based method representing the state of the art for this task. We find the following: (1) When run with a full (unpruned) set of features, WinSpell achieves accuracies significantly higher than BaySpell was able to achieve in either the pruned or unpruned condition. (2) When compared with other systems in the literature, WinSpell exhibits the highest performance. (3) While several aspects of WinSpell's architecture contribute to its superiority over BaySpell, the primary factor is that it is able to learn a better linear separator than BaySpell learns. (4) When run on a test set drawn from a different corpus than the training set was drawn from, WinSpell is better able than BaySpell to adapt, using a strategy we present that combines supervised learning on the training set with unsupervised learning on the (noisy) test set.
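The multiplicative update at the core of Winnow is short. The sketch below is a minimal Winnow2-style learner, not WinSpell itself. WinSpell combines several Winnow variants with weighted-majority voting, and that layer is not shown. The promotion factor α = 2 and the threshold θ equal to the number of features are common textbook defaults chosen here. Each example is given as the set of its active features, which matches the sparse, high-dimensional setting the abstract describes. The target concept is a hypothetical disjunction of two features out of eight.

```python
from typing import List, Sequence, Set, Tuple

Example = Tuple[Set[int], int]  # (indices of active features, label in {0, 1})


def predict(weights: Sequence[float], active: Set[int], theta: float) -> int:
    """Linear threshold prediction over the active features only."""
    return 1 if sum(weights[i] for i in active) >= theta else 0


def train_winnow(examples: List[Example], n_features: int, alpha: float = 2.0,
                 max_epochs: int = 100) -> Tuple[List[float], float]:
    """Winnow2-style training: multiplicative promotion and demotion."""
    theta = float(n_features)          # a common default threshold
    weights = [1.0] * n_features
    for _ in range(max_epochs):
        mistakes = 0
        for active, label in examples:
            if predict(weights, active, theta) == label:
                continue
            mistakes += 1
            # Promote on a missed positive, demote on a false positive.
            # Only active features change, so updates stay cheap when
            # each example activates few of many features.
            factor = alpha if label == 1 else 1.0 / alpha
            for i in active:
                weights[i] *= factor
        if mistakes == 0:
            break
    return weights, theta


# Hypothetical target concept: "feature 0 OR feature 1" over 8 Boolean features.
examples = []
for mask in range(256):
    active = {i for i in range(8) if mask >> i & 1}
    examples.append((active, int(0 in active or 1 in active)))

weights, theta = train_winnow(examples, n_features=8)
print(all(predict(weights, a, theta) == y for a, y in examples))
```

The relevant weights are only ever promoted. This is why Winnow's mistake count grows with the number of relevant features but only logarithmically with the total number of features, which is the property the abstract alludes to.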
Various notions of geometric ergodicity for Markov chains on general state spaces exist. In this paper, we review certain relations and implications among them. We then apply these results to a collection of chains commonly used in Markov chain Monte Carlo simulation algorithms, the so-called hybrid chains. We prove that, under certain conditions, a hybrid chain will inherit the geometric ergodicity of its constituent parts. Acknowledgements: We thank Charlie Geyer for a number of useful comments regarding spectral theory and central limit theorems. We thank Alison Gibbs, Phil Reiss, Peter Rosenthal, and Richard Tweedie for helpful discussions. We thank the referee and the editor for many excellent suggestions.
We present a polynomial-time algorithm for determining whether a set of species, described by the characters they exhibit, has a phylogenetic tree, assuming that the maximum number of possible states for a character is fixed. This solves an open problem posed by Kannan and Warnow. Our result should be contrasted with the proof by Steel and by Bodlaender, Fellows, and Warnow that the phylogeny problem is NP-complete in general.
When trying to forecast the future behavior of a real-world system, two of the key problems are nonstationarity of the process (e.g., regime switching) and overfitting of the model, which is particularly serious for noisy processes. This article shows how gated experts can point to solutions to these problems. The architecture, also called society of experts or mixture of experts, consists of a nonlinear gating network and several nonlinear competing experts. Each expert learns a conditional mean, as usual, but each expert also has its own adaptive width. The gating network learns to assign a probability to each expert that depends on the input. This article first discusses the assumptions underlying this architecture and derives the weight update rules. It then evaluates the performance of gated experts in comparison with single networks, as well as with networks with two outputs, one predicting the mean and one predicting the local error bar. This article also investigates the ability of gated experts to discover and characterize underlying regimes. The results show significantly less overfitting compared with single nets, for two reasons. First, only subsets of the potential inputs are given to the experts and the gating network, which reduces the curse of dimensionality. Second, the experts learn to match their variances to the local noise levels, thus learning only as much as the data support. This article focuses on the architecture and the overfitting problem. Applications to a computer-generated toy problem and to the laser data from the Santa Fe Competition are given in Mangeas and Weigend, and the application to the real-world problem of predicting electricity demand in France is given in Mangeas et al.
Given a set of samples from an unknown probability distribution, we study the problem of constructing a good approximative Bayesian network model of the probability distribution in question. This task can be viewed as a search problem, where the goal is to find a maximal probability network model given the data. In this work, we do not attempt to learn arbitrarily complex multi-connected Bayesian network structures, since the resulting models can be unsuitable for practical purposes due to the exponential amount of time required for the reasoning task. Instead, we restrict ourselves to a special class of simple tree-structured Bayesian networks, called Bayesian prototype trees, for which a polynomial-time algorithm for Bayesian reasoning exists. We show how the probability of a given Bayesian prototype tree model can be evaluated given the data, and how this evaluation criterion can be used in a stochastic simulated annealing algorithm for searching the model space. The simulated annealing algorithm provably finds the maximal probability model, provided that a sufficient amount of time is used.
In many applications (decision support, negotiation, planning, scheduling, etc.), one needs to express requirements that can only be partially satisfied. In order to express such requirements, we propose a technique called forward-tracking. Intuitively, forward-tracking is a kind of dual of chronological backtracking: if a program globally fails to find a solution, a new execution is started from a program point and a state forward in the computation tree. This search technique is applied to constraint logic programming, obtaining a powerful extension that preserves the useful properties of the original scheme. We report on a successful practical application of forward-tracking to the evolutionary training of (constrained) neural networks.
Local search algorithms for combinatorial search problems frequently encounter sequences of states in which it is impossible to improve the value of the objective function. Moves through these regions, called plateau moves, dominate the time spent in local search. We analyze and characterize plateaus for three different classes of randomly generated Boolean Satisfiability problems. We identify several interesting features of plateaus that impact the performance of local search algorithms. We show that local minima tend to be small but occasionally may be large. We also show that local minima can be escaped without unsatisfying a large number of clauses, but that systematically searching for an escape route may be computationally expensive if the local minimum is large. We show that plateaus with exits, called benches, tend to be much larger than minima, and that benches may have very few exit states that local search can use to escape. We show that the solutions (i.e., global minima) of randomly generated problem instances form clusters, which behave similarly to local minima. We revisit several enhancements of local search algorithms and explain their performance in light of our results. Finally, we discuss strategies for creating the next generation of local search algorithms.
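In GSAT-style local search the objective is the number of unsatisfied clauses, and a plateau move is a flip that leaves that number unchanged. The helper names below are hypothetical, and the code recomputes every flip by brute force rather than with the incremental bookkeeping real solvers use. It labels each single-variable flip from a given assignment as improving, sideways, or worsening. A state with no improving flip but with sideways flips lies on a plateau. That plateau is a bench if some state reachable by sideways moves has an improving flip; otherwise it is a local (or global) minimum region.

```python
from typing import Dict, List, Tuple

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means "not x3"


def unsat_count(clauses: List[Clause], assign: Dict[int, bool]) -> int:
    """Number of clauses with no true literal under the assignment."""
    return sum(not any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


def classify_moves(clauses: List[Clause],
                   assign: Dict[int, bool]) -> Tuple[int, Dict[int, str]]:
    """Label each single-variable flip as improving, sideways, or worsening."""
    base = unsat_count(clauses, assign)
    labels = {}
    for var in assign:
        flipped = dict(assign)
        flipped[var] = not flipped[var]
        delta = unsat_count(clauses, flipped) - base
        labels[var] = ("improving" if delta < 0
                       else "sideways" if delta == 0 else "worsening")
    return base, labels


# Hypothetical unsatisfiable formula: every assignment violates exactly one clause.
clauses = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
base, labels = classify_moves(clauses, {1: True, 2: True})
print(base, labels)  # 1 {1: 'sideways', 2: 'sideways'}
```

In this toy formula the whole search space is a single plateau without exits, which is the extreme case of the large minima the abstract describes.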
Visual objects are perceived when their parts are correctly identified and integrated. A neural network theory is proposed that seeks to explain how the human visual system binds together visual properties, dispersed over space and time, of multiple objects, a problem known as the temporal binding problem. The proposed theory is based upon neural mechanisms that construct and update object representations through the interactions of three components. The first is a serial attentional mechanism for location- and object-based selection. The second is a set of preattentive Gestalt-based grouping mechanisms. The third is an associative memory structure that binds together object identity, form, and spatial information. A working model is presented that provides a unified quantitative explanation of results from psychophysical experiments on object review, object integration, and multielement tracking.
We consider two methods for tracing genetic algorithms. The first method is based on the expected values of bit products, and the second on the expected values of Walsh products. We treat proportional selection, mutation, and uniform and one-point crossover. As applications, we obtain results on stable points and on the fitness of schemata.
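Walsh products rest on expanding fitness in the Walsh basis: f(x) = Σ_j w_j ψ_j(x), where ψ_j(x) = (−1) raised to the number of bit positions set in both j and x. The sketch below shows only that expansion, not the paper's tracing equations. Bit strings are encoded as integers. The 3-bit onemax fitness (number of ones) is a hypothetical example.

```python
from typing import Callable, List


def walsh(j: int, x: int) -> int:
    """Walsh function psi_j(x) = (-1)^(number of bits set in both j and x)."""
    return -1 if bin(j & x).count("1") % 2 else 1


def walsh_coefficients(fitness: Callable[[int], float], length: int) -> List[float]:
    """w_j = 2^-length * sum over all bit strings x of f(x) * psi_j(x)."""
    n = 1 << length
    return [sum(fitness(x) * walsh(j, x) for x in range(n)) / n for j in range(n)]


def reconstruct(coeffs: List[float], x: int) -> float:
    """Recover f(x) = sum_j w_j * psi_j(x) from the coefficients."""
    return sum(w * walsh(j, x) for j, w in enumerate(coeffs))


def onemax(x: int) -> int:
    """Hypothetical fitness: the number of ones in the bit string."""
    return bin(x).count("1")


w = walsh_coefficients(onemax, 3)
print(w)  # [1.5, -0.5, -0.5, 0.0, -0.5, 0.0, 0.0, 0.0]
```

Only the constant term and the three first-order terms are non-zero, because onemax has no interactions between bits. Fitness functions with epistasis would show non-zero higher-order coefficients.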
Routing problems are an important class of planning problems. Usually many different constraints and optimization criteria are involved, and it is difficult to find general methods for solving routing problems. We propose an evolutionary solver for such planning problems. An instance of this solver has been tested on a specific routing problem with time constraints. The performance of the evolutionary solver is compared with a biased random solver and a biased hill-climber solver. Results show that the evolutionary solver performs significantly better than the other two solvers.
We are investigating possible modes of cooperation among homogeneous agents with learning capabilities. In this paper we focus on agents that learn and solve problems using Case-based Reasoning (CBR), and we present two modes of cooperation among them: Distributed Case-based Reasoning (DistCBR) and Collective Case-based Reasoning (ColCBR). We illustrate these modes with an application in which different CBR agents, able to recommend chromatography techniques for protein purification, cooperate. The approach taken is to extend Noos, the representation language used by the CBR agents. Noos is a knowledge modeling framework designed to integrate learning methods, and it is based on the task/method decomposition principle. The extension we present, Plural Noos, allows communication and cooperation among agents implemented in Noos by means of three basic constructs: alien references, foreign method evaluation, and mobile methods.
Bayesian network inference can be formulated as a combinatorial optimization problem, concerning the computation of an optimal factoring for the distribution represented in the net. Since the determination of an optimal factoring is a computationally hard problem, heuristic greedy strategies able to find approximations of the optimal factoring are usually adopted. In the present paper we investigate an alternative approach based on a combination of genetic algorithms (GA) and case-based reasoning (CBR). We show how the use of genetic algorithms can improve the quality of the computed factoring when a static strategy is used (as for MPE computation), while the combination of GA and CBR can still provide advantages in the case of dynamic strategies. Some preliminary results on different kinds of nets are reported.
This paper attempts to rigorously determine the computation and communication requirements of connectionist algorithms running on a distributed-memory machine. The strategy involves specifying key connectionist algorithms in a high-level object-oriented language, extracting their running times as polynomials, and analyzing these polynomials to determine the algorithms' space and time complexity. Results are presented for various implementations of the back-propagation algorithm.
We present a new adaptive connectionist planning method. Through interaction with an environment, a world model is progressively constructed using the back-propagation learning algorithm. The planner constructs a look-ahead plan by iteratively using this model to predict future reinforcements. Future reinforcement is maximized to derive sub-optimal plans, thus determining good actions directly from the knowledge of the model network (strategic level). This is done by gradient descent in action space.
In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), Seattle, WA, July. Technical Report RB, April. Abstract: Evaluation of counterfactual queries (e.g., "If A were true, would C have been true?") is important to fault diagnosis, planning, and determination of liability. In this paper we present methods for computing the probabilities of such queries using the formulation proposed by Balke and Pearl, where the antecedent of the query is interpreted as an external action that forces the proposition A to be true. When a prior probability is available on the causal mechanisms governing the domain, counterfactual probabilities can be evaluated precisely. However, when causal knowledge is specified as conditional probabilities on the observables, only bounds can be computed. This paper develops techniques for evaluating these bounds and demonstrates their use in two applications: the determination of treatment efficacy from studies in which subjects may choose their own treatment, and the determination of liability in product-safety litigation.
The foremost goal of superscalar processor design is to increase performance through the exploitation of instruction-level parallelism (ILP). Previous studies have shown that speculative execution is required for high instructions-per-cycle (IPC) rates in non-numerical applications. The general trend has been toward supporting speculative execution in complicated, dynamically-scheduled processors. Performance, though, is more than a high IPC rate; it also depends upon instruction count and cycle time. Boosting is an architectural technique that supports general speculative execution in simpler, statically-scheduled processors. Boosting labels speculative instructions with their control dependence information. This labelling eliminates control dependence constraints on instruction scheduling while still providing full dependence information to the hardware. We have incorporated boosting into a trace-based, global scheduling algorithm that exploits ILP without adversely affecting the instruction count of a program. We use this algorithm, together with estimates of the boosting hardware involved, to evaluate how much speculative execution support is really necessary to achieve good performance. We find that a statically-scheduled superscalar processor using a minimal implementation of boosting can easily reach the performance of a much more complex dynamically-scheduled superscalar processor.
This research was sponsored in part by the National Science Foundation under award IRI, and by Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant number F. The views and conclusions contained in this document are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF, Wright Laboratory, or the United States Government.
We investigate the application of classification techniques to utility elicitation. In a decision problem, two sets of parameters must generally be elicited: the probabilities and the utilities. While the prior and conditional probabilities in the model do not change from user to user, the utility models do. Thus it is necessary to elicit a utility model separately for each new user. Elicitation is long and tedious, particularly if the outcome space is large and not decomposable. There are two common approaches to utility function elicitation. The first is to base the determination of the user's utility function solely on elicitation of qualitative preferences. The second makes assumptions about the form and decomposability of the utility function. Here we take a different approach: we attempt to identify the new user's utility function based on classification relative to a database of previously collected utility functions. We do this by identifying clusters of utility functions that minimize an appropriate distance measure. Having identified the clusters, we develop a classification scheme that requires many fewer and simpler assessments than full utility elicitation and is more robust than utility elicitation based solely on preferences. We have tested our algorithm on a small database of utility functions in a prenatal diagnosis domain, and the results are quite promising.
The standard method for training hidden Markov models optimizes a point estimate of the model parameters. This estimate, which can be viewed as the maximum of a posterior probability density over the model parameters, may be susceptible to overfitting and contains no indication of parameter uncertainty. Also, this maximum may be unrepresentative of the posterior probability distribution. In this paper we study a method in which we optimize an ensemble that approximates the entire posterior probability distribution. The ensemble learning algorithm requires the same resources as the traditional Baum-Welch algorithm. The traditional training algorithm for hidden Markov models is an expectation-maximization (EM) algorithm (Dempster et al.) known as the Baum-Welch algorithm. It is a maximum likelihood method or, with a simple modification, a penalized maximum likelihood method, which can be viewed as maximizing a posterior probability density over the model parameters. Recently, Hinton and van Camp developed a technique known as ensemble learning (see also MacKay for a review). Whereas maximum a posteriori methods optimize a point estimate of the parameters, ensemble learning optimizes an ensemble that approximates the entire posterior probability distribution over the parameters. The objective function optimized is a variational free energy (Feynman), which measures the relative entropy between the approximating ensemble and the true distribution. In this paper we derive and test an ensemble learning algorithm for hidden Markov models, building on work by Neal.
A quantitative model is provided for psychophysical data on the tracking of multiple visual elements (multielement tracking). The model employs an object-based attentional mechanism for constructing and updating object representations. The model selectively enhances neural activations to serially construct and update the internal representations of objects through correlation-based changes in synaptic weights. The correspondence problem between items in memory and elements in the visual input is resolved through a combination of top-down prediction signals and bottom-up grouping processes. Simulations of the model on image sequences used in multielement tracking experiments show that the reported results are consistent with a serial tracking mechanism based on psychophysical and neurobiological findings. In addition, simulations show that observed effects of perceptual grouping on tracking accuracy may result from interactions between attention-guided predictions of object location and motion and the grouping processes involved in solving the motion correspondence problem.
This paper considers the design and analysis of adaptive wavelet control algorithms for uncertain nonlinear dynamical systems. The Lyapunov synthesis approach is used to develop a state-feedback adaptive control scheme based on nonlinearly parametrized wavelet network models. Semi-global stability results are obtained under the key assumption that the system uncertainty satisfies a matching condition. The localization properties of adaptive networks are discussed, and formal definitions of interference and localization measures are proposed.
Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor $\lambda$. Currently the most important application of these methods is temporal credit assignment in reinforcement learning. Well-known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the issues of efficient and general implementation of TD($\lambda$) for arbitrary $\lambda$, for use with reinforcement learning algorithms optimizing the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative. It only approximates TD($\lambda$), but it requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is fairly simple and not new, but probably unexplored so far. Encouraging experimental results are presented, suggesting that using $\lambda > 0$ with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as usual TD learning.
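TTD keeps a short buffer of the last m experiences. At each step, it updates the state visited m steps earlier toward a truncated λ-return computed from that buffer. The sketch below covers only the return computation. It assumes the standard backward recursion for an m-step truncated λ-return, which bootstraps from V(s_{t+m}); the exact form used in the paper may differ. The rewards, next-state values, γ, and λ in the example are illustrative.

```python
from typing import Sequence


def truncated_lambda_return(rewards: Sequence[float],
                            next_values: Sequence[float],
                            gamma: float, lam: float) -> float:
    """m-step truncated lambda-return.

    rewards[k] is r_{t+k} and next_values[k] is V(s_{t+k+1}) for k = 0..m-1.
    The return bootstraps from V(s_{t+m}) after the last stored step.
    """
    m = len(rewards)
    g = rewards[m - 1] + gamma * next_values[m - 1]
    for k in range(m - 2, -1, -1):
        # Blend the one-step bootstrap with the longer return, weighted by lambda.
        g = rewards[k] + gamma * ((1 - lam) * next_values[k] + lam * g)
    return g


rewards = [1.0, 0.0, 2.0]
next_values = [0.5, 1.0, 3.0]
print(truncated_lambda_return(rewards, next_values, 0.9, 0.0))  # one-step TD target
print(truncated_lambda_return(rewards, next_values, 0.9, 1.0))  # 3-step return
```

With λ = 0 the value reduces to r_t + γV(s_{t+1}) (1.45 here). With λ = 1 it is the m-step discounted return bootstrapped from V(s_{t+m}) (4.807 here). The buffer length m bounds the per-step cost regardless of λ.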
This paper presents, from a methodological point of view, the first results of an interdisciplinary project in scientific data mining. We analyze data about the carcinogenicity of chemicals derived from the carcinogenesis bioassay program, a long-term research study performed by the US National Institute of Environmental Health Sciences. The database contains detailed descriptions of tests performed with compounds on animals of different species, strains, and sexes. The chemical structures are described at the atom and bond level, and in terms of various relevant structural properties. The goal of this paper is to investigate the effects of various levels of detail and amounts of information on the resulting hypotheses, both quantitatively and qualitatively. We apply relational and propositional machine learning algorithms to learning problems formulated as regression or classification tasks. In addition, these experiments are conducted for two learning problems at different levels of detail. Quantitatively, our experiments indicate that additional information does not necessarily improve accuracy. Qualitatively, a number of potential discoveries have been made by the algorithm for Relational Regression, because it is not forced to abstract from the details contained in the relations of the database.
Neural networks and Bayesian inference provide a useful framework within which to solve regression problems. However, their parameterization means that the Bayesian analysis of neural networks can be difficult. In this paper, we investigate a method for regression using Gaussian process priors, which allows exact Bayesian analysis using matrix manipulations. We discuss the workings of the method in detail. We also detail a range of mathematical and numerical techniques that are useful in applying Gaussian processes to general problems, including efficient approximate matrix inversion methods developed by Skilling.
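Exact GP regression reduces to linear algebra on the n × n covariance matrix of the training inputs. The pure-Python sketch below rests on several choices of ours rather than the paper's. It uses a squared-exponential covariance with unit length scale and unit signal variance, and a small Gaussian noise variance. It also uses a plain Cholesky factorization instead of the approximate inversion methods the abstract attributes to Skilling. The sin training data are a made-up example.

```python
import math
from typing import List, Sequence, Tuple


def sq_exp_kernel(a: float, b: float, length_scale: float = 1.0,
                  signal_var: float = 1.0) -> float:
    """Squared-exponential covariance (an assumed choice of prior)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cho_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(train_x: Sequence[float], train_y: Sequence[float],
               test_x: Sequence[float],
               noise_var: float = 1e-3) -> Tuple[List[float], List[float]]:
    """Exact GP posterior mean and variance of the latent function."""
    K = [[sq_exp_kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    L = cholesky(K)
    alpha = cho_solve(L, train_y)          # (K + noise * I)^-1 y
    means, variances = [], []
    for x in test_x:
        k_star = [sq_exp_kernel(x, a) for a in train_x]
        means.append(sum(k * w for k, w in zip(k_star, alpha)))
        v = cho_solve(L, k_star)           # (K + noise * I)^-1 k_*
        variances.append(sq_exp_kernel(x, x) - sum(k * w for k, w in zip(k_star, v)))
    return means, variances


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.sin(x) for x in xs]
means, variances = gp_predict(xs, ys, [2.0, 50.0])
print(means, variances)
```

Near the training inputs the posterior variance is small. Far away, the mean reverts to the prior mean of zero and the variance to the prior variance of one. The O(n³) cost of the factorization is what approximate inversion methods aim to reduce.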
Prototypes have been proposed as a representation of concepts that are used effectively by humans. Developing computational schemes for generating prototypes from examples, however, has proved to be a difficult problem. We present a novel genetic-algorithm-based prototype learning system, PLEASE, for constructing appropriate prototypes from classified training instances. After a set of prototypes has been constructed for each of the possible classes, the class of a new input instance is determined by the prototype nearest to this instance. Attributes are assumed to be ordinal in nature, and prototypes are represented as sets of feature-value pairs. A genetic algorithm is used to evolve the number of prototypes per class and their positions in the input space. We present experimental results on a series of artificial problems of varying complexity. PLEASE performs competitively with several nearest neighbor classification algorithms on the problem set. An analysis of the strengths and weaknesses of the initial version of our system motivates the need for additional operators. The inclusion of these operators substantially improves the performance of the system, particularly on difficult problems.
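The classification rule described above is nearest-prototype. A genetic algorithm searching over prototype sets also needs a way to score a candidate set. The sketch below pairs the rule with training accuracy as one plausible fitness. The fitness function, the Euclidean distance over numeric attributes, and the toy data are our assumptions, not details confirmed by the abstract.

```python
import math
from typing import List, Sequence, Tuple

Prototype = Tuple[Sequence[float], str]  # (position in input space, class label)


def classify(x: Sequence[float], prototypes: List[Prototype]) -> str:
    """Label of the prototype nearest to x (Euclidean distance)."""
    nearest = min(prototypes, key=lambda p: math.dist(x, p[0]))
    return nearest[1]


def fitness(prototypes: List[Prototype],
            training_set: List[Tuple[Sequence[float], str]]) -> float:
    """Training accuracy of a prototype set: one plausible GA fitness."""
    correct = sum(classify(x, prototypes) == label for x, label in training_set)
    return correct / len(training_set)


training = [((0.1, 0.2), "a"), ((0.3, 0.1), "a"), ((0.9, 0.8), "b"), ((0.7, 0.9), "b")]
good = [((0.2, 0.2), "a"), ((0.8, 0.8), "b")]
swapped = [((0.2, 0.2), "b"), ((0.8, 0.8), "a")]
print(fitness(good, training), fitness(swapped, training))  # 1.0 0.0
```

A GA would mutate prototype positions and add or remove prototypes per class, keeping the sets with higher fitness. A practical fitness might also penalize large prototype sets.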
In this paper, the problem of asymptotic identification of fading memory systems in the presence of bounded noise is studied. For any experiment, the worst-case error is characterized in terms of the diameter of the worst-case uncertainty set. Optimal inputs that minimize the radius of uncertainty are studied and characterized. Finally, a convergent algorithm that does not require knowledge of the noise upper bound is furnished. The algorithm is based on interpolating data with spline functions, which are shown to be well suited for identification in the presence of bounded noise, more so than other basis functions such as polynomials.
This paper describes Rapture, a system for revising probabilistic knowledge bases that combines connectionist and symbolic learning methods. Rapture uses a modified version of backpropagation to refine the certainty factors of a probabilistic rule base, and it uses ID3's information-gain heuristic to add new rules. Results on refining three actual expert knowledge bases demonstrate that this combined approach generally performs better than previous methods.
In open-world applications, a number of machine-learning techniques may potentially apply to a given learning situation. The research presented here illustrates the complexity involved in automatically choosing an appropriate technique in a multistrategy learning system. It also constitutes a step toward a general computational solution to the learning-strategy selection problem. The approach is to treat learning-strategy selection as a separate planning problem with its own set of goals, as is the case with ordinary problem-solvers. Therefore, the management and pursuit of these learning goals becomes a central issue in learning, similar to the goal-management problems associated with traditional planning systems. This paper explores some issues, problems, and possible solutions in such a framework. Examples are presented from a multistrategy learning system called Meta-AQUA.
We present empirical evidence considering volatility Eurodollar futures stochastic process requiring generalization standard BlackScholes BS model treats volatility constant We use previous development statistical mechanics financial markets SMFM model issues
We examine new approach modeling uncertainty based plausibility measures plausibility measure associates event plausibility element partially ordered set This approach easily seen generalize approaches modeling uncertainty probability measures belief functions possibility measures The lack structure plausibility measure makes easy us add structure needed basis letting us examine required ensure plausibility measure certain properties interest This gives us insight essential features properties question allowing us prove general results apply many approaches reasoning uncertainty Plausibility measures already proved useful analyzing default reasoning In paper examine algebraic properties analogues use probability theory An understanding properties essential plausibility measures used practice representation tool
In paper describe number intelligent data analysis techniques preprocess analyze data coming home monitoring diabetic patients In particular show combination temporal abstractions statistical probabilistic techniques may applied derive useful summaries patients behaviour certain monitoring period Finally describe Intelligent Data Analysis methods may used index past cases perform casebased retrieval database past cases
Multipleinstance learning variation supervised learning task learn concept given positive negative bags instances Each bag may contain many instances bag labeled positive even one instances falls within concept A bag labeled negative instances negative We describe new general framework called Diverse Density solving multipleinstance learning problems We apply framework learn simple description person series images bags containing person stock selection problem drug activity prediction problem
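To make "Diverse Density" concrete, here is a minimal sketch that evaluates it at one candidate target point. It assumes the noisy-or formulation with a Gaussian-like per-instance probability exp(-||x - t||^2), which is how we recall the framework being instantiated. The abstract itself does not give the formula, so treat the model as an assumption. Finding the best concept would additionally require maximizing this function over the target, which is not shown.

```python
import math
from typing import List, Sequence

Bag = List[Sequence[float]]


def instance_prob(instance: Sequence[float], target: Sequence[float]) -> float:
    """Assumed probability that `instance` lies in the concept centred at `target`."""
    d2 = sum((x - t) ** 2 for x, t in zip(instance, target))
    return math.exp(-d2)


def diverse_density(target: Sequence[float],
                    positive_bags: List[Bag],
                    negative_bags: List[Bag]) -> float:
    """Noisy-or Diverse Density of `target` (unnormalized)."""
    dd = 1.0
    for bag in positive_bags:
        # A positive bag needs at least one instance near the target.
        miss = 1.0
        for inst in bag:
            miss *= 1.0 - instance_prob(inst, target)
        dd *= 1.0 - miss
    for bag in negative_bags:
        # Every instance of a negative bag should be far from the target.
        for inst in bag:
            dd *= 1.0 - instance_prob(inst, target)
    return dd


if __name__ == "__main__":
    pos = [[(1.0, 1.0), (5.0, 0.0)], [(1.0, 1.1), (0.0, 5.0)]]
    neg = [[(3.0, 3.0)]]
    print(diverse_density((1.0, 1.0), pos, neg))  # close to 1
    print(diverse_density((3.0, 3.0), pos, neg))  # 0.0: a negative instance sits there
```

A point scores highly only when it is near at least one instance of every positive bag and far from all negative instances. That is the sense in which density is "diverse".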
Restrictions number depth existential variables defined language series Clint Rae widely used ILP expected produce considerable reduction size hypothesis space In paper show generally case The lower bounds present lead intractable hypothesis spaces except toy domains We argue parameters chosen Clint unsuitable sensible bias shift operations propose alternative approaches resulting desired reduction hypothesis space allowing natural integration shift bias
This paper discussion relationship learning forgetting An analysis economics learning carried argued knowledge sometimes negative value A series experiments involving program learns traverse state spaces described It shown knowledge acquired negative value even though correct acquired solving similar problems It shown value knowledge depends else known random forgetting sometimes lead substantial improvements performance It concluded research knowledge acquisition take seriously possibility knowledge may sometimes harmful The view taken learning forgetting complementary processes construct maintain useful representations experience
We apply general technique learning overcomplete bases problem finding efficient image codes The bases learned algorithm localized oriented bandpass consistent earlier results obtained using related methods We show learned bases Gaborlike structure higher degrees overcompleteness produce greater sampling density position orientation scale The efficient coding framework provides method comparing different bases objectively calculating probability given observed data measuring entropy basis function coefficients Compared complete overcomplete Fourier wavelet bases learned bases much better coding efficiency We demonstrate improvement representation learned bases showing superior performance image denoising fillingin missing pixels
In paper problem Kolmogorov complexity related binary strings faced We propose Genetic Programming approach consists evolving population Lisp programs looking optimal program generates given string This evolutionary approach permitted overcome intractable space time difficulties occurring methods perform approximation Kolmogorov complexity function The experimental results quite significant also show interesting computational strategies proving effectiveness implemented technique
Most inductive inference algorithms ie learners work effectively training data contain completely specified labeled samples In many diagnostic tasks however data include values attributes model blocking process hides values attributes learner While blockers remove values critical attributes handicap learner paper instead focuses blockers remove superfluous attribute values ie values needed classify instance given values unblocked attributes We first motivate formalize model superfluousvalue blocking demonstrate omissions useful showing certain classes seem hard learn general PAC model viz decision trees trivial learn setting even learned manner robust classification noise We also discuss model extended deal theory revision ie modifying existing decision tree complex attributes correspond combinations atomic attributes blockers occasionally include superfluous values exclude required values hypothesis classes eg DNF formulae Declaration This paper already accepted currently review journal another conference submitted IJCAIs review period This extended version paper appeared working notes AAAI Fall Symposium Relevance New Orleans November Authors listed alphabetically We gratefully acknowledge receiving helpful comments Dale Schuurmans George Drastal
We propose casebased method selecting behavior sets addition traditional reactive robotic control systems The new system ACBARR A Case BAsed Reactive Robotic system provides flexible performance novel environments well overcoming standard hard problem reactive systems box canyon Additionally ACBARR designed manner intended remain close pure reactive control possible Higher level reasoning memory functions intentionally kept minimum As result new reasoning significantly slow system pure reactive speeds
When using machine learning techniques knowledge discovery output comprehensible human important predictive accuracy We introduce new algorithm SETGen improves comprehensibility decision trees grown standard C without reducing accuracy It using genetic search select set input features C allowed use build tree We test SETGen wide variety realworld datasets show SETGen trees significantly smaller reference significantly fewer features trees grown C without using SETGen Statistical significance tests show accuracies SETGens trees either distinguishable accurate original C trees ten datasets tested
We present method automatically determining structure connection weights Boltzmann machine corresponding given Bayesian network representation probability distribution set discrete variables The resulting Boltzmann machine structure implemented efficiently massively parallel hardware since structure divided two separate clusters nodes one cluster updated simultaneously The updating process Boltzmann machine approximates Gibbs sampling process original Bayesian network sense Boltzmann machine converges final state Gibbs sampler The mapping Bayesian network Boltzmann machine seen method incorporating probabilistic priori information neural network architecture trained existing learning algorithms
A default theory sanction different mutually incompatible answers certain queries We identify theory set related credulous theories produces single response query imposing total ordering defaults Our goal identify credulous theory optimal expected accuracy averaged natural distribution queries domain There two obvious complications First expected accuracy theory depends query distribution usually known Second task identifying optimal theory even given distribution information intractable This paper presents method OptAcc sidesteps problems using set samples estimate unknown distribution hillclimbing local optimum In particular given error confidence parameters ε δ > 0 OptAcc produces theory whose expected accuracy probability least
Given problem casebased reasoning CBR system search case memory use stored cases find solution possibly modifying retrieved cases adapt required input specifications In discrete domains CBR reasoning based rigorous Bayesian probability propagation algorithm Such Bayesian CBR system implemented probabilistic feedforward neural network one layers representing cases In paper introduce Minimum Description Length MDL based learning algorithm obtain proper network structure associated conditional probabilities This algorithm together resulting neural network implementation provide massively parallel architecture solving efficiency bottleneck casebased reasoning
Working Paper IS Leonard N Stern School Business New York University In Decision Technologies Financial Engineering Proceedings Fourth International Conference Neural Networks Capital Markets NNCM pp Edited ASWeigend YSAbuMostafa APNRefenes Singapore World Scientific httpwwwsternnyueduaweigendResearchPapersSharpeRatio While many trading strategies based price prediction traders financial markets typically interested riskadjusted performance Sharpe Ratio rather price predictions This paper introduces approach generates nonlinear strategy explicitly maximizes Sharpe Ratio It expressed neural network model whose output position size risky riskfree asset The iterative parameter update rules derived compared alternative approaches The resulting trading strategy evaluated analyzed computergenerated data real world data DAX daily German equity index Trading based Sharpe Ratio maximization compares favorably profit optimization probability matching crossentropy optimization The results show goal optimizing outofsample riskadjusted profit achieved nonlinear approach
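The objective this abstract maximizes, the Sharpe ratio, can be sketched as follows. This is a minimal illustration, assuming that a position p in [0, 1] is held in the risky asset and the rest earns a constant risk-free rate. It also assumes the ratio is the mean per-period excess return divided by its sample standard deviation, and that annualization scales by the square root of periods per year (which presumes roughly i.i.d. returns). The neural network that outputs positions and its parameter update rules are not reproduced here.

```python
import math
import statistics
from typing import List, Sequence


def strategy_returns(positions: Sequence[float],
                     asset_returns: Sequence[float],
                     riskfree_rate: float) -> List[float]:
    """Per-period return when a fraction `p` is held in the risky asset."""
    return [p * r + (1.0 - p) * riskfree_rate
            for p, r in zip(positions, asset_returns)]


def sharpe_ratio(returns: Sequence[float], riskfree_rate: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - riskfree_rate for r in returns]
    sd = statistics.stdev(excess)  # needs at least two observations
    if sd == 0.0:
        raise ValueError("Sharpe ratio undefined for zero-variance returns")
    return statistics.mean(excess) / sd


def annualize(per_period_sharpe: float, periods_per_year: int) -> float:
    """Common sqrt-of-time scaling (assumes roughly i.i.d. returns)."""
    return per_period_sharpe * math.sqrt(periods_per_year)


if __name__ == "__main__":
    rets = strategy_returns([1.0, 0.5, 0.0], [0.02, -0.01, 0.03], 0.001)
    print(sharpe_ratio(rets, 0.001))
```

Because the ratio penalizes volatility, a strategy tuned for it can prefer smaller positions than one tuned purely for profit. This is the contrast the abstract draws with profit optimization.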
Randomized adaptive greedy search using evolutionary algorithms offers powerful versatile approach automated design neural network architectures variety tasks artificial intelligence robotics In paper present results evolutionary design neurocontroller robotic bulldozer This robot given task clearing arena littered boxes pushing boxes sides Through careful analysis evolved networks show evolution exploits design constraints properties environment produce network structures high fitness We conclude brief summary related ongoing research examining intricate interplay environment evolutionary processes determining structure function resulting neural architectures
In paper present framework definition similarity measures using latticevalued functions We show strengths particularly combining similarity measures Then investigate particular instantiation framework sets used represent objects denote degrees similarity The paper concludes suggesting generalisations findings
We investigate solution constraintbased configuration problems preference function outcomes unknown incompletely specified The aim configure system personal computer optimal given user The goal project develop algorithms generate preferred feasible configuration posing preference queries user In order minimize number complexity preference queries posed user algorithm reasons users preferences taking account constraints set feasible configurations We assume user structure preferences particular way natural many settings exploited optimization process We also address preliminary fashion tradeoffs computational effort solution problem degree interaction user
Selfselection input examples basis performance failure powerful bias learning systems The definition constitutes learning bias however typically restricted bias provided input language hypothesis language preference criteria competing concept hypotheses But bias taken broader context basis provides preference one concept change another paradigm failuredriven processing indeed provides bias Bias exhibited selection examples input stream examples failure successful performance filtered We show degrees freedom less failuredriven learning successdriven learning learning facilitated constraint We also broaden definition failure provide novel taxonomy failure causes illustrate interaction multistrategy learning system called MetaAQUA
We previously introduced Gamma MLP defined MLP usual synaptic weights replaced gamma filters associated gain terms throughout layers In paper apply Gamma MLP larger scale speech phoneme recognition problem analyze operation network investigate Gamma MLP perform better alternatives The Gamma MLP capable employing multiple temporal resolutions temporal resolution defined per de Vries Principe number parameters freedom ie number tap variables per unit time gamma memory equal gamma memory parameter detailed paper Multiple temporal resolutions may advantageous certain problems eg different resolutions may optimal extracting different features input data For problem paper Gamma MLP observed use large range temporal resolutions In comparison TDNN networks typically use single temporal resolution Further motivation Gamma MLP related curse dimensionality ability Gamma MLP trade temporal resolution memory depth therefore increase memory depth without increasing dimensionality network The IIR MLP general version Gamma MLP however IIR MLP performs poorly problem paper Investigation suggests error surface Gamma MLP suitable gradient descent training error surface IIR MLP
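The gamma memory that replaces ordinary synaptic weights in the Gamma MLP can be sketched as a cascade of leaky first-order stages. This follows the de Vries and Principe recursion as we recall it:

x_0(t) = u(t), x_k(t) = (1 - μ) x_k(t-1) + μ x_{k-1}(t-1).

Here μ is the gamma memory parameter, which the abstract identifies with the temporal resolution. With μ = 1 the recursion reduces to a plain tapped delay line (TDNN-style). Smaller μ trades resolution for a memory depth of roughly K/μ for K taps. The filter's pole sits at 1 - μ, so it is stable for 0 < μ < 2. The learned gain terms and the MLP itself are not shown.

```python
from typing import List, Sequence


def gamma_memory(signal: Sequence[float], order: int, mu: float) -> List[List[float]]:
    """Return the tap vector [x_0(t), ..., x_order(t)] for every time step t."""
    if not 0.0 < mu < 2.0:
        raise ValueError("gamma memory is stable only for 0 < mu < 2")
    taps = [0.0] * (order + 1)  # all taps start at rest
    history = []
    for u in signal:
        prev = taps[:]  # tap values at t-1
        taps[0] = u
        for k in range(1, order + 1):
            # Leaky integration of the previous stage, delayed by one step.
            taps[k] = (1.0 - mu) * prev[k] + mu * prev[k - 1]
        history.append(taps[:])
    return history


if __name__ == "__main__":
    # mu = 1 behaves like a tapped delay line: [[1,0,0], [2,1,0], [3,2,1]]
    print(gamma_memory([1.0, 2.0, 3.0], order=2, mu=1.0))
    # mu = 0.5 smears past input over later taps
    print(gamma_memory([1.0, 0.0, 0.0], order=2, mu=0.5))
```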
We present fast algorithm nonlinear dimension reduction The algorithm builds local linear model data merging PCA clustering based new distortion measure Experiments speech image data indicate local linear algorithm produces encodings lower distortion built five layer autoassociative networks The local linear algorithm also order magnitude faster train
Wavelets wide potential use statistical contexts The basics discrete wavelet transform reviewed using filter notation useful subsequently paper A stationary wavelet transform coefficient sequences decimated stage described Two different approaches construction inverse stationary wavelet transform set The application stationary wavelet transform exploratory statistical method discussed together potential use nonparametric regression A method local spectral density estimation developed This involves extensions wavelet context standard time series ideas periodogram spectrum The technique illustrated application data sets astronomy veterinary anatomy
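The defining feature of the stationary wavelet transform is that coefficient sequences are not decimated. Every level keeps as many coefficients as the input, and the filters are instead spread apart by 2^j at level j. Below is a minimal sketch using the Haar filter with periodic boundaries. The choice of Haar and of periodic extension is ours for brevity; the paper works with general wavelet filters, and the inverse transforms it discusses are not shown.

```python
import math
from typing import List, Sequence, Tuple


def swt_haar(x: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Undecimated Haar transform: returns (final smooth, [detail level 1, ...])."""
    n = len(x)
    if levels < 1 or 2 ** levels > n:
        raise ValueError("need 1 <= levels and 2**levels <= len(x)")
    s = 1.0 / math.sqrt(2.0)
    smooth = list(x)
    details = []
    for j in range(levels):
        step = 2 ** j  # filter taps are 2**j apart at level j+1 (no decimation)
        d = [s * (smooth[i] - smooth[(i + step) % n]) for i in range(n)]
        c = [s * (smooth[i] + smooth[(i + step) % n]) for i in range(n)]
        details.append(d)
        smooth = c
    return smooth, details


if __name__ == "__main__":
    smooth, details = swt_haar([4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0], levels=2)
    print([len(d) for d in details])  # [8, 8]: every level keeps all coefficients
```

Keeping all coefficients makes the transform shift-invariant (up to a circular shift of the outputs). This redundancy is what makes it useful for the exploratory and local spectral analyses described above.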
Business users analysts commonly use spreadsheets D plots analyze understand data Online Analytical Processing OLAP provides users added flexibility pivoting data around different attributes drilling multidimensional cube aggregations Machine learning researchers however concentrated hypothesis spaces foreign users hyperplanes Perceptrons neural networks Bayesian networks decision trees nearest neighbors etc In paper advocate use decision table classifiers easy lineofbusiness users understand We describe several variants algorithms learning decision tables compare performance describe visualization mechanism implemented MineSet The performance decision tables comparable known algorithms CC yet resulting classifiers use fewer attributes comprehensible
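A decision table classifier is easy for line-of-business users to read because prediction is a table lookup. The instance is projected onto a chosen schema of attributes, and the majority class of the matching training rows is returned. Below is a minimal sketch of that lookup, assuming a fixed, hand-chosen schema and a fallback to the overall majority class when no row matches. The attribute-selection search that the paper's learning algorithms perform is not shown, and the attribute names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


class DecisionTable:
    """Lookup-table classifier over a fixed schema of attribute names."""

    def __init__(self, schema: Sequence[str]):
        self.schema = list(schema)

    def _key(self, row: Dict[str, Hashable]) -> tuple:
        # Project the instance onto the schema attributes.
        return tuple(row[a] for a in self.schema)

    def fit(self, rows: List[Dict[str, Hashable]], labels: List[str]) -> "DecisionTable":
        self.cells: Dict[tuple, Counter] = defaultdict(Counter)
        for row, y in zip(rows, labels):
            self.cells[self._key(row)][y] += 1
        self.default = Counter(labels).most_common(1)[0][0]  # global majority
        return self

    def predict(self, row: Dict[str, Hashable]) -> str:
        cell = self.cells.get(self._key(row))  # .get does not create empty cells
        if cell:
            return cell.most_common(1)[0][0]
        return self.default


if __name__ == "__main__":
    rows = [{"outlook": "sunny", "windy": False}, {"outlook": "overcast", "windy": False}]
    table = DecisionTable(["outlook", "windy"]).fit(rows, ["no", "yes"])
    print(table.predict({"outlook": "overcast", "windy": False}))  # yes
```

Attributes outside the schema (such as a temperature column) are simply ignored. This is how the resulting classifiers end up using fewer attributes.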
The VA management services department invests considerably collection assessment data inform hospital carearea specific levels quality care Resulting time series quality monitors provide information relevant evaluating patterns variability hospitalspecific quality care time across care areas compare assess differences across hospitals In collaboration VA management services group developed various models evaluating patterns dependencies combining data across VA hospital system This paper provides brief overview resulting models summary examples three monitor time series discussion data modelling inference issues This work introduces new models multivariate nonGaussian time series The framework combines crosssectional hierarchical models population hospitals time series structure allow measure timevariations associated hierarchical model parameters In VA study withinyear components models describe patterns heterogeneity across population hospitals relationships among several monitors time series components describe patterns variability time hospitalspecific effects relationships across quality monitors Additional model components isolate unpredictable aspects variability quality monitor outcomes hospital care areas We discuss model assessment residual analysis MCMC algorithms developed fit models interest related applications socioeconomic areas
We report development highperformance system neural network signal processing applications We designed implemented vector microprocessor packaged attached processor conventional workstation We present performance comparisons commercial workstations neural network backpropagation training The SPERTII system demonstrates significant speedups extensively handoptimized code running workstations
A pure rulebased program return set answers query return answer set even rules reordered However impure program includes Prolog cut operators return different answers rules reordered There also many reasoning systems return first answer found query first answers depend rule order even pure rulebased systems A theory revision algorithm seeking revised rulebase whose expected accuracy distribution queries optimal therefore consider modifying order rules This paper first shows polynomial number training labeled queries query coupled correct answer provides distribution information necessary identify optimal ordering It proves however task determining ordering optimal given information intractable even trivial situations eg even query atomic literal seeking perfect theory rule base propositional We also prove task even approximable Unless P = NP polynomial time algorithm produce ordering nrule theory whose accuracy within n^γ optimal γ > 0 We also prove similar hardness nonapproximatability results related tasks determining impure contexts optimal ordering antecedents optimal set rules add delete optimal priority values set defaults
Identifying open research issues field necessary step progress field This paper describes four open research problems computational models precedentbased legal reasoning relating case representation precedent use modeling selection construction arguments based pairwise case comparison multipleprecedent arguments modeling process whereby purposes policies principles used case similarity assessment extending applicability precedents tasks classification
Financial forecasting example signal processing problem challenging due small sample sizes high noise nonstationarity nonlinearity Neural networks successful number signal processing applications We discuss fundamental limitations inherent difficulties using neural networks processing high noise small sample size signals We introduce new intelligent signal processing method addresses difficulties The method uses conversion symbolic representation selforganizing map grammatical inference recurrent neural networks We apply method prediction daily foreign exchange rates addressing difficulties nonstationarity overfitting unequal priori class probabilities find significant predictability comprehensive experiments covering different foreign exchange rates The method correctly predicts direction change next day error rate The error rate reduces around rejecting examples system low confidence prediction The symbolic representation aids extraction symbolic knowledge recurrent neural networks form deterministic finite state automata These automata explain operation system often relatively simple Rules related well known behavior trend following mean reversal extracted
COINS Technical Report February Abstract The problem learn examples studied throughout history machine learning many successful learning algorithms developed A problem received less attention select algorithm use given learning task The ability chosen algorithm induce good generalization depends appropriate model class underlying algorithm given task We define algorithms model class representation language uses express generalization examples Supervised learning algorithms differ underlying model class search good generalization Given characterization surprising algorithms find better generalizations tasks Therefore order find best generalization task automated learning system must search appropriate model class addition searching best generalization within chosen class This thesis proposal investigates issues involved automating selection appropriate model class The presented approach two facets Firstly approach combines different model classes form model combination decision tree allows best representation found subconcept learning task Secondly model class appropriate determined dynamically using set heuristic rules Explicit rule conditions particular model class appropriate done next In addition describing approach proposal describes approach evaluated order demonstrate efficient effective method automatic model selection
The development nervous system involves many cases interactions local scale rather execution fully specified genetic blueprint The problem discover nature interactions factors depend The withdrawal polyinnervation developing muscle example competitive interactions play important role We examine possible types competition formal
This paper presents new approach inductive learning combines aspects instancebased learning rule induction single simple algorithm The RISE system searches rules specifictogeneral fashion starting one rule per training example avoids difficulties separateandconquer approaches evaluating proposed induction step globally ie efficient procedure equivalent checking accuracy rule set whole every training example Classification performed using bestmatch strategy reduces nearestneighbor generalizations instances rejected An extensive empirical study shows RISE consistently achieves higher accuracies stateoftheart representatives parent paradigms PEBLS CN also outperforms decisiontree learner C test domains
Most research machine learning focused scenarios learner faces single isolated learning task The lifelong learning framework assumes instead learner encounters multitude related learning tasks lifetime providing opportunity transfer knowledge This paper studies lifelong learning context binary classification It presents invariance approach knowledge transferred via learned model invariances domain Results learning recognize objects color images demonstrate superior generalization capabilities invariances learned used bias subsequent learning
Previous bias shift approaches predicate invention applicable learning positive examples complete hypothesis found given language negative examples required determine whether new predicates invented One approach problem presented MERLIN successor system predicate invention guided sequences input clauses SLDrefutations positive negative examples wrt overly general theory In contrast predecessor searches minimal finitestate automaton generate positive negative sequences MERLIN uses technique inducing Hidden Markov Models positive sequences This enables system invent new predicates without triggered negative examples Another advantage using induction technique allows incremental learning Experimental results presented comparing MERLIN positive learning framework Progol comparing original induction technique new version produces deterministic Hidden Markov Models The results show predicate invention may indeed necessary possible learning positive examples well beneficial keep induced model deterministic
We present algorithms learn certain classes functionfree recursive logic programs polynomial time equivalence queries In particular show single kary recursive constantdepth determinate clause learnable Twoclause programs consisting one learnable recursive clause one constantdepth determinate nonrecursive clause also learnable additional basecase oracle assumed These results immediately imply paclearnability classes Although classes learnable recursive programs constrained shown companion paper maximally general generalizing either class natural way leads computationally difficult learning problem Thus taken together companion paper paper establishes boundary efficient learnability recursive logic programs
We present evaluate two methods improving performance ILP systems One discretization numerical attributes based Fayyad Iranis text adapted extended way cope aspects discretization occur relational learning problems indeterminate literals occur The second technique lookahead It wellknown problem ILP learner always assess quality refinement without knowing refinements enabled afterwards ie without looking ahead refinement lattice We present simple method specifying lookahead used kind lookahead interesting Both discretization lookahead techniques evaluated experimentally The results show techniques improve quality induced theory computational costs acceptable
This paper analyses recently suggested particle approach filtering time series We suggest algorithm robust outliers two reasons design simulators use discrete support represent sequentially updating prior distribution Both problems tackled paper We believe largely solved first problem reduced order magnitude second In addition introduce idea stratification particle filter allows us perform online Bayesian calculations parameters index models maximum likelihood estimation The new methods illustrated using stochastic volatility model time series model angles
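Stratification enters a particle filter at the resampling step. Instead of drawing N independent uniforms, one draws a single uniform inside each stratum [i/N, (i+1)/N), which lowers the Monte Carlo variance of the resampled set. Below is a minimal sketch of generic stratified resampling, assuming normalized or unnormalized non-negative weights. The exact stratification scheme in the paper may differ, and the propagation and weighting steps of the filter are not shown.

```python
import bisect
import itertools
import random
from typing import List, Optional, Sequence


def stratified_resample(weights: Sequence[float],
                        rng: Optional[random.Random] = None) -> List[int]:
    """Return N particle indices drawn with one uniform per stratum."""
    if rng is None:
        rng = random.Random()
    n = len(weights)
    total = sum(weights)
    cumulative = list(itertools.accumulate(w / total for w in weights))
    cumulative[-1] = 1.0  # guard against floating-point round-off
    indices = []
    for i in range(n):
        u = (i + rng.random()) / n  # uniform on the i-th stratum
        # First particle whose cumulative weight exceeds u (skips zero-weight ones).
        indices.append(bisect.bisect_right(cumulative, u))
    return indices


if __name__ == "__main__":
    print(stratified_resample([0.1, 0.6, 0.3], random.Random(42)))
```

With equal weights, each stratum maps onto exactly one particle, so every particle survives once. Independent multinomial draws would instead duplicate some particles and lose others by chance.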
In paper suggest determinations representation knowledge easy understand We briefly review determinations displayed tabular format use prediction involves simple matching process We describe ConDet algorithm uses feature selection construct determinations training data augmented condensation process collapses rows produce simpler structures We report experiments show condensation reduces complexity loss accuracy discuss ConDets relation work outline directions future studies
Recurrent neural networks complex parametric dynamic systems exhibit wide range different behavior We consider task grammatical inference recurrent neural networks Specifically consider task classifying natural language sentences grammatical ungrammatical recurrent neural network made exhibit kind discriminatory power provided Principles Parameters linguistic framework Government Binding theory We attempt train network without bifurcation learned vs innate components assumed Chomsky produce judgments native speakers sharply grammaticalungrammatical data We consider recurrent neural network could possess linguistic capability investigate properties Elman Narendra Parthasarathy NP Williams Zipser WZ recurrent networks FrasconiGoriSoda FGS locally recurrent networks setting We show
Working Paper IS Leonard N Stern School Business New York University In Journal Computational Intelligence Finance Special Issue Improving Generalization Nonlinear Financial Forecasting Models httpwwwsternnyueduaweigendResearchPapersInteractionLayer Abstract Predictive models financial data often based large number plausible inputs potentially nonlinearly combined yield conditional expectation target daily return asset This paper introduces new architecture task On output side predict dynamical variables first derivatives curvatures different time spans These subsequently combined interaction output layer form several estimates variable interest Those estimates averaged yield final prediction Independently idea input side propose new internal preprocessing layer connected diagonal matrix positive weights layer squashing functions These weights adapt input individually learn squash outliers input We apply two ideas real world example daily predictions German stock index DAX Deutscher Aktien Index compare results network single output The new six layer architecture stable training due two facts More information flowing back outputs input backward pass The constraint predicting first second derivatives focuses learning relevant variables dynamics The architectures compared training perspective squared errors robust errors trading perspective annualized returns percent correct Sharpe ratio
We propose new method estimation linear models The lasso minimizes residual sum squares subject sum absolute value coefficients less constant Because nature constraint tends produce coefficients exactly zero hence gives interpretable models Our simulation studies suggest lasso enjoys favourable properties subset selection ridge regression It produces interpretable models like subset selection exhibits stability ridge regression There also interesting relationship recent work adaptive function estimation Donoho Johnstone The lasso idea quite general applied variety statistical models extensions generalized regression models treebased models briefly described
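The lasso as stated here constrains the sum of absolute coefficients. Below is a minimal sketch that solves the equivalent penalized (Lagrangian) form, (1/2)·||y − Xβ||² + λ·Σ|β_j|, by cyclic coordinate descent with soft-thresholding. For each constraint bound there is a corresponding λ, but the solver and the fixed iteration count are our choices for illustration, not the paper's procedure. No intercept is fitted and columns are not standardized.

```python
from typing import List, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float],
             lam: float, n_iter: int = 200) -> List[float]:
    """Coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes j's own term.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


if __name__ == "__main__":
    print(lasso_cd([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0))  # [2.0, 0.0]
```

The soft-threshold step is where coefficients become exactly zero, which is the source of the interpretable, subset-selection-like models the abstract describes.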
Instancebased learning techniques typically handle continuous linear input values well often handle nominal input attributes appropriately The Value Difference Metric VDM designed find reasonable distance values nominal attribute values largely ignores continuous attributes requiring discretization map continuous values nominal values This paper proposes three new heterogeneous distance functions called Heterogeneous Value Difference Metric HVDM Interpolated Value Difference Metric IVDM Windowed Value Difference Metric WVDM These new distance functions designed handle applications nominal attributes continuous attributes In experiments applications new distance metrics achieve higher classification accuracy average three previous distance functions datasets nominal continuous attributes
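Below is a minimal sketch of the HVDM idea: a per-attribute distance that depends on attribute type, combined Euclidean-style. We assume the definition as we recall it, which the abstract does not spell out, so check it against the paper. Nominal attributes use a normalized VDM (the root of the summed squared differences of class-conditional frequencies). Continuous attributes use |x − y| / (4σ). Missing values contribute a distance of 1. Treating unseen nominal values as distance 1 and using the population standard deviation are further assumptions of this sketch.

```python
import math
import statistics
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def fit_hvdm(rows: Sequence[Sequence], labels: Sequence[str],
             nominal: Set[int]) -> Tuple[Dict[int, object], List[str]]:
    """Per-attribute stats: class counts per value (nominal) or std dev (continuous)."""
    stats: Dict[int, object] = {}
    for a in range(len(rows[0])):
        if a in nominal:
            counts = defaultdict(Counter)  # value -> Counter of classes
            for row, y in zip(rows, labels):
                counts[row[a]][y] += 1
            stats[a] = counts
        else:
            stats[a] = statistics.pstdev([row[a] for row in rows])
    return stats, sorted(set(labels))


def hvdm(x: Sequence, y: Sequence, stats: Dict[int, object],
         classes: List[str], nominal: Set[int]) -> float:
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # missing value
        elif a in nominal:
            cx, cy = stats[a].get(xa, Counter()), stats[a].get(ya, Counter())
            nx, ny = sum(cx.values()), sum(cy.values())
            if nx == 0 or ny == 0:
                d = 1.0  # value never seen in training (assumption)
            else:
                d = math.sqrt(sum((cx[c] / nx - cy[c] / ny) ** 2 for c in classes))
        else:
            sd = stats[a]
            d = abs(xa - ya) / (4.0 * sd) if sd > 0 else 0.0
        total += d * d
    return math.sqrt(total)


if __name__ == "__main__":
    rows = [("red", 1.0), ("red", 3.0), ("blue", 1.0), ("blue", 3.0)]
    stats, classes = fit_hvdm(rows, ["A", "A", "B", "B"], nominal={0})
    print(hvdm(("red", 1.0), ("blue", 1.0), stats, classes, {0}))  # about 1.414
```

Because nominal distances come from class statistics rather than raw equality, two colours that predict the same class end up close together, even though they are different symbols.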
Research utility noncoding segments introns geneticbased encodings shown expedite evolution solutions domains protecting building blocks destructive crossover We consider genetic programming system noncoding segments removed resultant chromosomes returned population This parsimonious repair leads premature convergence since remove naturally occurring noncoding segments strip away protective backup feature We duplicate coding segments repaired chromosomes place modified chromosomes population The duplication method significantly improves learning rate domain considered We also show method applied domains
This paper introduces new operation restricted iteration creation automatically Genetic programming extends Hollands genetic algorithm task automatic programming Early work genetic programming demonstrated possible evolve sequence workperforming steps single resultproducing branch onepart main program The book Genetic Programming On Programming Computers Means Natural Selection Koza describes extension Hollands genetic algorithm genetic population consists computer programs compositions primitive functions terminals See also Koza Rice In basic form genetic programming single resultproducing branch evolved genetic programming demonstrated capability discover sequence length content workperforming steps sufficient produce satisfactory solution several problems including many problems used years benchmarks machine learning artificial intelligence Before applying genetic programming problem user must perform five major preparatory steps namely identifying terminals inputs tobeevolved programs identifying primitive functions operations contained tobeevolved programs creating fitness measure evaluating well given program solving problem hand choosing certain control parameters notably population size number generations run determining termination criterion method result designation typically bestsofar individual populations produced run creates restricted iterationperforming
Inertia added continuoustime Hopfield effectiveneuron system We explore effects stability fixed points system A two neuron system one two inertial terms added shown exhibit chaos The chaos confirmed Lyapunov exponents power spectra phase space plots
This paper describes partialmemory incremental learning method based AQc inductive learning system The method maintains representative set past training examples used together new examples appropriately modify currently held hypotheses Incremental learning evoked feedback environment user Such method useful applications involving intelligent agents acting changing environment active vision dynamic knowledgebases For study method applied problem computer intrusion detection symbolic profiles learned computer systems users In experiments proposed method yielded significant gains terms learning time memory requirements expense slightly lower predictive accuracy higher concept complexity compared batch learning examples given
The genetic algorithm GA problem solving method modelled process natural selection We interested studying specific aspect GA effect noncoding segments GA performance Noncoding segments segments bits individual provide contribution positive negative fitness individual Previous research noncoding segments suggests including structures GA may improve GA performance Understanding improvement occurs help us use GA full potential In article discuss hypotheses noncoding segments describe results experiments The experiments may separated two categories testing program problems previous related studies testing new hypotheses effect noncoding segments
We previously introduced exemplar model named GCMISW exploits highly flexible weighting scheme Our simulations showed records faster learning rates higher asymptotic accuracies several artificial categorization tasks models limited abilities warp input spaces This paper extends previous work describes experimental results suggest human subjects also invoke highly flexible schemes In particular model provides significantly better fits models less flexibility hypothesize humans selectively weight attributes depending items location input space We need flexible models Many theories human concept learning posit concepts represented prototypes Reed exemplars Medin Schaffer Prototype models represent concepts best example central tendency concept A new item belongs category C relatively similar Cs prototype Prototype models relatively inflexible discard great deal information people use concept learning eg number exemplars concept Homa Cultice variability features Fried Holyoak correlations features Medin et al particular exemplars used Whittlesea concept learning
Current inductive logic programming systems limited handling noise employ greedy covering approach constructing hypothesis one clause time This approach also causes difficulty learning recursive predicates Additionally many current systems implicit expectation cardinality positive negative examples reflect proportion concept instance space A framework learning noisy data fixed example size presented A Bayesian heuristic finding probable hypothesis general framework derived This approach evaluates hypothesis whole rather one clause time The heuristic nice theoretical properties incorporated ILP system Lime Experimental results show Lime handles noise better FOIL PROGOL It able learn recursive definitions noisy data systems perform well Lime also capable learning positive data also negative data
We present general formulation network stochastic directional units This formulation extension Boltzmann machine units binary take values cyclic range radians This measure appropriate many domains representing cyclic angular values eg wind direction days week phases moon The state unit DirectionalUnit Boltzmann Machine DUBM described complex variable phase component specifies direction weights also complex variables We associate quadratic energy function corresponding probability DUBM configuration The conditional distribution units stochastic state circular version Gaussian probability distribution known von Mises distribution In meanfield approximation stochastic DUBM phase component units state represents mean direction magnitude component specifies degree certainty associated direction This combination value certainty provides additional representational power unit We present proof settling dynamics meanfield DUBM cause convergence free energy minimum Finally describe learning algorithm simulations demonstrate meanfield DUBMs ability learn interesting mappings To appear Neural Networks
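The DUBM abstract says a unit's conditional state follows a von Mises distribution. It also says that, in the mean-field approximation, the phase of the complex state gives a mean direction and its magnitude gives a certainty. The sketch below illustrates only that encoding. It is not the paper's settling dynamics or learning rule. It computes E[exp(i*theta)] for theta ~ VonMises(mu, kappa), whose phase is mu and whose magnitude is I1(kappa)/I0(kappa). The function names are hypothetical. The Bessel functions are summed as a plain power series, which is adequate only for moderate kappa.

```python
import cmath
import math


def bessel_i(order: int, x: float, terms: int = 200) -> float:
    """Modified Bessel function of the first kind I_order(x), summed as a power series.

    Terms are built recursively to avoid converting huge factorials to float;
    adequate for moderate x, not a production routine.
    """
    term = (x / 2.0) ** order / math.factorial(order)
    total = term
    q = (x / 2.0) ** 2
    for m in range(terms):
        term *= q / ((m + 1) * (m + 1 + order))
        total += term
    return total


def von_mises_mean(mu: float, kappa: float) -> complex:
    """E[exp(i*theta)] for theta ~ VonMises(mu, kappa).

    The phase of the result is the mean direction mu; the magnitude
    I1(kappa) / I0(kappa) lies in [0, 1) and grows with the concentration
    kappa, so it acts as a certainty attached to that direction.
    """
    certainty = bessel_i(1, kappa) / bessel_i(0, kappa)
    return certainty * cmath.exp(1j * mu)


if __name__ == "__main__":
    for kappa in (0.0, 1.0, 5.0, 50.0):
        z = von_mises_mean(math.pi / 3, kappa)
        print(f"kappa={kappa:5.1f}  direction={cmath.phase(z):.3f}  certainty={abs(z):.3f}")
```

At kappa = 0 the distribution is uniform on the circle, so the certainty is 0 and the printed direction carries no information.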
For century known damage right hemisphere brain cause patients unaware contralesional side space This condition known unilateral neglect represents collection clinically related spatial disorders characterized failure free vision respond explore orient stimuli predominantly located side space opposite damaged hemisphere Recent studies using simple task line bisection conventional diagnostic test proved surprisingly revealing respect spatial attentional impairments involved neglect In line bisection patient asked mark midpoint thin horizontal line sheet paper Neglect patients generally transect far right center Extensive studies line bisection conducted manipulating among factors line length orientation position We simulated pattern results using existing computational model visual perception selective attention called morsel Mozer morsel already used model data related disorder neglect dyslexia Mozer Behrmann In earlier work morsel lesioned accordance damage suppose occurred brains
This paper overviews proposed architecture adaptive parallel logic referred ASOCS Adaptive SelfOrganizing Concurrent System The ASOCS approach based adaptive network composed many simple computing elements operate parallel asynchronous fashion Problem specification given system presenting ifthen rules form boolean conjunctions Rules added incrementally system adapts changing rulebase Adaptation data processing form two separate phases operation During processing system acts parallel hardware circuit The adaptation process distributed amongst computing elements efficiently exploits parallelism Adaptation done selforganizing fashion takes place time linear depth network This paper summarizes overall ASOCS concept overviews three specific architectures
We describe design tuning controller enforcing compliance prescribed velocity profile railbased transportation system This requires following trajectory rather fixed setpoints automobiles We synthesize fuzzy controller tracking velocity profile providing smooth ride staying within prescribed speed limits We use genetic algorithm tune fuzzy controllers performance adjusting parameters scaling factors membership functions sequential order significance We show approach results controller superior manually designed one modest computational effort This makes possible customize automated tuning variety different configurations route terrain power configuration cargo
The patientadaptive classifier compared wellestablished baseline algorithm six major databases consisting million heartbeats When trained initial records tested additional records patientadaptive algorithm found reduce number Vn errors one channel factor number Nv errors factor We conclude patient adaptation provides significant advance classifying normal vs ventricular beats ECG Patient Monitoring
This paper presents experiment comparing new namepronunciation system Anapron seven existing systems three stateoftheart commercial systems Bellcore Bell Labs DEC two variants machinelearning system NETtalk two humans Anapron works combining rulebased casebased reasoning It based idea much easier improve rulebased system adding casebased reasoning tuning rules deal every exception In experiment described Anapron used set rules adapted MITalk elementary foreignlanguage textbooks case library names With components required relatively little knowledge engineering Anapron found perform almost level commercial systems significantly better two versions NETtalk Mitsubishi Electric Research Laboratories Cambridge Massachusetts
This paper devoted problem learning predict ordinal ie ordered discrete classes ILP setting We start relational regression algorithm named SRT Structural Regression Trees study various ways transforming firstorder learner ordinal classification tasks Combinations algorithm variants several data preprocessing methods compared two ILP benchmark data sets verify relative strengths weaknesses strategies study tradeoff optimal categorical classification accuracy hit rate minimum distancebased error Preliminary results indicate promising avenue towards algorithms combine aspects classification regression relational learning
Learning problems text processing domain often map text space whose dimensions measured features text eg words Three characteristic properties domain high dimensionality b learned concepts instances reside sparsely feature space c high variation number active features instance In work study three mistakedriven learning algorithms typical task nature text categorization We argue algorithms categorize documents learning linear separator feature space properties make ideal domain We show quantum leap performance achieved modify algorithms better address specific characteristics domain In particular demonstrate variation document length tolerated either normalizing feature weights using negative weights positive effect applying threshold range training alternatives considering feature frequency benefits discarding features training Overall present algorithm variation Littlestones Winnow performs significantly better algorithm tested task using similar feature set
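The abstract's best performer is a variation of Littlestone's Winnow, a mistake-driven learner of a linear separator that uses multiplicative updates. The paper's specific modifications are not reproduced here. Those are length normalization, negative weights, a threshold range and feature frequencies. As a sketch under stated assumptions, the class below implements only the basic positive-weight Winnow update on binary features. The promotion factor alpha = 2 and the threshold equal to the number of features are illustrative choices. The helper train_until_consistent is hypothetical.

```python
from itertools import product
from typing import Sequence


class Winnow:
    """Basic Winnow over binary features (not the paper's modified variant)."""

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.weights = [1.0] * n_features
        self.alpha = alpha
        self.threshold = float(n_features)

    def score(self, x: Sequence[int]) -> float:
        # Only active (nonzero) features contribute, which suits sparse text vectors.
        return sum(w for w, xi in zip(self.weights, x) if xi)

    def predict(self, x: Sequence[int]) -> int:
        return 1 if self.score(x) >= self.threshold else 0

    def update(self, x: Sequence[int], label: int) -> bool:
        """Mistake-driven update; returns True if a mistake was made."""
        if self.predict(x) == label:
            return False
        # Promote active weights on a missed positive, demote them on a false positive.
        factor = self.alpha if label == 1 else 1.0 / self.alpha
        for i, xi in enumerate(x):
            if xi:
                self.weights[i] *= factor
        return True


def train_until_consistent(model: Winnow, data, max_passes: int = 50) -> int:
    """Cycle through the data until a full pass makes no mistakes; return total mistakes."""
    mistakes = 0
    for _ in range(max_passes):
        pass_mistakes = sum(model.update(x, y) for x, y in data)
        mistakes += pass_mistakes
        if pass_mistakes == 0:
            break
    return mistakes


if __name__ == "__main__":
    # Toy target: the disjunction x0 OR x1 over 6 binary features.
    data = [(x, int(x[0] or x[1])) for x in product((0, 1), repeat=6)]
    model = Winnow(6)
    print("mistakes:", train_until_consistent(model, data))
    print("weights:", model.weights)
```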
We show networks relatively realistic mathematical models biological neurons principle simulate arbitrary feedforward sigmoidal neural nets way previously considered This new approach based temporal coding single spikes respectively timing synchronous firing pools neurons rather traditional interpretation analog variables terms firing rates The resulting new simulation substantially faster hence consistent experimental results maximal speed information processing cortical neural systems As consequence show networks noisy spiking neurons universal approximators sense approximate regard temporal coding given continuous function several variables This result holds fairly large class schemes coding analog variables firing times spiking neurons Our new proposal possible organization computations networks spiking neurons systems interesting consequences type learning rules would needed explain selforganization networks Finally fast noiserobust implementation sigmoidal neural nets via temporal coding points possible new ways implementing feedforward recurrent sigmoidal neural nets pulse stream VLSI
In framework functional response model ie regression model feedforward neural network estimator nonlinear response function constructed set functional units The parameters defining functional units estimated using Bayesian approach A sample representing Bayesian posterior distribution obtained applying Markov chain Monte Carlo procedure namely combination Gibbs MetropolisHastings algorithms The method described histogram Bspline radial basis function estimators response function In general proposed approach suitable finding Bayesoptimal values parameters complicated parameter space We illustrate method numerical examples
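The abstract samples the posterior over functional-unit parameters with a combination of Gibbs and Metropolis-Hastings steps. The sketch below shows only the Metropolis-Hastings building block, as a random-walk sampler on a one-dimensional toy posterior. It is not the paper's combined sampler over histogram, B-spline or RBF parameters. The Gaussian proposal, the step size and the N(2, 1) target are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List


def metropolis_hastings(
    log_target: Callable[[float], float],
    x0: float,
    n_samples: int,
    step: float,
    seed: int = 0,
) -> List[float]:
    """Random-walk Metropolis-Hastings for a 1-D density known up to a constant."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # The Gaussian proposal is symmetric, so the Hastings ratio reduces to
        # p(proposal) / p(x); accept with probability min(1, that ratio).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)  # on rejection the current state is repeated
    return samples


if __name__ == "__main__":
    # Toy unnormalized posterior: N(2, 1), log density up to an additive constant.
    draws = metropolis_hastings(lambda t: -0.5 * (t - 2.0) ** 2, x0=0.0, n_samples=20000, step=1.0)
    kept = draws[1000:]  # discard burn-in
    print("posterior mean ~", sum(kept) / len(kept))
```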
Selecting set features optimal given classification task one central problems machine learning We address problem using flexible robust filter technique EUBAFES EUBAFES based feature weighting approach computes binary feature weights therefore solution feature selection sense also gives detailed information feature relevance continuous weights Moreover user gets one several potentially optimal feature subsets important filterbased feature selection algorithms since gives flexibility use even complex classifiers application combined filterwrapper approach We applied EUBAFES number artificial real world data sets used radial basis function networks examine impact feature subsets classifier accuracy complexity
A Machine learn biased way Typically bias supplied hand example choice appropriate set features However learning machine embedded within environment related tasks learn bias learning sufficiently many tasks environment In paper two models bias learning equivalently learning learn introduced main theoretical results presented The first model PACtype model based empirical process theory second hierarchical Bayes model
This paper compares efficiency two encoding schemes Artificial Neural Networks optimized evolutionary algorithms Direct Encoding encodes weights priori fixed neural network architecture Cellular Encoding encodes weights architecture neural network In previous studies Direct Encoding Cellular Encoding used create neural networks balancing poles attached cart fixed track The poles balanced controller pushes cart left right In cases velocity information pole cart provided input cases network must learn balance single pole without velocity information A careful study behavior systems suggests possible balance single pole velocity information input without learning compute velocity A new fitness function introduced forces neural network compute velocity By using new fitness function tuning syntactic constraints used cellular encoding achieve tenfold speedup previous study solve difficult problem balancing two poles information velocity provided input
Demands applications requiring massive parallelism symbolic environments given rebirth research models labeled neural networks These models made many simple nodes highly interconnected computation takes place data flows amongst nodes network To present models proposed nodes based simple analog functions inputs multiplied weights summed total optionally transformed arbitrary function node Learning systems accomplished adjusting weights input lines This paper discusses use digital boolean nodes primitive building block connectionist systems Digital nodes naturally engender new paradigms mechanisms learning processing connectionist networks The digital nodes used basic building block class models called ASOCS Adaptive SelfOrganizing Concurrent Systems These models combine massive parallelism ability adapt selforganizing fashion Basic features standard neural network learning algorithms proposed using digital nodes compared contrasted The latter mechanisms lead vastly improved efficiency many applications
Many abductive understanding systems explain novel situations chaining process neutral explainer needs beyond generating plausible explanation event explained This paper examines relationship standard models abductive understanding casebased explanation model In casebased explanation construction selection abductive hypotheses focused specific explanations prior episodes goalbased criteria reflecting current information needs The casebased method inspired observations human explanation anomalous events everyday understanding paper focuses methods contributions problems building good explanations everyday domains We identify five central issues compare issues addressed traditional casebased explanation models discuss motivations using casebased approach facilitate generation plausible useful explanations domains complex imperfectly understood
Previous research shown technique called errorcorrecting output coding ECOC dramatically improve classification accuracy supervised learning algorithms learn classify data points one k classes In paper extend technique ECOC also provide class probability information ECOC method converting kclass supervised learning problem large number L twoclass supervised learning problems combining results L evaluations The underlying twoclass supervised learning algorithms assumed provide L probability estimates The problem computing class probabilities formulated overconstrained system L linear equations Least squares methods applied solve equations Accuracy reliability probability estimates demonstrated
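The abstract recovers class probabilities from the L binary learners' outputs by least squares. The sketch below makes two assumptions. First, each binary learner l returns an estimate r_l of P(bit l = 1). Second, the code matrix M is k by L with 0/1 entries. Modeling r_l as the sum over classes c of M[c][l] * p_c gives an over-constrained system of L equations in k unknowns. That system is solved here through the normal equations (M M^T) p = M r. The final clipping of negative values and renormalization is an added assumption that the abstract does not specify. The function names are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("code words are linearly dependent; least squares is underdetermined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ecoc_class_probabilities(code: Sequence[Sequence[int]], bit_probs: Sequence[float]) -> List[float]:
    """Least-squares class probabilities from L binary-learner outputs.

    code[c][l] is bit l of class c's code word; bit_probs[l] estimates P(bit l = 1).
    Model: bit_probs[l] = sum_c code[c][l] * p[c]  (L equations, k unknowns).
    """
    k, n_bits = len(code), len(bit_probs)
    # Normal equations (M M^T) p = M r for the k x L code matrix M.
    gram = [[sum(code[i][l] * code[j][l] for l in range(n_bits)) for j in range(k)] for i in range(k)]
    rhs = [sum(code[i][l] * bit_probs[l] for l in range(n_bits)) for i in range(k)]
    p = solve_linear(gram, rhs)
    # Assumption: clip negatives and renormalize so the result is a distribution.
    p = [max(0.0, v) for v in p]
    total = sum(p)
    return [v / total for v in p] if total > 0 else [1.0 / k] * k


if __name__ == "__main__":
    code = [
        [1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 0, 0, 1],
        [0, 1, 0, 1, 0, 1, 0],
    ]
    noisy_bits = [0.97, 0.62, 0.71, 0.55, 0.74, 0.66, 0.80]  # made-up learner outputs
    print(ecoc_class_probabilities(code, noisy_bits))
```

When the bit estimates are exactly consistent with some distribution p, the least-squares solution reproduces p. With noisy estimates, it returns the closest fit in the squared-error sense before clipping.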
In paper propose reactive critic able respond changing situations We explain useful reinforcement learning critic used improve control strategy We take problem derive solution analytically This enables us investigate relation parameters resulting approximations critic We also demonstrate reactive critic responds changing situations
Quadratic Dynamical Systems QDS whose definition extends Markov chains used model phenomena variety fields like statistical physics natural evolution Such systems also play role genetic algorithms widelyused class heuristics notoriously hard analyze Recently Rabinovich et al took important step study QDSs showing technical assumptions systems converge stationary distribution similar theorems Markov Chains wellknown We show however following sampling problem QDSs PSPACEhard Given initial distribution produce random sample tth generation The hardness result continues hold restricted classes QDSs simple initial distributions thus suggesting QDSs intrinsically complicated Markov chains
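A QDS maps a distribution p over states to p'(k) = sum over i, j of beta_ij(k) * p(i) * p(j). The sketch below gives a concrete instance of that definition, not the paper's hardness construction. It assumes the states are n-bit strings and that beta comes from uniform crossover between two parents drawn independently from p, a standard genetic-algorithm example. It evolves the exact distribution by enumerating all 2^n states and all crossover masks, which is feasible only for tiny n. The function names are hypothetical.

```python
from itertools import product
from typing import List


def uniform_crossover_qds_step(p: List[float], n_bits: int) -> List[float]:
    """One generation of a QDS: p'(k) = sum_{i,j} beta_ij(k) p(i) p(j).

    Here beta_ij(k) is the fraction of crossover masks m for which the child
    (i & m) | (j & ~m) equals k, i.e. each bit is inherited from parent i or
    parent j with probability 1/2.
    """
    n_states = 1 << n_bits
    if len(p) != n_states:
        raise ValueError("p must assign a probability to every n-bit state")
    nxt = [0.0] * n_states
    for i, j in product(range(n_states), repeat=2):
        weight = p[i] * p[j]
        if weight == 0.0:
            continue
        for mask in range(n_states):
            child = (i & mask) | (j & ~mask)
            nxt[child] += weight / n_states
    return nxt


def bit_marginal(p: List[float], bit: int) -> float:
    """Probability that the given bit is 1 under distribution p."""
    return sum(prob for state, prob in enumerate(p) if (state >> bit) & 1)


if __name__ == "__main__":
    p = [0.5, 0.0, 0.0, 0.5]  # 2-bit strings: only 00 and 11 present
    for generation in range(4):
        print(generation, p)
        p = uniform_crossover_qds_step(p, 2)
```

Under this particular QDS the single-bit marginals stay fixed, and the distribution drifts toward the product of those marginals.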
Hublers technique using aperiodic forces drive nonlinear oscillators resonance analyzed The oscillators examined effective neurons model Hopfield neural networks The method shown valid several different circumstances It verified analysis power spectrum force resonance energy transfer system
An extended version dual constraint model motor endplate morphogenesis presented includes activity dependent independent competition It supported wide range recent neurophysiological evidence indicates strong relationship synaptic efficacy survival The computational model justified molecular level predictions match developmental regenerative behaviour real synapses
We consider problem learning functions fixed distribution An algorithm Kushilevitz Mansour learns boolean function f g n time polynomial L norm Fourier transform function We show KMalgorithm special case general class learning algorithms This achieved extending ideas using representations finite groups We introduce new classes functions learned using generalized KM algorithm
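The KM algorithm finds the large Fourier coefficients of f using membership queries. Reproducing it, or the paper's generalization through group representations, is beyond a short example. The brute-force sketch below only illustrates the quantities involved. It computes every coefficient f_hat(S) = E_x[f(x) * chi_S(x)], with chi_S(x) = (-1)^(sum of x_i for i in S), under the uniform distribution on {0,1}^n. It also computes the L1 norm, the sum of |f_hat(S)|, which appears in KM's running-time bound. It costs about 4^n operations, so it is only for tiny n. The +1/-1 output convention and the function names are assumptions.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Point = Tuple[int, ...]
Subset = Tuple[int, ...]


def character(subset: Subset, x: Point) -> int:
    """chi_S(x) = (-1)^(sum of x_i for i in S): the parity of x on S, as +1/-1."""
    return -1 if sum(x[i] for i in subset) % 2 else 1


def fourier_coefficients(f: Callable[[Point], int], n: int) -> Dict[Subset, float]:
    """All 2^n coefficients of f: {0,1}^n -> {-1,+1}, by direct averaging."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for mask in product((0, 1), repeat=n):
        subset = tuple(i for i in range(n) if mask[i])
        coeffs[subset] = sum(f(x) * character(subset, x) for x in points) / len(points)
    return coeffs


def l1_norm(coeffs: Dict[Subset, float]) -> float:
    return sum(abs(c) for c in coeffs.values())


def majority3(x: Point) -> int:
    # +1 when most of the 3 bits are 0, matching chi's convention (+1 for x_i = 0).
    return 1 if sum(x) <= 1 else -1


if __name__ == "__main__":
    coeffs = fourier_coefficients(majority3, 3)
    print({s: c for s, c in coeffs.items() if c != 0.0})
    print("L1 norm:", l1_norm(coeffs))
```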
Standard statistical practice ignores model uncertainty Data analysts typically select model class models proceed selected model generated data This approach ignores uncertainty model selection leading overconfident inferences decisions risky one thinks Bayesian model averaging BMA provides coherent mechanism accounting model uncertainty Several methods implementing BMA recently emerged We discuss methods present number examples In examples BMA provides improved outofsample predictive performance We also provide catalogue currently available BMA software
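BMA weights each candidate model by its posterior probability, p(M_k | D) proportional to p(D | M_k) * p(M_k), and averages the models' predictions with those weights. The sketch below assumes the log marginal likelihoods log p(D | M_k) are already available. The -BIC/2 helper is one common large-sample stand-in, not necessarily what any of the catalogued BMA software uses. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def log_evidence_from_bic(bic: float) -> float:
    """Common large-sample approximation: log p(D | M) is roughly -BIC / 2."""
    return -0.5 * bic


def posterior_model_weights(
    log_evidence: Sequence[float], prior: Optional[Sequence[float]] = None
) -> List[float]:
    """p(M_k | D) proportional to p(D | M_k) p(M_k), normalized in log space for stability."""
    k = len(log_evidence)
    prior = prior if prior is not None else [1.0 / k] * k
    logs = [le + math.log(pr) for le, pr in zip(log_evidence, prior)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def bma_predict(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Posterior-weighted average of the individual models' point predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


if __name__ == "__main__":
    bics = [100.0, 102.0, 110.0]  # illustrative values for three candidate models
    w = posterior_model_weights([log_evidence_from_bic(b) for b in bics])
    print("weights:", w)
    print("BMA prediction:", bma_predict([1.0, 1.4, 3.0], w))
```

Subtracting the largest log score before exponentiating avoids underflow when evidences are very negative, without changing the normalized weights.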
With rapid expansion machine learning methods applications strong need computerbased interactive tools support education area The EMERALD system developed provide handson experience interactive demonstration several machine learning discovery capabilities students AI cognitive science AI professionals The current version EMERALD integrates five programs exhibit different types machine learning discovery learning rules examples determining structural descriptions object classes inventing conceptual clusterings entities predicting sequences objects discovering equations characterizing collections quantitative qualitative data EMERALD extensively uses color graphic capabilities voice synthesis natural language representation knowledge acquired learning programs Each program presented learning robot personality expressed icon voice comments generates learning process results learning presented natural language text andor voice output Users learn capabilities robot challenged perform learning tasks creating similar tasks challenge robot EMERALD extension ILLIAN initial much smaller version toured eight major US Museums Science seen half million visitors EMERALDs architecture allows incorporate new programs new capabilities The system runs SUN workstations available universities educational institutions
The Nozzle Design Associate NDA computational environment design jet engine exhaust nozzles supersonic aircraft NDA may used either design new aircraft design new nozzles adapt existing aircraft may reutilized new missions NDA developed collaboration computer scientists Rutgers University exhaust nozzle designers General Electric Aircraft Engines General Electric Corporate Research Development The NDA project two principal goals provide useful engineering tool exhaust nozzle design explore fundamental research issues arise application automated design optimization methods realistic engineering problems
The learning many visual perceptual tasks motion discrimination shown specific practiced stimulus new stimuli require relearning scratch This specificity found many different tasks supports hypothesis perceptual learning takes place early visual cortical areas In contrast using novel paradigm motion discrimination learning shown specific found generalization We trained subjects discriminate directions moving dots verified learning transfer trained direction new one However tracking subjects performance across time new direction found rate learning doubled Moreover mastering task easy stimulus subjects practiced briefly discriminate easy stimulus new direction generalized difficult stimulus direction This generalization demanded mastering brief practice Thus learning motion discrimination always generalizes new stimuli Learning manifested various forms acceleration learning rate indirect transfer direct transfer These results challenge existing theories perceptual learning suggest complex picture learning takes place multiple levels Learning biological systems great importance But cognitive learning problem solving abrupt generalizes analogous problems appear acquire perceptual skills gradually specifically human subjects generalize perceptual discrimination skill solve similar problems different attributes For example discrimination task described Fig subject trained discriminate motion directions ffi ffi use skill discriminate ffi ffi Such specificity supports hypothesis perceptual learning embodies neuronal modifications brains stimulusspecific cortical areas eg visual area MT In contrast previous results specificity show three experiments learning motion discrimination always generalizes When task easy generalizes directions training
This paper addresses problem learning evolving concepts concepts whose meaning gradually evolves time Solving problem important many applications example building intelligent agents helping users Internet search active vision automatically updating knowledgebases acquiring profiles users telecommunication networks Requirements learning architecture supporting applications include ability incrementally modify concept definitions accommodate new information fast learning recognition rates low memory needs understandability computercreated concept descriptions To address requirements propose learning architecture based VariableValued Logic Star Methodology AQ algorithm The method uses partialmemory approach means step learning system remembers current concept descriptions specially selected representative examples past experience The developed method experimentally applied problem computer system intrusion detection The results show significant advantages method learning speed memory requirements slight decreases predictive accuracy concept simplicity compared traditional batchstyle learning training examples provided
We use simulated evolution search genetic programming automatic synthesis small iterative machinelanguage programs For integer register machine addition instruction sole arithmetic operator show genetic programming produce exact general multiplication routines synthesizing necessary iterative control structures primitive machinelanguage instructions Our program representation virtual register machine admits arbitrary control flow Our evolution strategy furthermore artificially restrict synthesis control structure place upper bound program evaluation time A programs fitness distance output produced test case desired output multiplication The test cases exhaustively cover multiplication finite subset natural numbers N yet derived solutions constitute general multiplication positive integers For problem simulated evolution twopoint crossover operator examines significantly fewer individuals finding solution random search Introduction small rate mutation further increases number solutions
Technical Report CSRP Abstract In paper present GPMusic System interactive system allows users evolve short musical sequences using interactive genetic programming extensions aimed making system fully automated The basic GPsystem works using genetic programming algorithm small set functions creating musical sequences user interface allows user rate individual sequences With user interactive technique possible generate pleasant tunes runs individuals generations As user bottleneck interactive systems system takes rating data users run uses train neural network based automatic rater auto rater replace user bigger runs Using auto rater able make runs generations individuals per generation The best run pieces generated auto raters pleasant general nice generated user interactive runs
Most concept induction algorithms process concept instances described terms properties remain constant time In temporal domains instances best described terms properties whose values vary time Data engineering called upon temporal domains transform raw data appropriate form concept induction I investigate method inducing features suitable classifying finite univariate time series governed unknown deterministic processes contaminated noise In supervised setting I induce piecewise polynomials appropriate complexity characterize data class using Bayesian model induction principles In study I evaluate proposed method empirically semideterministic domain waveform classification problem originally presented CART book I compared classification accuracy proposed algorithm accuracy attained C various noise levels Feature induction improved classification accuracy noisy situations degraded noise The results demonstrate value proposed method presence noise reveal weakness shared classifiers using generative rather discriminative models sensitivity model inaccuracies
In article present casebased approach flexible query answering systems two different application areas The ExperienceBook supports technical diagnosis field system administration In FAllQ project use CBR system document retrieval industrial setting The objective systems manage knowledge stored less structured documents The internal case memory implemented Case Retrieval Net This allows handle large case bases efficient retrieval process In order provide multi user access chose client server model combined web interface
Dynamic programming provides methodology develop planners controllers nonlinear systems However general dynamic programming computationally intractable We developed procedures allow complex planning control problems solved We use second order local trajectory optimization generate locally optimal plans local models value function derivatives We maintain global consistency local models value function guaranteeing locally optimal plans actually globally optimal resolution search procedures
This paper discusses three techniques useful relaxing constraints imposed control flow parallelism control dependence analysis executing multiple flows control simultaneously speculative execution We evaluate techniques using trace simulations find limits parallelism machines employ different combinations techniques We three major results First local regions code limited parallelism control dependence analysis useful extracting global parallelism different parts program Second superscalar processor fundamentally limited execute independent regions code concurrently Higher performance obtained machines multiprocessors dataflow machines simultaneously follow multiple flows control Finally without speculative execution allow instructions execute control dependences resolved modest amounts parallelism obtained programs complex control flow
This paper presents general framework learning searchcontrol heuristics logic programs used improve efficiency accuracy knowledgebased systems expressed definiteclause logic programs The approach combines techniques explanationbased learning recent advances inductive logic programming learn clauseselection heuristics guide program execution Two specific applications framework detailed dynamic optimization Prolog programs improving efficiency natural language acquisition improving accuracy In area program optimization prototype system Dolphin able transform intractable specifications polynomialtime algorithms outperforms competing approaches several benchmark speedup domains A prototype language acquisition system Chill also described It capable automatically acquiring semantic grammars uniformly incorporate syntactic semantic constraints parse sentences caserole representations Initial experiments show approach able construct accurate parsers generalize well novel sentences significantly outperform previous approaches learning caserole mapping based connectionist techniques Planned extensions general framework specific applications well plans evaluation also discussed
We study online generalized linear regression multidimensional outputs ie neural networks multiple output nodes hidden nodes We allow final layer transfer functions softmax function need consider linear activations output neurons We use distance functions certain kind two completely independent roles deriving analyzing online learning algorithms tasks We use one distance function define matching loss function possibly multidimensional transfer function allows us generalize earlier results onedimensional multidimensional outputs We use another distance function tool measuring progress made online updates This shows previously studied algorithms gradient descent exponentiated gradient fit common framework We evaluate performance algorithms using relative loss bounds compare loss online algorithm best offline predictor relevant model class thus completely eliminating probabilistic assumptions data
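The two updates the abstract names can be compared in the simplest setting it covers: one linear output, square loss, and no transfer function. The sketch below assumes the standard forms of both updates. Gradient descent subtracts eta * (y_hat - y) * x. Exponentiated gradient multiplies each weight by exp(-eta * (y_hat - y) * x_i) and renormalizes, which keeps the weights on the probability simplex. The matching-loss machinery for general transfer functions and multiple outputs is not shown. The toy data, target and learning rate are arbitrary illustrative choices.

```python
import math
import random
from typing import Callable, List, Sequence

Update = Callable[[Sequence[float], Sequence[float], float, float], List[float]]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def gd_update(w: Sequence[float], x: Sequence[float], y: float, eta: float) -> List[float]:
    """Gradient descent on the square loss (y_hat - y)^2 / 2."""
    err = predict(w, x) - y
    return [wi - eta * err * xi for wi, xi in zip(w, x)]


def eg_update(w: Sequence[float], x: Sequence[float], y: float, eta: float) -> List[float]:
    """Exponentiated gradient: multiplicative update, then renormalize to sum to 1."""
    err = predict(w, x) - y
    unnorm = [wi * math.exp(-eta * err * xi) for wi, xi in zip(w, x)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def run_online(update: Update, target: Sequence[float], steps: int = 2000,
               eta: float = 0.5, seed: int = 0) -> List[float]:
    """Feed noise-free examples y = target . x one at a time; return the final weights."""
    rng = random.Random(seed)
    n = len(target)
    w = [1.0 / n] * n
    for _ in range(steps):
        x = [rng.random() for _ in range(n)]
        w = update(w, x, predict(target, x), eta)
    return w


if __name__ == "__main__":
    target = [0.7, 0.2, 0.1]  # lies on the simplex, so EG can represent it exactly
    for name, upd in (("GD", gd_update), ("EG", eg_update)):
        print(name, [round(v, 3) for v in run_online(upd, target)])
```

The design difference shows up in the update geometry. Gradient descent moves weights additively, while exponentiated gradient moves them multiplicatively. This matches the abstract's point that both updates fit one framework and differ in the distance function used to measure progress.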
Systems automated design optimization complex realworld objects principle constructed combining domainindependent numerical routines existing domainspecific analysis simulation programs Unfortunately legacy analysis codes frequently unsuitable use automated design They may crash large classes input numerically unstable locally nonsmooth highly sensitive control parameters To useful analysis programs must modified reduce eliminate undesired behaviors without altering desired computation To direct modification programs laborintensive necessitates costly revalidation We implemented highlevel language runtime environment allow failurehandling strategies incorporated existing Fortran C analysis programs preserving computational integrity Our approach relies globally managing execution programs level discretely callable functions computation affected problems detected Problem handling procedures constructed knowledge base generic problem management strategies We show approach effective improving analysis program robustness design optimization performance domain conceptual design jet engine nozzles
In this paper we study the sample complexity of weak learning. That is, we ask how much data must be collected from an unknown distribution in order to extract a small but significant advantage in prediction. We show that it is important to distinguish between learning algorithms that output deterministic hypotheses and those that output randomized hypotheses. We prove that in the weak learning model, any algorithm using deterministic hypotheses to weakly learn a class of Vapnik-Chervonenkis dimension $d(n)$ requires $\Omega(\sqrt{d(n)})$ examples. In contrast, when randomized hypotheses are allowed, we show that $\Theta(1)$ examples suffice in some cases. We then show that there exists an efficient algorithm using deterministic hypotheses that weakly learns against any distribution on a set of size $d(n)$ with only $O(d(n)^{2/3})$ examples. Thus, for the class of symmetric Boolean functions over $n$ variables, where the strong learning sample complexity is $\Theta(n)$, the sample complexity for weak learning using deterministic hypotheses is $\Omega(\sqrt{n})$ and $O(n^{2/3})$, and the sample complexity for weak learning using randomized hypotheses is $\Theta(1)$. Next, we prove the existence of classes for which the distribution-free sample size required to obtain a slight advantage in prediction over random guessing is essentially equal to that required to obtain arbitrary accuracy. Finally, for a class of small circuits, namely all parity functions of subsets of $n$ Boolean variables, we prove a weak learning sample complexity of $\Theta(n)$. This bound holds even if the weak learning algorithm is allowed to replace random sampling with membership queries and the target distribution is uniform on $\{0,1\}^n$.
Convergence Results for the EM Approach. Abstract: The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton and for the hierarchical mixture of experts architecture of Jordan and Jacobs. They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm whose search direction has a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments. This report describes research done at the Dept. of Brain and Cognitive Sciences, the Center for Biological and Computational Learning, and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for CBCL is provided in part by NSF grant ASC. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Dept. of Defense. The authors were supported by a grant from the McDonnell-Pew Foundation, a grant from ATR Human Information Processing Research Laboratories, a grant from Siemens Corporation, grant IRI from the National Science Foundation, grant NJ from the Office of Naval Research, and NSF grant ECS in support of an Initiative in Intelligent Control at MIT. Michael I. Jordan is an NSF Presidential Young Investigator.
An agent that must learn to act in the world by trial and error faces the reinforcement learning problem, which is quite different from standard concept learning. Although good algorithms exist for this problem in the general case, they are often quite inefficient and do not exhibit generalization. One strategy is to find restricted classes of action policies that can be learned more efficiently. This paper pursues that strategy by developing an algorithm that performs an on-line search through the space of action mappings expressed as Boolean formulae. The algorithm is compared with existing methods in empirical trials and is shown to perform well.
The aim of this paper is to describe the ADAPtER system, a diagnostic architecture that combines case-based reasoning with abductive reasoning and exploits the adaptation of the solutions of old episodes in order to focus the reasoning process. Domain knowledge is represented via a logical model, and basic mechanisms based on abductive reasoning with consistency constraints have been defined for solving complex diagnostic problems involving multiple faults. The model-based component has been supplemented with a case memory, and adaptation mechanisms have been developed to make the diagnostic system able to exploit past experience in solving new cases. A heuristic function is proposed that ranks the solutions associated with retrieved cases according to the adaptation effort needed to transform them into possible solutions for the current case. We discuss preliminary experiments showing the validity of this heuristic and the convenience of solving a new case by adapting a retrieved solution rather than solving the new problem from scratch.
The commonly used neural network models are not well suited to direct digital implementation, because each node needs to perform a large number of operations on floating-point values. Fortunately, the ability to learn from examples and to generalize is not restricted to networks of this type. Indeed, networks in which each node implements a simple Boolean function (Boolean networks) can be designed so that they exhibit similar properties. Two algorithms that generate Boolean networks from examples are presented. The results show that these algorithms generalize well in a class of problems that accept compact Boolean network descriptions. The techniques described are general and can be applied to tasks that are not known to have that characteristic. Two examples of applications are presented: image reconstruction and handwritten character recognition.
This paper explores the issues involved in implementing robot learning for a challenging dynamic task, using a case study from robot juggling. We use a memory-based local modeling approach (locally weighted regression) to represent a learned model of the task to be performed. Statistical tests are given to examine the uncertainty of a model, to optimize its prediction quality, and to deal with noisy and corrupted data. We develop an exploration algorithm that explicitly deals with prediction accuracy requirements during exploration. Using all these ingredients in combination with methods from optimal control, our robot achieves fast real-time learning of the task within a limited number of trials. Authors' address: Massachusetts Institute of Technology, The Artificial Intelligence Laboratory and The Department of Brain and Cognitive Sciences, Technology Square, Cambridge, MA, USA. Email: sschaal@ai.mit.edu, cga@ai.mit.edu. Support was provided by the Air Force Office of Scientific Research and by Siemens Corporation. Support for the first author was provided by the German Scholarship Foundation and the Alexander von Humboldt Foundation. Support for the second author was provided by a National Science Foundation Presidential Young Investigator Award. We thank Gideon Stein for implementing the first version of LWR on the microprocessor, and Gerrie van Zyl for building the devil stick robot and implementing the first version of devil stick learning.
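Locally weighted regression, the memory-based model used above, answers each query by fitting a weighted linear model to the stored data, with weights that decay with distance from the query. The sketch below assumes one input dimension, a Gaussian kernel with a hand-picked bandwidth `h`, and a locally linear fit. It omits the paper's statistical tests and exploration strategy, and all names are illustrative.

```python
import math

def lwr_predict(xs, ys, query, h):
    """Locally weighted linear regression at `query` (1-D inputs).

    Each stored point gets the Gaussian weight exp(-d^2 / (2 h^2)), where d is
    its offset from the query. A weighted least-squares line a + b * d is then
    fitted, and its value at the query (d = 0), namely a, is returned.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - query
        w = math.exp(-d * d / (2.0 * h * h))
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1  # determinant of the 2x2 normal equations
    if det <= 1e-12:  # degenerate neighbourhood: fall back to the weighted mean
        return t0 / s0
    return (s2 * t0 - s1 * t1) / det

# Hypothetical stored experience: 21 samples of a smooth 1-D mapping.
xs = [i / 10 for i in range(21)]
ys = [math.sin(3 * x) for x in xs]
print(lwr_predict(xs, ys, 0.55, h=0.1))
```

A smaller `h` makes the fit more local, with less bias, but it is noisier when data are sparse. Choosing that trade-off from the data is one of the roles the paper's statistical tests play.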
Genetic algorithms have been extensively used in different domains as a means of performing global optimization in a simple yet reliable manner. However, in some realistic engineering design optimization domains it was observed that a simple classical implementation of the GA, based on binary encoding with bit mutation and crossover, was sometimes inefficient and unable to reach the global optimum. Using a floating-point representation alone does not eliminate the problem. In this paper we describe a way of augmenting the GA with new operators and strategies that take advantage of the structure and properties of such engineering design domains. Empirical results, initially in the domain of conceptual design of supersonic transport aircraft and the domain of high-performance supersonic missile inlet design, demonstrate that the newly formulated GA can be significantly better than the classical GA in terms of efficiency and reliability. httpwwwcsrutgersedushehatapapershtml
Consider estimating the mean vector $\theta$ from data $N_n(\theta, \sigma^2 I)$ with $\ell_q$ norm loss, $q \ge 1$, when $\theta$ is known to lie in an $n$-dimensional $\ell_p$ ball, $p \in (0, \infty)$. For large $n$, the ratio of minimax linear risk to minimax risk can be arbitrarily large if $p < q$. Obvious exceptions aside, the limiting ratio equals 1 only if $p = q = 2$. Our arguments are mostly indirect, involving a reduction to a univariate Bayes minimax problem. When $p < q$, simple nonlinear coordinatewise threshold rules are asymptotically minimax at small signal-to-noise ratios, and within a bounded factor of asymptotic minimaxity in general. Our results are basic to a theory of estimation in Besov spaces.
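The coordinatewise threshold rules mentioned above act on each observed coordinate separately. The sketch below shows the two standard forms, soft and hard thresholding. The noise model, the sparse test vector, and the threshold level $\sigma\sqrt{2\log n}$ in the usage lines are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def soft_threshold(y, t):
    """Shrink each coordinate toward zero by t, zeroing those with |y_i| <= t."""
    out = []
    for v in y:
        if v > t:
            out.append(v - t)
        elif v < -t:
            out.append(v + t)
        else:
            out.append(0.0)
    return out

def hard_threshold(y, t):
    """Keep coordinates with |y_i| > t unchanged and set the rest to zero."""
    return [v if abs(v) > t else 0.0 for v in y]

# Hypothetical sparse mean vector observed with N(0, sigma^2) noise.
rng = random.Random(1)
n, sigma = 1000, 1.0
theta = [5.0 if i < 10 else 0.0 for i in range(n)]
y = [th + rng.gauss(0.0, sigma) for th in theta]
t = sigma * math.sqrt(2 * math.log(n))  # assumed threshold level
theta_hat = soft_threshold(y, t)
```

On a sparse vector like this one, thresholding removes almost all of the noise on the zero coordinates at the cost of some bias on the few large ones. This is the regime (a mean constrained to a small $\ell_p$ ball with $p < q$) in which the abstract says linear estimators can be arbitrarily worse.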
Searching for objects in scenes is a natural task for people and has been extensively studied by psychologists. In this paper we examine this task from a connectionist perspective. Computational complexity arguments suggest that parallel feed-forward networks cannot perform this task efficiently. One difficulty is that, in order to distinguish the target from distractors, a combination of features must be associated with a single object. This requirement, often called the binding problem, presents a serious hurdle for connectionist models of visual processing when multiple objects are present. Psychophysical experiments suggest that people use covert visual attention to get around this problem. In this paper we describe a psychologically plausible system that uses a focus-of-attention mechanism to locate target objects. A strategy that combines top-down and bottom-up information is used to minimize search time. The behavior of the resulting system matches the reaction time behavior of people on several interesting tasks.
We present an algorithm for inducing recursive clauses that uses inverse implication, rather than inverse resolution, as the underlying generalization method. Our approach applies to a class of logic programs similar to the class of primitive recursive functions. Induction is performed using a small number of positive examples that need not lie along the same resolution path. Our algorithm, implemented in a system named CRUSTACEAN, locates matched lists of generating terms that determine the pattern of decomposition exhibited in the target recursive clause. Our theoretical analysis defines the class of logic programs for which our approach is complete, described in terms characteristic of other ILP approaches. Our current implementation is considerably faster than previously reported. We present evidence demonstrating that, given randomly selected inputs, increasing the number of positive examples increases accuracy and reduces the number of outputs. We relate our approach to similar recent work on inducing recursive clauses.
The pursuer-evader (PE) game is recognized as an important domain in which to study the coevolution of robust adaptive behavior and protean behavior (Miller and Cliff). Nevertheless, the potential of the game is largely unrealized, owing to methodological hurdles in coevolutionary simulation raised by PE: versions of the game that have optimal solutions (Isaacs) are closed-ended, while other formulations are opaque with respect to their solution space, for lack of a rigorous metric of agent behavior. This inability to characterize behavior, in turn, obfuscates the coevolutionary dynamics. We present a new formulation of PE that affords a rigorous measure of agent behavior and system dynamics. The game is moved from the two-dimensional plane to the one-dimensional bitstring: at each time step the evader generates a bit that the pursuer must simultaneously predict. Because behavior is expressed as a time series, we can employ information theory to provide a quantitative analysis of agent activity. Further, this version of PE opens vistas onto the communicative component of pursuit and evasion behavior, providing an open-ended serial communications channel and an open world (via coevolution). Results show that subtle changes to our game determine whether it is open-ended, and profoundly affect the viability of arms-race dynamics.
The aim of this paper is to understand what is involved in parametric design problem solving. To achieve this goal, in this paper we (i) identify and detail the conceptual elements defining a parametric design task specification; (ii) illustrate how these elements are interpreted and operationalised during the design process; and (iii) formulate a generic model of parametric design problem solving. We then redescribe a number of problem solving methods in terms of the proposed generic model and show that such a redescription enables us to provide a more precise account of the different competence behaviours expressed by the methods. Design is about constructing artifacts. This means that, broadly speaking, the design process is a creative one, in the sense that it produces a new solution as opposed to selecting a solution from a predefined set. While recognizing the essential creative elements present in the design process, some researchers (e.g., Gero) restrict the use of the term creative design to those design applications in which the design elements of the target artifact cannot be selected from a predefined set. For instance, designing a new car model is normally a case of creative design, as it includes design innovations that were not present in previous car designs. In other words, it is not always possible to characterise the process of designing a new car as one in which components are assembled and configured from a predefined set. Nevertheless, in a large number of real-world applications it is possible to assume that the target artifact is going to be designed in terms of predefined design elements. In this scenario, the design process consists of assembling and configuring these pre-existing design elements in a way that satisfies the design requirements and constraints and approximates some (typically cost-related) optimization criterion. This class of design tasks takes the name of configuration design (Stefik). In many cases, typically when the problem at hand does not exhibit complex spatial requirements and all possible solutions adhere to a common solution template, it is possible to simplify the configuration design problem even further: the target artifact is modelled as a set of parameters, and the design problem solving process is characterised as one of assigning values to these parameters in accordance with the given design requirements, constraints and optimization criterion. When this assumption holds for a particular task, we say that it is a parametric design task. The VT application (Marcus et al.; Yost and Rothenfluh) provides a well-known example of a parametric design task.
The aim of this paper is to understand what is involved in parametric design problem solving. To achieve this goal, in this paper we (i) identify and detail the conceptual elements defining a parametric design task specification; (ii) illustrate how these elements are interpreted and operationalised during the design process; and (iii) produce a generic model of parametric design problem solving, characterised at the knowledge level, which generalizes existing methods for parametric design. We then redescribe a number of problem solving methods in terms of the model in question.
In this paper we describe a self-adjusting algorithm for packet routing in which a reinforcement learning method is embedded into each node of the network. Each node uses only local information to keep accurate statistics on which routing policies lead to minimal routing times. In simple experiments involving an irregularly connected network, this learning approach proves superior to routing based on precomputed shortest paths.
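One common way to realize the per-node learning described above is a Q-routing-style update. Each node x keeps, for every destination d and neighbour y, an estimate Q[x][d][y] of the delivery time if the packet is forwarded to y. After each hop it moves that estimate toward the observed local delay plus y's own best remaining estimate. The sketch below assumes this update rule, a hypothetical three-node line network, unit transmission times and empty queues. The names and parameters are illustrative and are not taken from the paper.

```python
def make_q_table(graph, init=0.0):
    """Q[x][d][y]: node x's estimated delivery time to d when forwarding via neighbour y."""
    return {x: {d: {y: init for y in graph[x]} for d in graph if d != x}
            for x in graph}

def choose_next_hop(q, x, d):
    """Greedy policy: forward to the neighbour with the smallest estimated delivery time."""
    return min(q[x][d], key=q[x][d].get)

def q_routing_update(q, x, d, y, queue_time, trans_time, eta):
    """Move Q[x][d][y] toward the local delay plus neighbour y's best remaining estimate."""
    remaining = 0.0 if y == d else min(q[y][d].values())
    target = queue_time + trans_time + remaining
    q[x][d][y] += eta * (target - q[x][d][y])

def route_packet(q, graph, src, dst, eta=0.5, trans_time=1.0, max_hops=50):
    """Forward one packet hop by hop, learning at every hop; return the hop count."""
    x, hops = src, 0
    while x != dst and hops < max_hops:
        y = choose_next_hop(q, x, dst)
        q_routing_update(q, x, dst, y, queue_time=0.0, trans_time=trans_time, eta=eta)
        x, hops = y, hops + 1
    return hops

# Hypothetical three-node line network A - B - C.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
q = make_q_table(graph)
for _ in range(100):
    route_packet(q, graph, "A", "C")
```

Early packets may bounce back and forth while the estimates are still poor. Once the estimates settle, node B forwards directly to C, and A's estimate for reaching C via B approaches two time units. Each update uses only information from the node itself and one neighbour, which matches the locality the abstract emphasizes.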
Modern compilers employ sophisticated instruction scheduling techniques to shorten the number of cycles taken to execute the instruction stream. In addition to ensuring correctness, the instruction scheduler must also ensure that hardware resources are not oversubscribed in any cycle. For a contemporary processor implementation with multiple pipelines and complex resource usage restrictions, this is not an easy task. The complexity involved in reasoning about resource hazards is one of the primary factors that constrain the instruction scheduler from performing many aggressive transformations. For example, the ability to perform code motion or instruction replacement in the middle of an already scheduled block would be a powerful transformation if it could be done efficiently. We extend a technique for detecting pipeline resource hazards based on finite state automata to support the efficient implementation of such transformations, which are essential for aggressive instruction scheduling beyond basic blocks. Although similar code transformations can be supported by schemes using reservation tables, our scheme is superior in terms of both space and time. A global instruction scheduler that uses these techniques has been implemented in the KSR compiler.
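The basic idea of detecting resource hazards with a finite state automaton can be sketched as follows. A state records which resources are already reserved in the current and future cycles. Issuing an instruction class is a transition that exists only if its reservation table does not collide with that state. A hypothetical `<cycle>` transition advances time by one cycle. Once the automaton is built, each hazard query is a single table lookup. The sketch assumes a toy machine with a pipelined ALU and a non-pipelined two-cycle multiplier. It is not the KSR compiler's implementation, which the abstract says extends this basic scheme.

```python
from collections import deque

def _trim(cells):
    """Drop trailing empty cycles so that equivalent states compare equal."""
    cells = list(cells)
    while cells and not cells[-1]:
        cells.pop()
    return tuple(cells)

def issue(state, table):
    """Reserve `table` starting in the current cycle; return None on a structural hazard."""
    horizon = max(len(state), len(table))
    new = []
    for t in range(horizon):
        busy = state[t] if t < len(state) else frozenset()
        need = table[t] if t < len(table) else frozenset()
        if busy & need:
            return None
        new.append(busy | need)
    return _trim(new)

def advance(state):
    """Move to the next cycle: reservations for the current cycle expire."""
    return _trim(state[1:])

def build_automaton(tables):
    """Enumerate all reachable states and their transitions by breadth-first search."""
    start = ()
    trans, seen, queue = {}, {start}, deque([start])
    while queue:
        s = queue.popleft()
        moves = {name: issue(s, tbl) for name, tbl in tables.items()}
        moves["<cycle>"] = advance(s)
        for label, nxt in moves.items():
            if nxt is None:
                continue  # no transition: issuing here would cause a hazard
            trans[(s, label)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, trans

# Hypothetical machine: pipelined ALU (1 cycle), non-pipelined multiplier (2 cycles).
tables = {
    "alu": (frozenset({"ALU"}),),
    "mul": (frozenset({"MUL"}), frozenset({"MUL"})),
}
start, trans = build_automaton(tables)
```

A scheduler can then test `(state, op) in trans` instead of rescanning reservation tables. Supporting code motion into the middle of an already scheduled block requires the extensions the paper describes, which this sketch does not attempt.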
We propose a new method for variable selection and estimation in Cox's proportional hazards model. Our proposal minimizes the log partial likelihood subject to the sum of the absolute values of the parameters being bounded by a constant. Because of the nature of this constraint, it tends to produce some coefficients that are exactly zero and hence gives interpretable models. The method is a variation of the lasso proposal of Tibshirani, which was designed for the linear regression context. Simulations indicate that the lasso can be more accurate than stepwise selection in this setting.
GMD Report. Abstract: Many current artificial neural network systems have serious limitations concerning accessibility, flexibility, scaling and reliability. In order to go some way toward removing these, we suggest a reflective neural network architecture. In such an architecture, the modular structure is the most important element. The building-block elements are called "minos" modules. They perform self-observation and report the current level of development, or scope of expertise, within the module. A Pandemonium system integrates such submodules so that they work together to handle mapping tasks. Network complexity limitations are attacked in this way with the Pandemonium problem decomposition paradigm, and both static and dynamic unreliability of the whole Pandemonium system are effectively eliminated through the generation and interpretation of confidence and ambiguity measures at every moment during the development of the system. Two problem domains are used to test and demonstrate various aspects of our architecture. Reliability and quality measures are defined for systems that answer only part of the time. Our system achieves better quality values than single networks of larger size on a handwritten digit problem. When second and third best answers are accepted, our system leaves fewer errors on the test set than the best single net. It is also shown that the system can elegantly learn to handle garbage patterns. With the parity problem, it is demonstrated that the complexity of problems may be decomposed automatically by the system, which solves them with networks smaller than a single net would require. Even when the system cannot find a solution to the parity problem because networks of too small a size are used, the reliability remains high. Our Pandemonium architecture gives more power and flexibility to the higher levels of a large hybrid system than a single-net system does, offering useful information to higher-level feedback loops in which the reliability of answers may be intelligently traded for less reliable but important "intuitional" answers. By providing weighted alternatives and possible generalizations, this architecture gives the best possible service to the larger system of which it forms part.
The demands of rapid response and the complexity of many environments make it difficult to decompose, tune and coordinate reactive behaviors while ensuring consistency. We hypothesize that complex behaviors can be decomposed into separate behaviors resident in separate networks and coordinated through a higher-level controller. To explore these issues, we have implemented a neural network architecture as the reactive component of a two-layer control system for a simulated race car. By varying the architecture, we tested whether decomposing reactivity into separate behaviors leads to superior overall performance and learning convergence. Based on the results, we modified the architecture to produce a race car that is competitive with publicly available solutions.
Supervised classification problems have received considerable attention from the machine learning community. We propose PLEASE, a novel prototype learning system based on a genetic algorithm, for this class of problems. Given a set of prototypes for each of the possible classes, the class of an input instance is determined by the prototype nearest to that instance. We assume ordinal attributes, and prototypes are represented as sets of feature-value pairs. A genetic algorithm is used to evolve the number of prototypes per class and their positions in the input space, as determined by the corresponding feature-value pairs. Comparisons with C4.5 on a set of artificial problems of controlled complexity demonstrate the effectiveness of the proposed system.
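The classification rule that PLEASE relies on, nearest-prototype classification, is simple to state in code. A genetic algorithm would then search over prototype sets using a fitness such as training accuracy. The sketch assumes numeric (ordinal) feature vectors, Euclidean distance, and accuracy as the fitness. The GA itself and the paper's exact distance and fitness definitions are not reproduced, and all names are illustrative.

```python
import math

def classify(instance, prototypes):
    """Return the label of the prototype nearest to `instance` (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for features, label in prototypes:
        dist = math.dist(instance, features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def fitness(prototypes, data):
    """Fraction of (instance, label) pairs classified correctly; a GA could maximise this."""
    correct = sum(classify(x, prototypes) == y for x, y in data)
    return correct / len(data)

# Hypothetical prototype set: two prototypes for class "a", one for class "b".
prototypes = [((1.0, 1.0), "a"), ((5.0, 5.0), "a"), ((3.0, 1.0), "b")]
data = [((0.5, 1.2), "a"), ((5.5, 4.0), "a"), ((3.2, 0.5), "b"), ((2.9, 1.5), "b")]
print(fitness(prototypes, data))
```

Allowing several prototypes per class, as for class "a" here, lets the classifier represent classes that occupy disjoint regions of the input space. This is why the abstract has the GA evolve the number of prototypes per class as well as their positions.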
This paper compares two methods for refining uncertain knowledge bases that use propositional certainty-factor rules. The first method, implemented in the Rapture system, employs neural-network training to refine the certainties of existing rules but uses a symbolic technique to add new rules. The second method, based on the one used in the Kbann system, initially adds a complete set of potential new rules with low certainty and allows neural-network training to filter and adjust these rules. Experimental results indicate that the former method results in significantly faster training and produces much simpler refined rule bases with slightly greater accuracy.
This paper discusses an approach to constructing new attributes based on decision trees and production rules. The approach can improve concepts learned in the form of decision trees by simplifying them and improving their predictive accuracy. In addition, it can distinguish relevant primitive attributes from irrelevant ones.
The performance of human subjects on a wide variety of early visual processing tasks improves with practice. HyperBF networks (Poggio and Girosi) constitute a mathematically well-founded framework for understanding such improvement in performance, or perceptual learning, in the class of tasks known as visual hyperacuity. The present article concentrates on two issues raised by recent psychophysical and computational findings (Poggio et al.; Fahle and Edelman). First, we develop a biologically plausible extension of the HyperBF model that takes into account basic features of the functional architecture of early vision. Second, we explore various learning modes that can coexist within the HyperBF framework, and we focus on two unsupervised learning rules that may be involved in hyperacuity learning. Finally, we report results of psychophysical experiments that are consistent with the hypothesis that activity-dependent presynaptic amplification may be involved in perceptual learning in hyperacuity.
Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD). The official version of this paper has been published by the American Association for Artificial Intelligence (http://www.aaai.org). Copyright © American Association for Artificial Intelligence. All rights reserved. Abstract: We describe results of performing data mining on a challenging medical diagnosis domain, acute abdominal pain. This domain is well known to be difficult, yielding only modest predictive accuracy for human and machine diagnosticians alike. Moreover, many researchers argue that one of the simplest approaches, the naive Bayesian classifier, is optimal. By comparing the performance of the naive Bayesian classifier with its more general cousin, the Bayesian network classifier, and with selective Bayesian classifiers that use only a subset of the available attributes, we show that the simplest models perform at least as well as the more complex models. We argue that simple models like the selective naive Bayesian classifier will perform as well as more complicated models in similarly complex domains with relatively small data sets, thereby calling into question the extra expense necessary to induce more complex models.
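The naive Bayesian classifier compared in the abstract scores each class by its prior probability times the product of per-attribute conditional probabilities. A selective variant simply restricts that product to a chosen subset of attributes. The sketch below assumes categorical attributes and Laplace (add-one) smoothing. The toy data, attribute names and function names are hypothetical, and the paper's method for choosing the attribute subset is not shown.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> observed values
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, c)][v] += 1
            domains[j].add(v)
    return class_counts, value_counts, domains

def predict_nb(model, row, attrs=None):
    """Most probable class; `attrs` restricts the product to a subset (selective NB)."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    attrs = range(len(row)) if attrs is None else attrs
    best, best_score = None, -math.inf
    for c, cc in class_counts.items():
        score = math.log(cc / n)  # log prior
        for j in attrs:
            # Laplace-smoothed log P(attribute j = row[j] | class c)
            count = value_counts[(j, c)][row[j]]
            score += math.log((count + 1) / (cc + len(domains[j])))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical data: (tenderness, fever) -> diagnosis
rows = [("yes", "high"), ("yes", "low"), ("no", "low"), ("no", "high")]
labels = ["sick", "sick", "well", "well"]
model = train_nb(rows, labels)
print(predict_nb(model, ("yes", "high")), predict_nb(model, ("no", "high"), attrs=[0]))
```

In this toy data the second attribute carries no information about the class. Dropping it with `attrs=[0]` therefore leaves the predictions unchanged while simplifying the model, which is the kind of simplification the abstract argues costs little accuracy.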
This report describes statistical research and development work on hospital quality monitor data sets from the nationwide VA hospital system. The project covers the statistical analysis, exploration and modelling of data from several quality monitors, with the primary goals of (a) understanding patterns of variability over time in hospital-level and monitor area-specific quality monitor measures, and (b) understanding patterns of dependencies between sets of monitors. We present a discussion of basic perspectives on data structure and preliminary data exploration for three monitors, followed by the development of several classes of formal models. We identify classes of hierarchical random effects time series models that are relevant for modelling single or multiple monitor time series. We summarise basic model features and the results of analyses of the three monitor data sets, in both single and multiple monitor frameworks, and present a variety of summary inferences in graphical displays. Our discussion includes summary conclusions related to the two key goals, discussion of questions of comparisons across hospitals, and recommendations for potential further substantive and statistical investigations.
Adaptive ridge is a special form of ridge regression that balances the quadratic penalization on each parameter of the model. This paper shows the equivalence between adaptive ridge and the lasso (least absolute shrinkage and selection operator): both procedures produce the same estimate. Least absolute shrinkage can thus be viewed as a particular quadratic penalization. From this observation, we derive an EM algorithm to compute the lasso solution. Finally, we present a series of applications of this type of algorithm to regression problems: kernel regression, additive modeling and neural net training.
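The equivalence stated above suggests computing the lasso by repeatedly solving ridge problems whose per-coefficient penalties depend on the current estimate. The sketch below implements one such fixed-point iteration. It follows from bounding |β_j| above by β_j²/(2|β̃_j|) + |β̃_j|/2 at the current estimate β̃, and it is offered as an illustration in that spirit rather than as the paper's exact EM algorithm. It assumes the objective ||y − Xβ||² + λ Σ|β_j| and a nonsingular XᵀX for the least-squares starting point. A small floor `eps` keeps the penalties finite as coefficients approach zero.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def lasso_by_adaptive_ridge(X, y, lam, iters=200, eps=1e-8):
    """Approximate argmin ||y - X b||^2 + lam * sum|b_j| by iteratively reweighted ridge."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)  # ordinary least-squares starting point
    for _ in range(iters):
        a = [row[:] for row in xtx]
        for j in range(p):
            # per-coefficient ridge penalty (lam / 2) / |beta_j| from the quadratic bound
            a[j][j] += 0.5 * lam / max(abs(beta[j]), eps)
        beta = solve(a, xty)
    return beta

# Orthonormal design: the exact lasso solution is soft thresholding of y at lam / 2.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
beta = lasso_by_adaptive_ridge(X, [3.0, 0.2, -1.0], lam=1.0)  # approx. [2.5, 0.0, -0.5]
```

The orthonormal case serves as a check because its lasso solution is known in closed form. Coefficients that the lasso sets exactly to zero come out only as tiny values here (of order `eps`), a usual limitation of reweighting schemes.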
Technical Report NCRG, available from http://www.ncrg.aston.ac.uk/. To appear in Advances in Neural Information Processing Systems, eds. M. I. Jordan, M. J. Kearns and S. A. Solla, Lawrence Erlbaum. Abstract: Gaussian processes provide natural non-parametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled, and that the posterior distribution of the noise rate can be sampled using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that approximates the true variance well.
Technical Report, Department of Statistics, University of Toronto. Abstract: Simulated annealing, which moves from a tractable distribution to a distribution of interest via a sequence of intermediate distributions, has traditionally been used as an inexact method of handling isolated modes in Markov chain samplers. Here it is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler. The Markov chain aspect allows this method to perform acceptably even for high-dimensional problems, where finding good importance sampling distributions would otherwise be difficult, while the use of importance weights ensures that the estimates found converge to the correct values as the number of annealing runs increases. This annealed importance sampling procedure resembles the second half of the previously studied tempered transitions, and can be seen as a generalization of a recently proposed variant of sequential importance sampling. It is also related to thermodynamic integration methods for estimating ratios of normalizing constants. Annealed importance sampling is most attractive when isolated modes are present or when estimates of normalizing constants are required, but it may also be more generally useful, since its independent sampling allows one to bypass some of the problems of assessing convergence and autocorrelation in Markov chain samplers.
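The procedure described above can be illustrated on a one-dimensional toy problem. The sketch below makes the following assumptions, none of which come from the paper's experiments:

- the starting distribution f_0 is a normalized standard normal;
- the target f_K is an unnormalized Gaussian whose normalizing constant is √(2π);
- the intermediate distributions are geometric, f_β ∝ f_0^(1−β) f_K^β, on an evenly spaced β schedule;
- each temperature uses one random-walk Metropolis transition.

The average importance weight estimates the ratio of normalizing constants Z_K / Z_0.

```python
import math
import random

def log_f0(x):
    """Normalised N(0, 1) log density: the tractable starting distribution."""
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def log_fK(x):
    """Unnormalised target N(1, 1); its normalising constant is sqrt(2 * pi)."""
    return -0.5 * (x - 1.0) ** 2

def log_f(x, beta):
    """Intermediate distribution f_beta, proportional to f0^(1 - beta) * fK^beta."""
    return (1.0 - beta) * log_f0(x) + beta * log_fK(x)

def ais_run(betas, rng, step=0.5):
    """One annealing run; returns the final state and its log importance weight."""
    x = rng.gauss(0.0, 1.0)  # exact sample from f0
    log_w = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # weight factor f_b(x) / f_{b_prev}(x), evaluated before x moves
        log_w += (b - b_prev) * (log_fK(x) - log_f0(x))
        # one Metropolis step that leaves f_b invariant
        prop = x + rng.gauss(0.0, step)
        if rng.random() < math.exp(min(0.0, log_f(prop, b) - log_f(x, b))):
            x = prop
    return x, log_w

def ais_estimate(n_runs=500, n_temps=50, seed=0):
    """Average importance weight over independent runs: an estimate of Z_K / Z_0."""
    rng = random.Random(seed)
    betas = [k / n_temps for k in range(n_temps + 1)]
    weights = [math.exp(ais_run(betas, rng)[1]) for _ in range(n_runs)]
    return sum(weights) / n_runs

print(ais_estimate(), math.sqrt(2 * math.pi))
```

The runs are independent, so the spread of the weights across runs gives a direct error estimate. This is the advantage over assessing the autocorrelation of a single Markov chain that the abstract points out.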
Inversion of multilayer synchronous networks is a method that tries to answer questions like "What kind of input will give a desired output?" and "Is it possible to get a desired output under special input/output constraints?". We describe two methods of inverting a connectionist network. First, we extend inversion via backpropagation (Linden and Kindermann; Williams) to recurrent (Elman; Jordan; Mozer; Williams and Zipser), time-delayed (Waibel et al.) and discrete versions of continuous networks (Pineda; Pearlmutter). The result of inversion is an input vector whose corresponding output vector equals the target vector except for a small remainder. Knowledge of these attractors may help in understanding the function and generalization qualities of connectionist systems of this kind. Second, we introduce a new inversion method for proving the non-existence of an input combination under special constraints, e.g., within a subspace of the input space. This method works by the iterative exclusion of invalid activation values. It may be a helpful way to judge the properties of a trained network. We conclude with simulation results on three different tasks: XOR, Morse signal decoding and handwritten digit recognition.
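Inversion via backpropagation, the first method above, keeps the trained weights fixed and runs gradient descent on the input vector instead, propagating the output error back to the inputs. The sketch below assumes a tiny 2-2-1 feed-forward network with tanh hidden units, a sigmoid output, squared error and hand-picked weights. It therefore covers only the basic feed-forward case that the paper extends to recurrent, time-delayed and discrete networks. All names and values are hypothetical.

```python
import math

# Hypothetical fixed ("trained") weights of a 2-2-1 network.
W1 = [[1.0, -1.0], [0.5, 0.5]]  # hidden x input
B1 = [0.0, 0.0]
W2 = [2.0, 1.0]                 # output weights, one per hidden unit
B2 = -0.5

def forward(x):
    """Return the hidden activations and the sigmoid output for input vector x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, B1)]
    y = 1.0 / (1.0 + math.exp(-(sum(w * hj for w, hj in zip(W2, h)) + B2)))
    return h, y

def invert(target, x0, lr=0.5, steps=2000):
    """Gradient descent on the input to minimise (y - target)^2; the weights stay fixed."""
    x = list(x0)
    for _ in range(steps):
        h, y = forward(x)
        delta_out = 2.0 * (y - target) * y * (1.0 - y)                        # dE/dz_out
        delta_h = [delta_out * w * (1.0 - hj * hj) for w, hj in zip(W2, h)]  # dE/dz_hidden
        grad = [sum(delta_h[j] * W1[j][i] for j in range(len(W1))) for i in range(len(x))]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x

x_star = invert(0.8, [0.0, 0.0])
print(x_star, forward(x_star)[1])
```

As the abstract notes, the result is an input whose output matches the target only up to a small remainder. Different starting points `x0` can lead to different inverse images, since many inputs may map to the same output.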
In this paper we study learning algorithms for environments that change over time. Unlike previous work, we are interested in the case where the changes might be rapid but their direction is relatively constant. We model this type of change by assuming that the target distribution changes continuously, at a constant rate, from one extreme distribution to another. We show how, in this case, a simple weighting scheme can be used to estimate the error of a hypothesis and how this estimate can be used to minimize the error of the prediction.
Each parameter of the model is penalized individually. These penalizations are tuned automatically from the definition of a global regularization hyperparameter. This hyperparameter, which controls the complexity of the regressor, can be estimated by resampling techniques. We show experimentally the performance and stability of adaptive multiple penalization in the setting of linear regression. We chose problems for which controlling complexity is particularly crucial, as in the more general setting of function estimation. Comparisons with regularized least squares and with variable selection allow us to deduce the conditions of application of each penalization algorithm. In the simulations we also test several resampling techniques. These techniques are used to select the optimal complexity of the estimators of the regression function. We compare the losses incurred by each of them when suboptimal models are selected. We also examine whether they make it possible to identify, among the competing penalization methods, the estimator of the regression function that minimizes the generalization error.
The Crossover operator is common to most implementations of Genetic Programming (GP). Another, usually unavoidable, factor is some form of restriction on the size of the trees in the GP population. This paper concentrates on the interaction between the Crossover operator and a restriction on tree depth, as demonstrated by the MAX problem, which involves returning the largest possible value for given function and terminal sets.
We propose a model of efficient on-line reinforcement learning based on the expected mistake bound framework introduced by Haussler, Littlestone and Warmuth. The measure of performance we use is the expected difference between the total reward received by the learning agent and that received by an agent behaving optimally from the start. We call this expected difference the cumulative mistake of the agent, and we require that it level off at a reasonably fast rate as learning progresses. We show that this model is polynomially equivalent to the PAC model of off-line reinforcement learning introduced by Fiechter. In particular, we show how an off-line PAC reinforcement learning algorithm can be transformed into an efficient on-line algorithm in a simple and practical way. An immediate consequence of this result is that the PAC algorithm for the general finite state-space reinforcement learning problem described by Fiechter can be transformed into a polynomial on-line algorithm with guaranteed performance.
We investigate the space of protein sequences. We combine the standard measures of similarity (SW, FASTA, BLAST) to associate with each sequence an exhaustive list of neighboring sequences. These lists induce a weighted directed graph whose vertices are the sequences. The weight of an edge connecting two sequences represents their degree of similarity. This graph encodes many of the fundamental properties of the sequence space. We look for clusters of related proteins in this graph; these clusters correspond to strongly connected sets of vertices. Two main ideas underlie our work: (i) interesting homologies among proteins can be deduced by transitivity; (ii) transitivity should be applied restrictively, in order to prevent unrelated proteins from clustering together. Our analysis starts from a conservative classification, based on significant similarities, that has many classes. Subsequently, classes are merged to include less significant similarities. Merging is performed via a novel two-phase algorithm. First, the algorithm identifies groups of possibly related clusters (based on transitivity and strong connectivity) using local considerations, and merges them. Then a global test is applied to identify nuclei of strong relationships within these groups of clusters, and the classification is refined accordingly. This process takes place at varying thresholds of statistical significance: at each step, the algorithm is applied to the classes of the previous classification to obtain the next one at a more permissive threshold. Consequently, a hierarchical organization of the proteins is obtained. The resulting classification splits the space of protein sequences into well-defined groups of proteins. The results show that the automatically induced sets of proteins are closely correlated with natural biological families and superfamilies. The hierarchical organization reveals finer subfamilies that make up known families of proteins, as well as many interesting relations between protein families. The proposed hierarchical organization may be considered a first map of the space of protein sequences. An interactive web site including the results of our analysis has been constructed and is accessible at http://www.protomap.cs.huji.ac.il.
This paper presents a system, WHY, that learns and updates a diagnostic knowledge base using domain knowledge and a set of examples. The a priori knowledge consists of a causal model of the domain, stating the relationships among basic phenomena, and a body of phenomenological theory, describing the links between abstract concepts and their possible manifestations in the world. The phenomenological knowledge is used deductively, the causal model is used abductively, and the examples are used inductively. The problems of imperfection and intractability of the theory are handled by allowing the system to make assumptions during its reasoning. In this way, robust knowledge can be learned with limited complexity and a limited number of examples. The system works in a first-order logic environment and has been applied in a real domain.
This paper investigates the behaviour of the random walk Metropolis algorithm in high-dimensional problems. Here we concentrate on the case where the components of the target density form a spatially homogeneous Gibbs distribution with finite range. The performance of the algorithm is strongly linked to the presence or absence of a phase transition for the Gibbs distribution: the convergence time is approximately linear in dimension for problems where no phase transition is present. Related to this, there is an optimal way to scale the variance of the proposal distribution in order to maximise the speed of convergence of the algorithm. This turns out to involve scaling the variance of the proposal as the reciprocal of dimension, at least in the phase-transition-free case. Moreover, the actual optimal scaling can be characterised in terms of the overall acceptance rate of the algorithm, the maximising value being the same as that predicted by studies of simpler classes of target density. The results are proved in the framework of a weak convergence result, which shows that the algorithm actually behaves like an infinite-dimensional diffusion process in high dimensions. Introduction and discussion of results
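The scaling result above can be explored numerically. The sketch below runs random walk Metropolis with proposal variance ℓ²/d, i.e. scaled as the reciprocal of dimension, and reports the acceptance rate. It assumes the simplest target, independent standard normal components, rather than the interacting Gibbs distributions studied in the paper. For that i.i.d. case, earlier work predicts an acceptance rate of about 0.234 at the optimal ℓ ≈ 2.38. The function name and settings are illustrative.

```python
import math
import random

def rwm_acceptance(d, ell, n_iter=5000, seed=0):
    """Acceptance rate of random walk Metropolis on N(0, I_d) with proposal N(x, (ell^2 / d) I)."""
    rng = random.Random(seed)
    sd = ell / math.sqrt(d)  # proposal variance ell^2 / d shrinks as 1 / d
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # start in stationarity
    log_p = -0.5 * sum(v * v for v in x)
    accepted = 0
    for _ in range(n_iter):
        y = [v + rng.gauss(0.0, sd) for v in x]
        log_q = -0.5 * sum(v * v for v in y)
        if rng.random() < math.exp(min(0.0, log_q - log_p)):
            x, log_p = y, log_q
            accepted += 1
    return accepted / n_iter

for ell in (0.5, 2.38, 5.0):
    print(ell, rwm_acceptance(50, ell))
```

Small ℓ gives high acceptance but tiny moves, and large ℓ gives large proposals that are almost always rejected. The optimal scaling balances the two, which is why it can be characterized by the acceptance rate alone.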
This paper develops probabilistic bounds on out-of-sample error rates for several classifiers using a single set of in-sample data. The bounds are based on probabilities over partitions of the union of in-sample and out-of-sample data into in-sample and out-of-sample data sets. The bounds apply when the in-sample and out-of-sample data are drawn from the same distribution. Partition-based bounds are stronger than VC-type bounds, but they require more computation.
We present a framework for learning DFA from simple examples. We show that efficient PAC learning of DFA is possible if the class of distributions is restricted to simple distributions, in which a teacher might choose examples based on knowledge of the target concept. This answers an open research question posed in Pitt's seminal paper: are DFAs PAC-identifiable if examples are drawn from the uniform distribution, or from some other known simple distribution? Our approach uses the RPNI algorithm for learning DFA from labeled examples. In particular, we describe an efficient algorithm for exact learning of the target DFA with high probability when a bound N on the number of states of the target DFA is known in advance. When N is not known, we show how this algorithm can be used for efficient PAC learning of DFAs.
The Simple Synchrony Network (SSN) is a new connectionist architecture that incorporates the insights of Temporal Synchrony Variable Binding (TSVB) into Simple Recurrent Networks. The use of TSVB means that SSNs can output representations of structures and can learn generalisations over the constituents of these structures, as required by systematicity. This paper describes the SSN and an associated training algorithm, and demonstrates the SSN's generalisation abilities through training results.
This paper deals with the combination of Evolutionary Algorithms and Artificial Neural Networks (ANN). A new method is presented for finding good building blocks for architectures of Artificial Neural Networks. The method is based on Cellular Encoding, a representation scheme by F. Gruau, and on Genetic Programming by J. Koza. First, it is shown that a modified Cellular Encoding technique is able to find good architectures even for non-Boolean networks. Second, with the help of a graph database and a new graph-rewriting method, it is possible to build architectures with modular structures. The information about building blocks for architectures is obtained by statistically analyzing the data in the graph database. Simulation results for two real-world problems are given.
We attempted to obtain a stronger correlation between the relationship of $G_1$ and $G_2$ and performance. This included studying the variance in the fitnesses of the members of the population, as well as observing the rate of convergence of GP with respect to $G_2$ when a population had been evolved on $G_1$. Unfortunately, we have not yet been able to obtain a significant correlation. In future work, we plan to track the genetic diversity of the populations (so far we have considered only phenotypic variance) in order to shed light on the underlying mechanism of priming. One factor that has made analysis difficult so far is our use of genetic programming: the space of genotypes is very large (i.e., there are many redundant solutions), and the neighborhood structure is less easily intuited than in a standard genetic algorithm. Since there is every reason to believe that the underlying mechanism of incremental evolution is largely independent of the peculiarities of genetic programming, we are currently investigating the incremental evolution mechanism using genetic algorithms with fixed-length genotypes. This should enable a better understanding of the mechanism. Ultimately, we will scale up this research effort to analyze incremental evolution with more than one transition between test cases. This will involve many open issues regarding the optimization of the transition schedule between test cases.
We performed the following experiment. Let $\mathrm{Fit}(I, G)$ be the fitness value of a genetic program $I$ according to the evaluation function $G$, and let $\mathrm{BestOf}(Pop, G)$ be the member $I^*$ of population $Pop$ at time $t$ with the highest fitness according to $G$; in other words, $I^* = \mathrm{BestOf}(Pop, G)$ maximizes $\mathrm{Fit}(I, G)$ over all $I \in Pop$. A population $Pop_1$ was evolved in the usual manner using evaluation function $G_1$. However, at every generation we also evaluated the current population using evaluation function $G_2$ and recorded the value $\mathrm{Fit}(\mathrm{BestOf}(Pop_1, G_2), G_2)$. In other words, we evolved the population using $G_1$ as the evaluation function, but at every generation we also computed the fitness of the best individual in the population according to $G_2$ and saved this value. Using the same random seed and control parameters, we evolved a population $Pop_2$ for the same number of generations using $G_2$ as the evaluation function (note that at the initial generation, $Pop_2$ is identical to $Pop_1$). For all values of $t$, we compared $\mathrm{Fit}(\mathrm{BestOf}(Pop_1, G_2), G_2)$ with $\mathrm{Fit}(\mathrm{BestOf}(Pop_2, G_2), G_2)$. In order to better formalize and exploit the notion of domain difficulty, ...
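The two quantities used in the experiment, Fit(I, G) and BestOf(Pop, G), can be sketched as follows. This is a toy under stated assumptions: individuals are plain integers rather than genetic programs, an evaluation function is any Python callable returning a fitness where higher is better, and `fit` and `best_of` are hypothetical helper names.

```python
def fit(individual, g):
    """Fit(I, G): fitness of individual I under evaluation function G (higher is better)."""
    return g(individual)


def best_of(population, g):
    """BestOf(Pop, G): the member of Pop with the highest fitness under G.

    Ties are broken in favour of the earliest member, as max() does.
    """
    return max(population, key=lambda individual: fit(individual, g))
```

With this toy, comparing two populations under the same evaluation function `g` amounts to comparing `fit(best_of(pop_a, g), g)` with `fit(best_of(pop_b, g), g)`, recomputed at every generation.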
Genetic Programming is computationally expensive. For most applications, the vast majority of time is spent evaluating candidate solutions, so it is desirable to make individual evaluation as efficient as possible. We describe a genome compiler which compiles s-expressions to machine code, resulting in a significant speedup of individual evaluations over standard GP systems. Based on performance results with symbolic regression, we show that the execution speed of the genome compiler system is comparable to the fastest alternative GP systems. We also demonstrate the utility of compilation on a real-world problem, lossless image compression. A somewhat surprising result is that in our test domains, the overhead of compilation is negligible.
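The idea of translating an expression tree once, instead of walking it at every fitness case, can be sketched in Python. This is only an analogy: it generates Python source and compiles it to Python bytecode with the built-in `compile`, not to native machine code as the genome compiler does, and the tuple encoding of s-expressions and all function names are assumptions for illustration.

```python
import operator

# Toy function set: binary arithmetic over a single variable x.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(expr, x):
    """Evaluate an s-expression tree by walking it on every call."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    return OPS[op](interpret(left, x), interpret(right, x))


def to_source(expr):
    """Render the tree as a fully parenthesized Python expression in x."""
    if expr == "x":
        return "x"
    if isinstance(expr, (int, float)):
        return repr(expr)
    op, left, right = expr
    if op not in OPS:
        raise ValueError(f"unknown operator {op!r}")
    return f"({to_source(left)} {op} {to_source(right)})"


def compile_genome(expr):
    """Translate the tree once into a callable (Python bytecode, not machine code)."""
    code = compile(f"lambda x: {to_source(expr)}", "<genome>", "eval")
    return eval(code)
```

For symbolic regression one would call `f = compile_genome(tree)` once per individual and then `f(x)` for every fitness case; `interpret(tree, x)` gives the same values but re-walks the tree each time, so the one-time translation cost is amortized over all fitness cases.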
The Crossover operator is common to most implementations of Genetic Programming (GP). Another, usually unavoidable, factor is some form of restriction on the size of trees in the GP population. This paper concentrates on the interaction between the Crossover operator and a restriction on tree depth, demonstrated by the MAX problem, which involves returning the largest possible value for given function and terminal sets. Some characteristics and inadequacies of Crossover in normal use are highlighted and discussed. Subtree discovery and movement take place mostly near the leaf nodes, with nodes near the root left untouched. Diversity drops quickly to zero near the root node in the tree population. GP is then unable to create fitter trees via the crossover operator, leaving the Mutation operator as the only common, but ineffective, route to the discovery of fitter trees.
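To make the MAX problem concrete, the sketch below computes the largest value reachable by a depth-limited tree, assuming the function set {+, *}, the single terminal 0.5, and depth counted as the number of function levels above the leaves; these choices are assumptions for illustration. Because + and * are non-decreasing in each argument for non-negative inputs, the best tree of a given depth combines two copies of the best tree one level shallower.

```python
def max_value(depth, terminal=0.5):
    """Largest output of a tree over the function set {+, *} and one terminal.

    `depth` counts function levels above the leaves (depth 0 is a lone
    terminal). Assumes terminal >= 0.
    """
    best = terminal
    for _ in range(depth):
        # + and * are non-decreasing in each argument for non-negative inputs,
        # so the best root combines two copies of the best shallower subtree.
        best = max(best + best, best * best)
    return best
```

Here `[max_value(d) for d in range(6)]` gives `[0.5, 1.0, 2.0, 4.0, 16.0, 256.0]`. Under these assumptions the depth-4 optimum (16) needs * at the root but + at the two lowest levels, so if the root nodes in a population have converged to +, exchanging subtrees near the leaves cannot produce it.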
Design rationale is a record of design activity: the alternatives available, the choices made, the reasons for them, and explanations of how a proposed design is intended to work. We describe a representation called the Functional Representation (FR), which has been used to represent how the functions of devices arise causally from the functions of their components and their interconnections. We propose that FR can provide the basis for capturing the causal aspects of design rationale. We briefly discuss the use of FR for a number of tasks for which we would expect design rationale to be useful: generation of diagnostic knowledge, design verification, and redesign.
We present a neural network-based face detection system. A retinally connected neural network examines small windows of an image and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We use a bootstrap algorithm for training the networks, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting non-face training examples, which must be chosen to span the entire space of non-face images. Comparisons with other state-of-the-art face detection systems are presented; our system has better performance in terms of detection and false-positive rates.
This work was partially supported by a grant from Siemens Corporate Research, Inc., by the Department of the Army, Army Research Office under grant number DAAHG, and by the Office of Naval Research under grant number N. This work was started while Shumeet Baluja was supported by a National Science Foundation Graduate Fellowship. He is currently supported by a graduate student fellowship from the National Aeronautics and Space Administration, administered by the Lyndon B. Johnson Space Center. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the sponsoring agencies.
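The bootstrap idea, adding false detections to the non-face training set and retraining, can be illustrated with a deliberately tiny stand-in. In this sketch each window is reduced to a single made-up score, the "detector" is a threshold halfway between the mean face score and the mean non-face score rather than a neural network, and `scenery` is a hypothetical pool of windows known to contain no faces; all names and numbers are assumptions.

```python
def train_threshold(faces, nonfaces):
    """Toy detector: threshold halfway between the two class means (not a neural net)."""
    face_mean = sum(faces) / len(faces)
    nonface_mean = sum(nonfaces) / len(nonfaces)
    return (face_mean + nonface_mean) / 2


def bootstrap(faces, initial_nonfaces, scenery, rounds):
    """Retrain while windows from face-free scenery are still detected as faces.

    Returns the final threshold and the number of false detections per round.
    """
    nonfaces = list(initial_nonfaces)
    threshold = train_threshold(faces, nonfaces)
    history = []
    for _ in range(rounds):
        # Any scenery window scored above the threshold is a false detection.
        false_detections = [v for v in scenery if v > threshold]
        history.append(len(false_detections))
        if not false_detections:
            break
        # Add the false detections as non-face examples and retrain.
        nonfaces.extend(false_detections)
        threshold = train_threshold(faces, nonfaces)
    return threshold, history
```

For example, with face scores `[8, 9, 10]`, initial non-faces `[0, 1]` and scenery `[0, 1, 2, 3, 5, 6]`, the first threshold (4.75) flags two scenery windows; after they are added as negatives the threshold rises to 6.0 and no false detections remain.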
