PhD Thesis
Announcements of the last step towards the PhD
Evolutionary Algorithms and de Novo Peptide Design
PhD Candidate: Ignasi Belda
Advisors: Drs. Ernest Giralt and Francesc Xavier Llorà
Tutor: Dr. Angela Nebot
Summary: This thesis addresses a specific biomedical problem: the automated design of peptide ligands that bind therapeutic protein targets. To this end, we use evolutionary algorithms that evolve populations of peptides. The search starts from random peptides (the individuals) and then explores the space in an implicitly parallel manner by applying evolutionary rules such as survival of the fittest and genotypic inheritance. The fitness function that scores each individual, in other words the function to be optimized, is the free energy of binding between the peptide ligand being evaluated and the target protein. This energy is obtained through peptide-protein docking simulations.
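The search loop described above can be illustrated with a minimal sketch. This is not ENPDA itself: the function `toy_binding_energy` is a hypothetical stand-in for a docking simulation (it only counts mismatches against an arbitrary reference sequence), and the population size, mutation rate, truncation selection and one-point crossover are all assumptions chosen for brevity.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def toy_binding_energy(peptide):
    """Hypothetical stand-in for a docking run; lower is better.

    Counts mismatches against an arbitrary reference sequence. It has no
    physical meaning and only gives the search something to minimize.
    """
    reference = "WKLFDE"
    return sum(1 for a, b in zip(peptide, reference) if a != b)


def random_peptide(length, rng):
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def crossover(parent_a, parent_b, rng):
    # One-point crossover: the child inherits a prefix from one parent
    # and the remaining suffix from the other (genotypic inheritance).
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(peptide, rate, rng):
    # Each residue is replaced by a randomly drawn one with probability `rate`.
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else residue
        for residue in peptide
    )


def evolve(pop_size=40, length=6, generations=60, rate=0.1, seed=0):
    rng = random.Random(seed)
    # Start the search from random individuals.
    population = [random_peptide(length, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better-scoring half of the population.
        population.sort(key=toy_binding_energy)
        survivors = population[: pop_size // 2]
        # Refill the population with mutated offspring of the survivors.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = survivors + children
    return min(population, key=toy_binding_energy)


if __name__ == "__main__":
    best = evolve()
    print(best, toy_binding_energy(best))
```

In a real setting the scoring function would be replaced by a call to a docking program, which is by far the most expensive step of each generation.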
In this thesis I study several implementations of evolutionary algorithms and some extensions that increase their computational capabilities, such as parallel evolutionary algorithms, multimodal evolutionary algorithms, fitness inheritance techniques, and the evolution of variable-length individuals. Finally, the methodology, ENPDA (Evolutionary de Novo Peptide Design Algorithm), is applied to the design of peptides that can recognize biomedically important targets, such as the proteins p53, prolyl oligopeptidase, DNA gyrase, and MHC H-2Kb, as well as a model of the amyloid-β (1-42) fibril.
Among the extensions developed and tested are: a two-level parallelization of the evolutionary algorithms, with almost linear scalability; multimodal evolutionary algorithms, which steer the evolutionary process towards a molecularly diverse search; fitness inheritance techniques, which in theory should greatly speed up the evolutionary process but do not do so in ENPDA, a result for which a hypothesis is given later in the thesis; and the evolution of variable-length individuals, which prepares ENPDA to dynamically adapt peptide size to each protein surface patch.
I also develop a data mining technology that automatically extracts new knowledge from biomedical databases. The methodology is applied and validated on two different data sets: one comprising a group of peptide ligands, and the other comprising AstraZeneca's hERG toxicology database. In this knowledge extraction process I again use evolutionary algorithms, here to evolve a set of rules that describe and generalize hidden patterns in the databases. A set of computational operations then detects and filters the significant conditions in these rules. Finally, this set of significant conditions is interpreted, and the new knowledge is automatically generated from it.
Date: 10th of March
Time: 11h
Place: Aula Fèlix Serratosa
Parc Científic de Barcelona
c/ Josep Samitier, 1-5
08028 Barcelona.
Systematic Construction of Goal-Oriented COTS Taxonomies
PhD Candidate: Claudia Patricia Ayala Martínez
Advisor: Dr. Xavier Franch
Summary: Building software systems by assembling and integrating pre-packaged solutions in the form of Commercial-Off-The-Shelf (COTS) software components has become a strategic need in a wide variety of application areas. In general, COTS components are software components that provide a specific functionality and are available in the market to be purchased, interfaced and integrated into other software systems. The potential benefits of this technology are mainly reduced costs and shorter development time, while maintaining quality. Nevertheless, many challenges, ranging from technical to legal issues, must be faced in adapting traditional software engineering activities in order to exploit these benefits.
There is now a large and ever-growing marketplace of COTS components; therefore, one of the most critical activities in COTS-based development is the selection of the components to be integrated into the system under development. Selection is basically composed of two main processes: searching for candidates in the marketplace, and evaluating them with respect to the system requirements. Unfortunately, most existing methods for COTS selection focus their efforts on evaluation, leaving aside the problem of searching for components in the marketplace. Searching for candidate COTS components is not an easy task: it has to cope with challenging marketplace characteristics related to its widespread, evolvable and growing nature, and with the lack of available and well-suited information for a quality-assured search. Moreover, traditional reuse approaches also lack appropriate solutions for reusing COTS components and the knowledge gained in each selection process. This lack of proposals is a serious drawback that makes the whole selection process highly risky, and often expensive and inefficient.
This dissertation introduces the GOThIC (Goal-Oriented Taxonomy and reuse Infrastructure Construction) method, aimed at building a domain reuse infrastructure that facilitates the search for and reuse of COTS components. It is based on goal-oriented approaches for building abstract, well-founded and stable taxonomies capable of dealing with the characteristics of the COTS marketplace. Thus, the nodes of these taxonomies are characterized by means of goals, their relationships are declared as dependencies among them, and several artefacts are constructed and managed for reusability and evolution purposes.
The GOThIC method has been elaborated following an iterative process based on action-research premises. First, the actual challenges related to searching for COTS components were identified. Then, possible solutions were envisaged and implemented in several industrial and academic case studies in different domains. The successful results were recorded and combined to articulate the GOThIC method, which was followed by a preliminary industrial evaluation in some Norwegian companies.
Date: 31st of March
Time: 12:00h
Place: Sala d'Actes de la Facultat d'Informàtica de Barcelona.
Edifici B6. Campus Nord.
A Flexible Multitask Summarizer for Documents from Different Media, Domain and Language
PhD Candidate: María Fuentes Fort
Advisor: Dr. Horacio Rodríguez Hontoria
Summary: Automatic summarization is becoming crucial as the volume of generated documents increases, particularly now that retrieving, managing and processing information have become decisive tasks. However, one should not expect "perfect" systems able to substitute for human summaries. The automatic summarization process strongly depends not only on the characteristics of the documents, but also on the different needs of users. Thus, several aspects have to be taken into account when designing an information system for summarizing, because different techniques can be applied depending on the characteristics of the input documents and the desired results. In order to support this process, the final goal of the thesis is to provide a flexible multitask summarizer architecture. This goal is decomposed into three main research objectives.
First, to study the process of porting systems to different summarization tasks, processing documents in different languages, domains or media, with the aim of designing a generic architecture that permits the easy addition of new tasks by reusing existing tools.
Second, to develop prototypes for some tasks involving aspects related to the language, the media and the domain of the document or documents to be summarized, as well as aspects related to the summary content: generic summaries, novelty summaries, or summaries that answer a specific user need.
Third, to create an evaluation framework to analyze the performance of several approaches in the written news and scientific oral presentation domains, focusing mainly on intrinsic evaluation.
Date: 31st of March
Time: 9:00h
Place: Aula de Teleensenyament de l'edifici B3
Campus Nord.
