Marie Curie Fellow at
Dipartimento di Informatica ed Applicazioni
University of Salerno, Italy
zsuzsa AT cebitec DOT uni-bielefeld DOT de or
zsuzsanna DOT liptak AT univr DOT it
Note: I have moved to the University of Verona, Dept. of Computer Science, as an assistant professor. I do not have a webpage there yet; for now, please refer to my old one in Bielefeld.
We propose to investigate a number of algorithmic problems on jumbled strings. We call a string t a jumbled version of a string s if the characters of t can be permuted to yield s. In other words, the two strings have the same Parikh vector, where the Parikh vector of a string counts the number of occurrences of each character of the alphabet. For example, listing the counts in the order A, C, G, T, the strings AAGACGT and AAACGGT both have Parikh vector (3,1,2,1). All strings with the same Parikh vector form an equivalence class, which we refer to as a "jumbled string." We want to develop algorithms and dedicated data structures for searching, storing, comparing, and identifying jumbled strings.
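As a minimal illustration of the definition above, the following Python sketch computes a Parikh vector and tests whether two strings are jumbled versions of each other. The function names and the default alphabet order A, C, G, T are assumptions made for this example only; they are not part of the project.

```python
from collections import Counter

def parikh_vector(s, alphabet="ACGT"):
    """Return the character counts of s, listed in the order given by alphabet.

    The default alphabet "ACGT" is an assumption for this example;
    characters of s outside the alphabet are ignored here.
    """
    counts = Counter(s)
    return tuple(counts[c] for c in alphabet)

def is_jumbled_version(s, t):
    """Return True if t is a permutation of s, i.e. both have the same counts."""
    # Comparing full character counts avoids depending on a fixed alphabet.
    return Counter(s) == Counter(t)

print(parikh_vector("AAGACGT"))                   # (3, 1, 2, 1)
print(parikh_vector("AAACGGT"))                   # (3, 1, 2, 1)
print(is_jumbled_version("AAGACGT", "AAACGGT"))   # True
```

Computing a Parikh vector this way takes time linear in the length of the string.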
Jumbled strings have important applications in bioinformatics, above all in the interpretation of mass spectrometry data; they have also been applied to alignment, pattern discovery in biological strings, and SNP detection. Searching for a jumbled pattern in a text constitutes a special case of approximate string matching, and is thus of particular interest in the pattern matching field. Similar problems regarding unique reconstruction of strings have been investigated in the area of formal languages.
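To make the search problem concrete, here is a simple sliding-window sketch in Python that reports every position where a substring of the text is a jumbled version of the pattern. This is a standard textbook technique shown only for illustration, not one of the algorithms to be developed in the project; the function name is hypothetical.

```python
from collections import Counter

def jumbled_occurrences(text, pattern):
    """Return all start positions i such that text[i:i+len(pattern)]
    is a permutation of pattern (illustrative sliding-window baseline)."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = Counter(pattern)
    window = Counter(text[:m])          # counts of the first window
    hits = [0] if window == target else []
    for i in range(m, len(text)):
        window[text[i]] += 1            # character entering the window
        left = text[i - m]              # character leaving the window
        window[left] -= 1
        if window[left] == 0:
            del window[left]            # drop zero entries so equality is exact
        if window == target:
            hits.append(i - m + 1)
    return hits

print(jumbled_occurrences("AAGACGT", "GAA"))  # [0, 1]
```

Each shift of the window updates only two counts, so for a fixed alphabet the scan takes time linear in the length of the text.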
The project involves both theoretical and practical parts. Besides searching for asymptotically optimal procedures for different models of the source that generates the text, we will also test our algorithms on real instances of biological and textual data. We are interested not only in theoretically optimal algorithms but also, in particular, in algorithms that work well in practice. Thus, we also consider heuristics and ad hoc methods to enhance the practical implementation of our methods.
The project will enable the fellow to greatly enhance her competencies in algorithm development and formal languages, while training in information theory and extremal combinatorics and benefitting from the expertise at the host institution. This will constitute a major step in her career towards a professorship in algorithmic bioinformatics.
Keywords: algorithms and data structures, Parikh vectors, permuted strings, string algorithms, search algorithms, bioinformatics, pattern matching, string distance measures, word reconstruction