A summary of some of my own findings, ideas and work. A full list of the remaining publications to which I have contributed is also available.
The distribution of word frequencies in human language follows the
so-called Zipf's law. Zipf's law is a universal statistical
regularity: as far as we know, there is no exception among the world's
languages. I have formalized the needs of the hearer and the speaker
using information theory and shown how Zipf's law is recovered
when those needs are optimized.
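As a rough illustration of the regularity itself (not of the information-theoretic model), Zipf's law says that the frequency of the r-th most frequent word decays roughly as a power of r, with an exponent close to 1. The sketch below ranks the words of a text and estimates that exponent with a plain least-squares fit on log-log axes. The function names and the tokenization rule are assumptions made for this example, and the fit is a crude estimate rather than a rigorous procedure.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word frequencies sorted from most to least frequent."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_exponent(frequencies):
    """Estimate the exponent as minus the least-squares slope of
    log(frequency) against log(rank). Needs at least two ranks."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

Applied to a long text, `zipf_exponent(rank_frequency(text))` would be expected to land near 1, although the value depends on the text and on how the fit is done.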
Visit
the excellent bibliography collection about Zipf's law in linguistic
and non-linguistic contexts by Wentian Li.
Related publications:
Ferrer i Cancho, R. (2005). On the universality of Zipf's law for word frequencies. In: "In honor of Gabriel Altmann", Grzybek, P. and Köhler, R. (eds.), 131-140.
Ferrer i Cancho, R. (2005). Hidden communication aspects in the exponent of Zipf's law. Glottometrics 11, 96-117.
Ferrer i Cancho, R. (2005). Zipf's law from a communicative phase transition. European Physical Journal B 47, 449-457.
Ferrer i Cancho, R. (2005). The variation of Zipf's law in human language. European Physical Journal B 44, 249-257.
Ferrer i Cancho, R. (2005). Decoding least effort and scaling in signal frequency distributions. Physica A 345, 275-284.
Ferrer i Cancho, R. & Solé, R. V. (2003). Least effort and the origins of scaling in human language. Proc. Nat. Acad. Sci. USA 100, 788-791.
Special section for people who are skeptical about the previous section "Why do human words follow Zipf's law?". Many people believe Zipf's law tells us nothing about language. Some of them repeat what others have previously said, and others actually believe that random sequences of letters or other simple stochastic processes are valid null hypotheses (or even models) for human word frequencies. Read the following papers and open your mind:
Ferrer i Cancho, R. (2005). When language breaks into pieces. A conflict between communication through isolated signals and language. Biosystems 84, 242-253.
Ferrer i Cancho, R., Riordan, O. & Bollobás, B. (2005). Zipf's law consequences for syntax and symbolic reference. Proceedings of the Royal Society of London Series B 272, 561-565.
Ferrer i Cancho, R. & Servedio, V. D. P. (2005). Can simple models explain Zipf's law in all cases? Glottometrics 11, 1-8.
Ferrer i Cancho, R. & Solé, R. V. (2002). Zipf's law and random texts. Advances in Complex Systems 5, 1-6.
The frequency spectrum of words in single-author texts typically follows a power-law distribution with an exponent of -2. Large multi-author texts exhibit two domains in the frequency spectrum, each with a different exponent. The two domains divide the lexicon into a set of core words (the kernel) and a set of peripheral words. The core lexicon is a set of very basic words characterized by a significantly shorter length and by semantic plasticity. Related publications:
Ferrer i Cancho, R. & Solé, R. V. (2001). Two regimes in the frequency of words and the origin of complex lexicons. Journal of Quantitative Linguistics 8, 165-173.
Ferrer i Cancho, R. (2002). Core and outer lexicon through word length optimization. Submitted to the Journal of Quantitative Linguistics.
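For readers unfamiliar with the term, the frequency spectrum mentioned above counts, for each frequency f, how many distinct words occur exactly f times. A minimal sketch follows; the function name `frequency_spectrum` is an assumption made for this example, and the input is taken to be an already tokenized text.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency f to the number of distinct words occurring f times."""
    word_counts = Counter(tokens)              # word -> number of occurrences
    return dict(Counter(word_counts.values()))  # occurrences -> number of words
```

Plotting the resulting counts against f on log-log axes is one simple way to look for the power-law regime, or the two regimes, described above.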
Borrowing tools from statistical physics, I show that the
syntactic dependency networks of different languages show striking
regularities.
This line of research opens a new avenue in the
search for linguistic universals. Related publications:
Ferrer i Cancho, R., Solé, R. V. & Köhler, R. (2004). Patterns in syntactic dependency networks. Physical Review E 69, 051915.
Ferrer i Cancho, R. (2005). The structure of syntactic dependency networks: insights from recent advances in network theory. In: "The problems of quantitative linguistics: a collection of papers", Altmann, G., Levickij, V. & Perebyinis, V. (eds.). Chernivtsi: Ruta. pp. 60-75.
If you represent the structure of a sentence by drawing the syntactic
links above the sentence, you will find that links do not generally
cross. This is also a universal property. I have shown that the
absence of crossings is a side effect of minimizing the Euclidean
distance between linked words. The absence of crossings is a
necessary condition for the so-called projectivity property.
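The two quantities discussed above can be made concrete with a small sketch. It assumes that words are identified by their integer positions in the sentence, so that the distance between two linked words is simply the absolute difference of their positions, and that a sentence structure is given as a list of position pairs. The function names are hypothetical and chosen for this example only.

```python
from itertools import combinations


def total_distance(links):
    """Sum of |i - j| over all links (i, j) between word positions."""
    return sum(abs(i - j) for i, j in links)


def count_crossings(links):
    """Count pairs of links whose arcs cross when drawn above the sentence."""
    arcs = [tuple(sorted(link)) for link in links]
    crossings = 0
    for (a, b), (c, d) in combinations(arcs, 2):
        # Two arcs cross when exactly one endpoint of one arc lies
        # strictly inside the span of the other.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings
```

For instance, the chain `[(1, 2), (2, 3), (3, 4)]` has no crossings and a total distance of 3, whereas `[(1, 3), (2, 4)]` has one crossing and a total distance of 4.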
Related publications:
Ferrer i Cancho, R. (2006). Why do syntactic links not cross? Europhysics Letters 76, 1228-1235.
Ferrer i Cancho, R. (2004). The Euclidean distance between syntactically linked words. Physical Review E 70, 056135.
Languages that have subject (S), verb (V) and object (O) have six
possible ways of ordering the triple: SVO, SOV, VSO, VOS, OSV
and OVS. What could make SVO the most suitable candidate? Minimizing
the distance between words favors SVO under ideal conditions (66% of
the cases). My contribution here is defining the conditions of SVO
superiority.
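A toy calculation shows why distance minimization is relevant to word order. The sketch below enumerates the six orders and sums the positional distances between linked elements, under the simplifying assumption (made for this example) that the verb is linked to both the subject and the object and that each element is a single word. It does not model the ideal conditions or reproduce the 66% figure mentioned above.

```python
from itertools import permutations

# Assumption for illustration: the verb is linked to subject and object.
LINKS = [("S", "V"), ("V", "O")]


def order_cost(order):
    """Sum of positional distances between linked elements in a given order."""
    position = {element: index for index, element in enumerate(order)}
    return sum(abs(position[a] - position[b]) for a, b in LINKS)


def costs_by_order():
    """Map each of the six orders (e.g. 'SVO') to its total link distance."""
    return {"".join(order): order_cost(order) for order in permutations("SVO")}
```

In this toy setting the two verb-medial orders, SVO and OVS, reach the minimum cost of 2, while the other four orders cost 3. Distinguishing SVO from OVS requires further conditions that this sketch leaves out.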
Related publications (articles soon):
Ferrer i Cancho, R. (2003). Language: universals, principles and origins. PhD Thesis. Barcelona: Universitat Politècnica de Catalunya.
Schools and annotation manuals contradict each other when trying to define the structure of a particular sentence. The field needs non-arbitrary, objective criteria in order to decide what the best option is. Here I address the following question: if we have a triple of words (x,y,z) where x and z are content words and y is a preposition or a conjunction, what is the structure of the triple among the three possibilities: {(x,y),(y,z)}, {(x,y),(x,z)} or {(x,z),(y,z)}? Maximizing the information transfer leads to rejecting {(x,y),(y,z)}, contrary to what various syntactic dependency formalisms choose. Related publications:
Ferrer i Cancho, R. and Reina, F. (2002). Quantifying the semantic contribution of particles. Journal of Quantitative Linguistics, 9, 35-47.
What happens to the topology of a network when both the distance between vertices and the number of links are minimized? A reduced network morphospace containing four basic network types is obtained. Those types are: exponential degree distribution networks, power-law degree distribution networks, star networks and highly dense networks. The type of network obtained depends on the weight given to the vertex-vertex distance and to the number of links. Related publications:
Ferrer i Cancho, R. & Solé, R. V. (2003). Optimization in complex networks. Statistical Mechanics of Complex Networks, Lecture Notes in Physics Vol. 625, Springer (Berlin), pp. 114-125.
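The trade-off between vertex-vertex distance and number of links can be illustrated with a small sketch. It assumes an undirected, connected graph with vertices numbered from 0 to n-1, and it combines the mean shortest-path length and the link density linearly through a hypothetical `weight` parameter. The function names and the exact form of the cost (in particular the absence of any distance normalization) are assumptions made for this example, not a reproduction of the cited work.

```python
from collections import deque


def link_density(n, edges):
    """Fraction of the n*(n-1)/2 possible undirected links that are present."""
    return len(edges) / (n * (n - 1) / 2)


def mean_distance(n, edges):
    """Average shortest-path length over all vertex pairs (connected graph)."""
    neighbours = {v: set() for v in range(n)}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    total = 0
    for source in range(n):
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


def cost(n, edges, weight):
    """Hypothetical cost: weight * distance + (1 - weight) * density."""
    return weight * mean_distance(n, edges) + (1 - weight) * link_density(n, edges)
```

With five vertices, a star has density 0.4 and mean distance 1.6, while a complete graph has density 1 and mean distance 1. Under this toy cost, a low `weight` (links are expensive) favors the star and a high `weight` (distance is expensive) favors the dense graph, which matches the qualitative dependence on the weighting described above.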