Research snapshots

A summary of some of my own findings, ideas and work. See also the full list of the remaining publications to which I have contributed.

Why do human words follow Zipf's law?

The distribution of word frequencies in human language follows the so-called Zipf's law. Zipf's law is a universal statistical regularity: as far as we know, there is no exception among the world's languages. I have formalized the needs of the hearer and the speaker using information theory and shown how Zipf's law is recovered when those needs are optimized.
Visit the excellent bibliography collection about Zipf's law in linguistic and non-linguistic contexts by Wentian Li.
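
For readers who want to see the regularity in their own data, here is a minimal sketch of how a rank-frequency curve can be inspected. It is not the information-theoretic model described above; it only measures the exponent. The function names (rank_frequency, zipf_exponent) and the toy corpus are assumptions made for illustration, and the least-squares fit on a log-log scale is a crude estimator that is used here only because it is short.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)

def zipf_exponent(frequencies):
    """Least-squares slope of log(frequency) vs. log(rank), as a positive exponent."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Toy corpus (an assumption): the word of rank r occurs about 1000 / r times.
tokens = [f"w{r}" for r in range(1, 51) for _ in range(round(1000 / r))]
print(zipf_exponent(rank_frequency(tokens)))
```

With a real text, tokens would be the list of word tokens after whatever tokenization is chosen; the estimated exponent depends on that choice and on the size of the text.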

Related publications:

But is not Zipf's law in human word frequencies crap?

Special section for people who are skeptical about the previous section "Why do human words follow Zipf's law?". Many people believe Zipf's law tells us nothing about language. Some of them repeat what others have previously said, and some others actually believe that random sequences of letters or other simple stochastic processes are valid null hypotheses (or even models) for human word frequencies. Read the following papers and open your mind:

The core and peripheral lexicon hypothesis

The frequency spectrum of words in single-author texts typically follows a power-law distribution with an exponent of -2. Large multi-author texts exhibit two domains in the frequency spectrum, each with a different exponent. The two domains divide the lexicon into a set of core words (the kernel) and a set of peripheral words. The core lexicon is a set of very basic words with a significantly shorter length and semantic plasticity. Related publications:

New prospects for linguistic universals

Borrowing tools from statistical physics, I show that the syntactic dependency networks of different languages show striking regularities.
This line of research opens a new prospect in the search for linguistic universals. Related publications:

Why do syntactic links not cross?

If you define the structure of a sentence by drawing the syntactic links above the sentence, you will realize that links do not generally cross. This is also a universal property. I have shown that the absence of crossings is a side effect of minimizing the Euclidean distance between linked words. The absence of crossings is a necessary condition for the so-called projectivity property.
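
To make the two quantities concrete, here is a minimal sketch that counts link crossings and sums link lengths for a given word order. The toy four-word structure, the function names and the use of position differences as the distance between linked words are assumptions made for illustration; this is not the analysis from the publications.

```python
from itertools import combinations

def to_positions(edges, order):
    """Map links between word ids to links between positions in a linear order."""
    pos = {word: i for i, word in enumerate(order)}
    return [tuple(sorted((pos[u], pos[v]))) for u, v in edges]

def total_length(pos_edges):
    """Sum of the distances (in words) spanned by each link."""
    return sum(b - a for a, b in pos_edges)

def count_crossings(pos_edges):
    """Two links cross when exactly one end of one lies strictly inside the other."""
    return sum(
        1
        for (a, b), (c, d) in combinations(pos_edges, 2)
        if a < c < b < d or c < a < d < b
    )

# Toy structure (an assumption): a chain of links 0-1, 1-2, 2-3 over four words.
edges = [(0, 1), (1, 2), (2, 3)]
for order in ([0, 1, 2, 3], [0, 2, 1, 3]):
    links = to_positions(edges, order)
    print(order, total_length(links), count_crossings(links))
```

In this toy case the order with the shorter total link length (3 instead of 5) is also the one without crossings, which is the kind of relationship the paragraph above refers to.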

Related publications:

What could make SVO order superior?

Languages that have subject (S), verb (V) and object (O) have six possible ways of ordering the triple: SVO, SOV, VSO, VOS, OSV and OVS. What could make SVO the most suitable candidate? Minimizing the distance between words favors SVO under ideal conditions (66% of the cases). My contribution here is defining the conditions of SVO superiority.
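
As a rough illustration of the distance argument, the sketch below enumerates the six orders and sums the distances between linked elements. It assumes, purely for illustration, that the verb is linked to both the subject and the object and that each element is a single word. It does not reproduce the 66% figure or the conditions of SVO superiority; those require the full analysis.

```python
from itertools import permutations

# Assumed structure: the verb is linked to the subject and to the object.
LINKS = [("V", "S"), ("V", "O")]

def cost(order):
    """Sum of distances between linked elements for a given order."""
    pos = {element: i for i, element in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in LINKS)

costs = {"".join(order): cost(order) for order in permutations("SVO")}
for name, value in sorted(costs.items(), key=lambda item: item[1]):
    print(name, value)
```

Under these assumptions the verb-medial orders (SVO and OVS) have cost 2 and the other four orders have cost 3, so distance minimization alone does not separate SVO from OVS here.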

Related publications (articles soon):

Could syntactic dependency structures have a deep theoretical basis?

Schools and annotation manuals contradict each other when trying to define the structure of a particular sentence. The field needs non-arbitrary, objective criteria in order to decide what the best option is. Here I address the following question: if we have a triple of words (x,y,z) where x and z are content words and y is a preposition or a conjunction, what is the structure of the triple among the three possibilities: {(x,y),(y,z)}, {(x,y),(x,z)} or {(x,z),(y,z)}? Maximizing the information transfer leads to rejecting {(x,y),(y,z)}, against what different syntactic dependency formalisms choose. Related publications:

Optimization in complex networks

What happens to the topology of a network when both the distance between vertices and the number of links are minimized? A reduced network morphospace containing four basic network types is obtained. Those types are: exponential degree distribution networks, power-law degree distribution networks, star networks and highly dense networks. The type of network obtained depends on the weight given to the vertex-vertex distance and to the number of links. Related publications: