Some research writings

Latest additions or updates (repeated below):

Links are grouped in overlapping categories; in a few of them I have not added any contribution for years now. These categories are:

Logical, intuitive, or practical notions of redundancy in association rules

Entailment among probabilistic implications (Atserias and Balcázar, accepted for LICS 2015).

Towards Parameter-Free Data Mining: Mining Educational Data with yacaree (Zorrilla, García-Saiz, and Balcázar, 4th International Conference on Educational Data Mining, 363-364). Will be made available here sometime in the future, probably in revised form.

Closed-set-based Discovery of Representative Association Rules Revisited (Balcázar, Tîrnăucă; Extraction et Gestion des Connaissances 2011). Undergoing major revision.

Mining Educational Data for Patterns With Negations and High Confidence Boost (Balcázar, Tîrnăucă, Zorrilla; TAMIDA workshop at CEDI 2010). Will be made available here sometime in the future, probably in revised form.

Filtering Association Rules with Negations on the Basis of Their Confidence Boost (Balcázar, Tîrnăucă, Zorrilla; KDIR 2010). Will be made available here sometime in the future, probably in revised form.

Formal and computational properties of the confidence boost in association rules, journal submission, extended version of some papers below on the topic of confidence boost. ACM Transactions on Knowledge Discovery from Data 2013 (final link to be added here soon).

Parameter-free association rule mining with yacaree (Extraction et Gestion des Connaissances 2011, demo track). Superseded by "Formal and computational properties..." above.

Closure-based Confidence Boost in Association Rules (Balcázar, Workshop on Applications of Pattern Analysis 2010). Superseded by "Formal and computational properties..." above.

Objective novelty of association rules: Measuring the confidence boost, (Balcázar, Extraction et Gestion des Connaissances 2010. Superseded by "Formal and computational properties..." above; a follow up of Two measures of objective novelty in association rule mining, Springer LNCS-5669, which is itself an extended version of Confidence Width: an Objective Measure of Novelty for Association Rules, Workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models QIMIE'09 at PAKDD'09.

Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules (Balcázar; Logical Methods in Computer Science, vol. 6 (2010), n. 2, paper 5; archival version joining the PKDD-ECML 2008 and DS 2008 papers below, with additional results);

Balcazar. Minimum-Size Bases of Association Rules. PKDD-ECML 2008. The journal paper in Logical Methods in Computer Science given above is an extended version joint with the DS'08 paper.

Balcazar. Deduction Schemes for Association Rules. Discovery Science 2008. The journal paper in Logical Methods in Computer Science given above is an extended version joint with the ECML PKDD'08 paper.

Closure Operators in Data Mining and Database Theory

Border Algorithms for Computing Hasse Diagrams of Arbitrary Lattices (Balcázar, Tîrnăucă; ICFCA 2011). Will be made available here sometime in the future.

Baixeries, Balcazar. Unified characterization of symmetric dependencies with lattices. International Conference in Formal Concept Analysis 2006.

Baixeries, Balcazar. New closure operators and lattice representations for multivalued dependencies and related expressions. Concept Lattices and Applications 2005.

Baixeries, Balcazar. A lattice representation of relations, multivalued dependencies and Armstrong relations. International Conference on Conceptual Structures, Kassel, july 2005. Please download beforehand Unified characterization of symmetric dependencies with lattices which includes a correction to one of the proofs; just in case you really want the ICCS version...

Baixeries, Balcazar. Characterization and Armstrong relations for degenerate multivalued dependencies using Formal Concept Analysis. International Conference in Formal Concept Analysis 2005.

Balcazar, Baixeries. Characterizations of multivalued dependencies and related expressions. International Conference on Discovery Science 2004.

Balcazar, Baixeries. Discrete Deterministic Data Mining as Knowledge Compilation. Workshop Discr Math and Data Mining at SIAM DM Conference, 2003.

Closure Operators on Structured Data

Mining Frequent Closed Rooted Trees (Balcázar, Bifet, Lozano, Machine Learning 78 (1-2): 1-33 (2010); archival version of four papers below (MLG06, ACKE07, ICCS07, and MLG07).

Balcazar, Bifet, Lozano. Mining implications from lattices of closed trees. Extraction et Gestion des Connaissances 2008.

Balcazar, Bifet, Lozano. Closed and maximal tree mining using natural representations. Mining and Learning with Graphs 2007. Better see the journal paper above.

Balcazar, Bifet, Lozano. Subtree Testing and Closed Tree Mining Through Natural Representations. Workshop Advances in Conceptual Knowledge Engineering, ACKE 2007, Regensburg. Better see the journal paper above.

Balcazar, Bifet, Lozano. Mining Frequent Closed Unordered Trees Through Natural Representations. International Conference on Conceptual Structures, Sheffield, july 2007. Better see the journal paper above.

Balcazar, Garriga. Characterizing implications of injective partial orders. This is the longish version; a short ``poster'' version was presented at International Conference on Conceptual Structures, Sheffield, july 2007.

Balcazar, Bifet, Lozano. Intersection algorithms and a closure operator on unordered trees. Mining and Learning with Graphs (MLG 2006, w/s at ECML/PKDD). Better see the journal paper above.

C-Garriga, Diaz-Lopez, Balcazar. Reconstructing the rules of 1D cellular automata using closure systems. European Conference on Complex Systems 2005.

Garriga, Diaz-Lopez, Balcazar. ISSA: An integrated system for sequence analysis.

Balcazar, Garriga. Horn Axiomatizations for Sequential Data. Theoretical Computer Science (available online 23 November 2006, special issue on International Conference in Database Theory 2005); vol 371, num 3 (2007), 247-264.

Garriga, Balcazar. Galois Connections for Mining Structured Objects. Learning 2004 (Elche, Spain).

Garriga, Balcazar. Coproduct Transformations on Lattices of Closed Partial Orders. International Conference on Graph Transformation 2004. Also as: http://eprints.pascal-network.org/archive/00000040.

Computational Learning Theory

Learning Propositional Horn Formulas from Closure Queries (Arias, Balcázar, and Tîrnăucă, submitted for publication).

Construction and learnability of canonical Horn formulas (Arias, Balcázar; Machine Learning, 2011; DOI 10.1007/s10994-011-5248-5). Expanded journal version of Canonical Horn Representations and Query Learning (Arias, Balcázar; ALT 2009).

Arias, Balcazar. Query Learning and Certificates in Lattices. ALT'08.

Balcazar. Query learning of Horn formulas revisited: progress and open questions. Computability in Europe, Amsterdam, june 2005.

Balcazar, Dai, Watanabe. A random sampling technique for training Support Vector Machines. ALT'2001.

Balcazar, Dai, Tanaka, Watanabe. Provably fast training algorithms for Support Vector Machines. Theory of Computing Systems (published online 13 February 2008), DOI: 10.1007/s00224-007-9094-6; includes the results from the three conference and workshop publications by Balcazar, Dai, and Watanabe linked above or elsewhere in this page.

Balcazar, Castro, Guijarro, Koebler, Lindner. A general dimension for query learning. Journal of Computer and System Sciences 73, 6 (2007), 924-940. Journal submission including the COLT'01 paper below and the follow-up by Koebler and Lindner appearing in ALT'02.

Balcazar, Castro, Guijarro. A general dimension for exact learning. COLT'01. Please download instead the journal version above.

Balcazar, Castro, Guijarro. A new abstract combinatorial dimension for exact learning via queries. Evolution of COLT'00. Journal of Computer and System Sciences 64, 1 (2002), 2--21.

Balcazar, Castro, Guijarro, Simon. The consistency dimension and distribution-dependent learning from queries. Evolution of ALT'99. Theoretical Computer Science 288, 2 (September 2002), 197--215. Will find it again and link it back here sometime in 2013.

Balcazar, Buhrman, Hermo: Circuit expressions of low Kolmogorov complexity. Follow-up and major overhaul of EuroColt 95.

Balcazar, Diaz, Gavalda, and Watanabe. Algorithms for Learning Finite Automata from Queries: A Unified View. A chapter in the book dedicated to the memory of Ron Book: Advances in Algorithms, Languages, and Complexity, D.-Z. Du and K.-I. Ko (eds.), Kluwer Academic Publishers, 1997. Not online here, but try Ricard's homepage.

Balcazar, Diaz, Gavalda, Watanabe. An optimal parallel algorithm for learning DFA (evolution of COLT 94). The hyperlink hopefully brings you the final version, online at the Journal of Universal Computer Science, vol. 2, n. 3 (march 28, 1996). Follow-up of a paper (not online) in ALT'93.

Castro, Balcazar. Simple PAC learning of simple decision lists (from ALT 95).

Data Mining Algorithmics, Models, and Techniques

Evaluation of Association Rule Quality Measures through Feature Extraction (Balcázar and Dogbey, IDA 2013).

Feature Extraction from Top Association Rules: Effect on Average Predictive Accuracy (Balcázar and Dogbey, 3rd EUCogIII Members Conference and Final Pascal Review, Palma de Mallorca, April 2013).

Balcazar, Dai, Tanaka, Watanabe. Provably fast training algorithms for Support Vector Machines. Theory of Computing Systems (published online 13 February 2008), DOI: 10.1007/s00224-007-9094-6; includes the results from the three conference and workshop publications by Balcazar, Dai, and Watanabe linked below.

Balcazar, Baixeries. Discrete Deterministic Data Mining as Knowledge Compilation. Workshop Discr Math and Data Mining at SIAM DM Conference, 2003.

Balcazar. Rules with bounded negations and the coverage inference scheme. Unpublished.

Balcazar, Dai, Watanabe. Provably fast Support Vector Regression using random sampling. Workshop Discr Math and Data Mining at SIAM DM Conference, 2002.

Baixeries, Casas-Garriga, Balcazar. A best-first strategy for finding frequent sets. Extraction des Connaissances et Apprentissage 1, 4 (2002), . Verbally presented at: Extraction et Gestion des Connaissances EGC'2002, Montpellier.

Balcazar, Dai, Watanabe. Provably fast training algorithms for Support Vector Machines. ICDM'2001. Follow-up of ALT'2001.

Balcazar, Dai, Watanabe. A random sampling technique for training Support Vector Machines. ALT'2001.

Fortes, Balcazar, Morales. Bounding negative information in frequent sets algorithms. Discovery Science DS'2001. Try better the final version of the algorithms in the PhD dissertation of Inma Fortes at University of Malaga.

Complexity Theory

Balcazar, Buhrman, Hermo. Circuit expressions of low Kolmogorov complexity. Follow-up and major overhaul of EuroColt 95.

Balcazar, Hermo. The structure of logarithmic advice complexity classes. Theoretical Computer Science 207, 1 (1998), "In memoriam of Ronald V. Book", 217-244. Based on a part of the IFIP 92 paper by Balcazar, Hermo, and Mayordomo, and most of the MFCS 94 paper by Hermo (including corrections to some not fully correct claims there). Core chapter of Hermo's PhD.

Balcazar, Gavalda, and Watanabe. Coding Complexity: The Computational Complexity of Succinct Descriptions. A chapter in the book dedicated to the memory of Ron Book: Advances in Algorithms, Languages, and Complexity, D.-Z. Du and K.-I. Ko (eds.), Kluwer Academic Publishers, 1997. Not online here, but try Ricard's homepage.

Atserias, Balcazar. Refining logical characterizations of advice complexity classes. Panhelenic Logic Symposium, Cyprus, july 97.

Balcazar, Gavalda, Siegelmann. Computational power of neural networks: A Kolmogorov complexity characterization. IEEE Transactions on Information Theory 43, 4 (1997), 1175-1183. (I am not sure it is the really final version.)

Allender, Balcazar, and Immerman. A first-order isomorphism theorem. SIAM Journal on Computing 26, 2 (1997), 539--556. Eric's version resides in the US, in case you prefer.

Balcazar, Gavalda, Hermo. Compressibility of infinite binary sequences. Published in: Complexity, Logic, and Recursion Theory, A. Sorbi, ed., Marcel Dekker 1997, ISBN 0-8247-0026-0, 75-91. Supersedes the "easy as \pi" paper of the Workshop on Descriptional Complexity, Rutgers 1994.

Balcazar. The complexity of searching implicit graphs (evolution of ICALP 95). Artificial Intelligence 86 (1996), 171-188.

Balcazar, Mayordomo. A note on genericity and bi-immunity (from Structure in Complexity 95); it's Elvira's version, use gunzip and find some good luck around.

Others: Security in computing, teaching informatics, programming methodologies

Garriga, Balcazar, Ballesteros, Megias, Palomares. Strategic Management of Security Information through an Entropy-Based Alert Correlator (newsletter publication).

Sanchez, Sancho, Botella, Garcia, Aluja, Navarro, Balcazar. Competencias Profesionales del Grado en Ingenieria Informatica (JENUI'08 conference on teaching informatics).

Balcazar. Yet another transformation scheme for double recursion. Unpublished.


-----------------------------117013131620671 Content-Disposition: form-data; name="file2"; filename="" Content-Type: application/octet-stream