Dependencies within CTCF binding sites

by Ivo Grosse, Institute of Computer Science, Martin Luther University, Halle, Germany

The identification of DNA binding sites has been a challenge since the early days of computational biology, and its importance has grown with the development of new experimental techniques and the ensuing flood of large-scale genomics and epigenomics data, which yield only approximate regions of binding. To what extent dependencies within binding sites exist, and how much they help in the computational identification of DNA binding sites, has been debated intensively, in particular in the context of analyzing in vitro data from universal protein-binding microarrays. Here, we address this question based on in vivo data from different ChIP-seq experiments of the human enhancer-blocking insulator protein CTCF, and we find that the sensitivity of de novo motif discovery increases from 72% to 84% when dependencies of up to order four within CTCF binding sites are taken into account. We find that CTCF binding sites are poorly represented by a position weight matrix model, which neglects dependencies within binding sites, and that these dependencies are particularly strong in the unconserved region at the 3' end of the CTCF motif.
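
To illustrate what it means for a position weight matrix to neglect dependencies, the following minimal Python sketch contrasts a position weight matrix with a position-specific first-order Markov model on a toy alignment. Everything in it is an assumption made for illustration: the toy sites, the pseudocount, the function names, and the choice of a plain first-order model, which stands in for dependency-aware models in general and is not the model used in the study.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def fit_pwm(sites, pseudo=1.0):
    """Position weight matrix: one independent base distribution per position."""
    total = len(sites) + pseudo * len(ALPHABET)
    pwm = []
    for j in range(len(sites[0])):
        counts = Counter(s[j] for s in sites)
        pwm.append({b: (counts[b] + pseudo) / total for b in ALPHABET})
    return pwm

def fit_order1(sites, pseudo=1.0):
    """Position-specific first-order model: P(base at j | base at j-1)."""
    model = [None]  # position 0 has no predecessor; the PWM column is used there
    for j in range(1, len(sites[0])):
        pairs = Counter((s[j - 1], s[j]) for s in sites)
        cond = {}
        for prev in ALPHABET:
            total = sum(pairs[(prev, b)] for b in ALPHABET) + pseudo * len(ALPHABET)
            cond[prev] = {b: (pairs[(prev, b)] + pseudo) / total for b in ALPHABET}
        model.append(cond)
    return model

def log_lik_pwm(pwm, seq):
    """Log-likelihood of seq under the independent-position model."""
    return sum(math.log(pwm[j][b]) for j, b in enumerate(seq))

def log_lik_order1(pwm, model, seq):
    """Log-likelihood of seq when each base depends on its left neighbour."""
    ll = math.log(pwm[0][seq[0]])
    for j in range(1, len(seq)):
        ll += math.log(model[j][seq[j - 1]][seq[j]])
    return ll

# Hypothetical toy sites: the last two positions co-vary (A goes with C, G with T).
sites = ["AAC", "AAC", "AGT", "AGT"]
pwm = fit_pwm(sites)
order1 = fit_order1(sites)

for candidate in ("AAC", "AAT"):
    print(candidate,
          round(log_lik_pwm(pwm, candidate), 3),
          round(log_lik_order1(pwm, order1, candidate), 3))
```

In this toy example the position weight matrix cannot distinguish the observed site AAC from the unobserved combination AAT, because the two differ only in how neighbouring positions pair up. The first-order model scores AAC higher. Models of higher order extend the same idea to longer contexts, at the cost of more parameters to estimate.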