README

This is a guide to using the files in the database.

There are files from three samples:
  1. GRIPseq experimental no. 1, ChIP'd with anti-Pol(II), sequences named N702-
  2. GRIPseq experimental no. 2, ChIP'd with anti-Pol(II), sequences named N705-
  3. GRIPseq control, ChIP'd with IgG alone, sequences named N703-

For each sample there are 5 files:
  1. raw read files, zipped fastqc files, all R1 reads in one file
  2. raw read files, zipped fastqc files, all matching R2 reads in a separate file
  3. Bowtie-mapped reads, converted to a .bam file
  4. an index of the mapped reads, .bai file
  5. a read count file, 25bp window, .tdf file

To get from the raw reads to the mapped reads, follow these steps:

1. Filter the reads for quality using the instructions in Minoche AE, Dohm JC, Himmelbauer H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems. Genome Biol. 2011
   > quality_passed.R1.fastqc file
   > quality_passed.R2.fastqc file

2. Using Bowtie2, map the quality-passed paired R1 and R2 fastqc files onto the S. purpuratus genome, version 3.1, which can be downloaded from: http://www.spbase.org/SpBase/download/

   2a. Before using Bowtie2 you have to build an index of the genome you are aligning to. Bowtie2 uses this index to find where each read maps in the genome. The command below creates a set of 6 index files that share the index prefix.
       > bowtie2-build genome_name.fasta index_prefix_name
       > for example: bowtie2-build Sea_Urchin_genome.fasta Sea_Urchin_ref_index_bowtie

   2b. Run Bowtie2 on each sample using the following command
       > bowtie2 --phred33 --fr -I 100 -X 500 -p 8 --seed 123 -q -t -x Sea_Urchin_ref_index_bowtie -1

3. After the Bowtie alignments are made, the resulting sam files need to be converted to bam files. They can then be loaded into a program like IGV for visualization.

   3a. This first command converts sam files to bam files using samtools
       > samtools view -bS #filenamehere#.sam > #filenamehere#.bam

   3b. This second command keeps only the reads that map well (MAPQ score of at least 10)
       > samtools view -b -h -q 10 #filenamehere#.bam > #filenamehere#_filtered.bam

   3c. This next command sorts the bam file (the second argument is the output prefix; samtools adds .bam)
       > samtools sort #filenamehere#_filtered.bam #filenamehere#_filtered_sorted

   3d. This last command indexes the sorted bam file so that programs like IGV can jump to any region of the genome. No output name is needed; it creates the .bai file next to the .bam file
       > samtools index #filenamehere#_filtered_sorted.bam

4. To make a read count file for visualization, one option is to use the tools in a program like IGV from the Broad Institute: http://www.broadinstitute.org/igv/

   4a. First, create a .genome file, using version 3.1 of the sea urchin genome and the sea urchin transcriptome gene models, which are available from http://www.spbase.org/SpBase/download/ in the file labeled: gff3 files for 3.1 assembly (Build7)

   4b. Load the .genome file. Load the sorted.bam file for all GRIPseq samples.

   4c. In IGV, go to Tools > Run igvtools. Run "Count" with a window size of 25, or adjust to a different window size. A .tdf file will be created. Load this count file alongside the sorted.bam file.
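
Steps 3a to 3d can be chained for one sample with a small shell function. The sketch below is only an illustration, not part of the original pipeline. The function names run and process_sample, the DRY_RUN switch and the file prefix N702 are all hypothetical. It also assumes samtools 1.3 or later, where sorting is written as "samtools sort -o out.bam in.bam"; the command in step 3c is the older form that takes an output prefix. Output files are named with -o instead of ">" so that the commands can be printed for checking before anything is run.

```shell
#!/bin/sh
# Sketch of steps 3a-3d for one sample (assumes samtools 1.3 or later).
# The names run, process_sample and DRY_RUN are hypothetical helpers.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: process_sample PREFIX [MIN_MAPQ]
# Expects PREFIX.sam and writes PREFIX.bam, PREFIX_filtered.bam,
# PREFIX_filtered_sorted.bam and its .bai index.
# Each step runs only if the previous one succeeded.
process_sample() {
    prefix=$1
    min_mapq=${2:-10}   # reads with MAPQ below this value are dropped

    # 3a. convert sam to bam
    run samtools view -b -S -o "${prefix}.bam" "${prefix}.sam" &&
    # 3b. keep reads with MAPQ >= min_mapq
    run samtools view -b -h -q "${min_mapq}" -o "${prefix}_filtered.bam" "${prefix}.bam" &&
    # 3c. sort by coordinate (samtools 1.3+ syntax, an assumption)
    run samtools sort -o "${prefix}_filtered_sorted.bam" "${prefix}_filtered.bam" &&
    # 3d. index the sorted bam; creates the .bai file
    run samtools index "${prefix}_filtered_sorted.bam"
}
```

To check the commands first, set DRY_RUN=1 and call "process_sample N702"; the four samtools commands are printed and nothing is run. Unset DRY_RUN (or set it to 0) to run them on N702.sam.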