Introduction to Volume 9 Issue 2
Steven I. Gordon
pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
pp. 2–13
https://doi.org/10.22369/issn.2153-4136/9/2/1
@article{jocse-9-2-1,
  author={Rivka Taub and Michal Armoni and Mordechai (Moti) Ben-Ari},
  title={Physics Conceptual Understanding in a Computational Science Course},
  journal={The Journal of Computational Science Education},
  year=2018,
  month=dec,
  volume=9,
  issue=2,
  pages={2--13},
  doi={https://doi.org/10.22369/issn.2153-4136/9/2/1}
}
Students face many difficulties dealing with physics principles and concepts during physics problem solving. For example, they lack an understanding of the components of formulas, as well as of the physical relationships between the two sides of a formula. To overcome these difficulties, some educators have suggested integrating simulation design into physics learning. They claim that the programming process necessarily fosters understanding of the physics underlying the simulations. We investigated physics learning in a high-school course on computational science. The course focused on the development of computational models of physics phenomena and the programming of corresponding simulations. The study described in this paper deals with the development of students' conceptual physics knowledge throughout the course. Employing a qualitative approach, we used concept maps to evaluate students' conceptual physics knowledge at the beginning and the end of the model development process, and at different stages in between. We found that the students gained physics knowledge that has been reported to be difficult for high-school and even undergraduate students. We demonstrate our method of analysis and its outcomes through a detailed analysis of two case studies: projects in which computational models and simulations of physics phenomena were developed.
pp. 14–22
https://doi.org/10.22369/issn.2153-4136/9/2/2
@article{jocse-9-2-2,
  author={Qihua Chen and Jiangyan Feng and Shriyaa Mittal and Diwakar Shukla},
  title={Automatic Feature Selection in Markov State Models Using Genetic Algorithm},
  journal={The Journal of Computational Science Education},
  year=2018,
  month=dec,
  volume=9,
  issue=2,
  pages={14--22},
  doi={https://doi.org/10.22369/issn.2153-4136/9/2/2}
}
Markov State Models (MSMs) are a powerful framework for reproducing the long-time conformational dynamics of biomolecules using a set of short Molecular Dynamics (MD) simulations. However, precise kinetics predictions of MSMs rely heavily on the features selected to describe the system. Despite the importance of feature selection for large systems, determining an optimal set of features remains a difficult unsolved problem. Here, we introduce an automatic approach to optimize feature selection based on genetic algorithms (GA), which adaptively evolve the fittest solution according to the laws of natural selection. The power of the GA-based method is illustrated on long atomistic folding simulations of four proteins, varying in length from 28 to 80 residues. Given the diversity of the tested proteins, we expect that our method will be extensible to other proteins and will move MSM building toward a more objective protocol.
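As a rough illustration of the general idea of genetic-algorithm feature selection (not the authors' implementation), the sketch below evolves a boolean mask over candidate features. The function `evolve_feature_mask`, its parameters, and the toy scoring function are all hypothetical; in an MSM setting the score would presumably come from building and validating a model on the selected features, which is not shown here.

```python
import random


def evolve_feature_mask(score, n_features, pop_size=20, generations=30,
                        mutation_rate=0.05, seed=0):
    """Evolve a boolean feature mask that maximizes score(mask).

    Hypothetical sketch: truncation selection, one-point crossover,
    and bit-flip mutation. Requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    # Start from random masks: True means the feature is selected.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better-scoring half unchanged (elitism).
        ranked = sorted(population, key=score, reverse=True)
        survivors = ranked[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            # One-point crossover between two surviving masks.
            cut = rng.randrange(1, n_features)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = survivors + children
    return max(population, key=score)


# Toy stand-in for a model-quality score (an assumption for illustration):
# reward three "informative" features and penalize every other one.
INFORMATIVE = {0, 3, 5}


def toy_score(mask):
    hits = sum(1 for i, on in enumerate(mask) if on and i in INFORMATIVE)
    extras = sum(1 for i, on in enumerate(mask) if on and i not in INFORMATIVE)
    return hits - 0.5 * extras


best_mask = evolve_feature_mask(toy_score, n_features=8)
```

Because the random generator is seeded, repeated runs with the same arguments return the same mask; a real application would replace `toy_score` with an expensive model-building step and would likely cache scores.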
pp. 23–29
https://doi.org/10.22369/issn.2153-4136/9/2/3
@article{jocse-9-2-3,
  author={Y. Daniel Liang},
  title={Teaching and Learning Graph Algorithms Using Animation},
  journal={The Journal of Computational Science Education},
  year=2018,
  month=dec,
  volume=9,
  issue=2,
  pages={23--29},
  doi={https://doi.org/10.22369/issn.2153-4136/9/2/3}
}
Graph algorithms have many applications, and many real-world problems can be solved using them. They are commonly taught in data structures, algorithms, and discrete mathematics courses. We have created two animations to visually demonstrate graph algorithms. The first animation covers depth-first search, breadth-first search, shortest paths, connected components, finding bipartite sets, and Hamiltonian paths/cycles on unweighted graphs. The second animation covers minimum spanning trees, shortest paths, and travelling salesman problems on weighted graphs. The animations are developed using HTML, CSS, and JavaScript and are platform independent. They can be viewed in a browser on any device. The animations are useful tools for teaching and learning graph algorithms. This paper presents these animations.
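For readers unfamiliar with the algorithms named above, here is a minimal sketch of one of them: breadth-first search used to find shortest paths (by edge count) on an unweighted graph. It is written in Python for brevity and is unrelated to the paper's JavaScript animations; the function name and the example graph are made up for illustration.

```python
from collections import deque


def bfs_shortest_paths(graph, source):
    """Return the minimum number of edges from source to each reachable vertex.

    graph is an adjacency list: a dict mapping a vertex to its neighbors.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph.get(vertex, []):
            # The first visit to a vertex is along a shortest path,
            # because BFS explores vertices in order of distance.
            if neighbor not in distance:
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)
    return distance


# Hypothetical example graph; vertex "F" is disconnected from the rest.
example_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": [],
}
distances = bfs_shortest_paths(example_graph, "A")
```

Vertices that cannot be reached from the source simply do not appear in the result, which is also one way to identify the connected component containing the source.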
pp. 30–36
https://doi.org/10.22369/issn.2153-4136/9/2/4
@article{jocse-9-2-4,
  author={Alex Luke and Sarah Fergione and Riley Wilson and Brady Gunn and Stan Svojanovsky},
  title={Identification of Active Oligonucleotide Sequences Using Artificial Neural Network},
  journal={The Journal of Computational Science Education},
  year=2018,
  month=dec,
  volume=9,
  issue=2,
  pages={30--36},
  doi={https://doi.org/10.22369/issn.2153-4136/9/2/4}
}
In this project we designed an Artificial Neural Network (ANN) computational model to predict the activity of short oligonucleotide sequences (octamers) with an important biological role as exonic splicing enhancer (ESE) motifs recognized by the human SR protein SC35. Since only active sequences were available from the literature as our initial data set, we generated an additional set of sequences complementary to the original set. We used a back-propagation neural network (BPNN) with MATLAB® Neural Network Toolbox™ on our research-designated computer. In Stage I of our project we trained, validated, and tested the BPNN prototype. We started with 20 samples in the training set and 8 samples in the validation set. The trained and validated BPNN prototype was then used to test a unique set of 10 octamer sequences: 5 active samples and their 5 complementary sequences. The test showed 2 classification errors, one false positive and one false negative. We used the test data and moved into Stage II of the project. First, we analyzed the initial DNA numerical representation (DNR) and changed the scheme to achieve a greater difference between the subsets of active and complementary sequences. We compared the BPNN results with different numbers of nodes in the second hidden layer to optimize model accuracy. To estimate future model performance, we needed to test the classifier on newly collected data from another paper. This practical application included the testing of 41 published, non-repeating SC35 ESE motif octamers, together with 41 complementary sequences. The test showed high BPNN predictive accuracy for both categories (active and inactive). This study shows the potential for using a BPNN to screen SC35 ESE motif candidates.
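The step of pairing each active octamer with a complementary sequence can be sketched as follows. This assumes "complementary" means base-by-base Watson-Crick complementation (A with T, C with G) without reversal, which the abstract does not specify. The octamers below are placeholders, not sequences from the paper, and the labels 1 (active) and 0 (complementary) are an illustrative convention.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(sequence):
    """Return the base-by-base complement of a DNA sequence (no reversal)."""
    return "".join(COMPLEMENT[base] for base in sequence.upper())


def build_labeled_set(active_sequences):
    """Pair each active sequence (label 1) with its complement (label 0)."""
    labeled = []
    for sequence in active_sequences:
        labeled.append((sequence, 1))
        labeled.append((complement(sequence), 0))
    return labeled


# Placeholder octamers for illustration only; not real SC35 ESE motifs.
placeholder_octamers = ["ACGTACGT", "GGCCTTAA"]
dataset = build_labeled_set(placeholder_octamers)
```

Since every base maps to a different base, a sequence can never equal its own complement under this scheme, so the two classes stay disjoint. A numerical encoding would still be needed before feeding such pairs to a neural network; the paper's DNR scheme is not reproduced here.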
pp. 37–45
https://doi.org/10.22369/issn.2153-4136/9/2/5
@article{jocse-9-2-5,
  author={Mariana Vasquez and Jonathon Mohl and Ming-Ying Leung},
  title={Parsing Next Generation Sequencing Data in Parallel Environments for Downstream Genetic Variation Analysis},
  journal={The Journal of Computational Science Education},
  year=2018,
  month=dec,
  volume=9,
  issue=2,
  pages={37--45},
  doi={https://doi.org/10.22369/issn.2153-4136/9/2/5}
}
With the recent advances in next generation sequencing technology, analysis of prevalent DNA sequence variants from patients with a particular disease has become an important tool for understanding the associations between the disease and genetic mutations. A publicly accessible bioinformatics pipeline, called OncoMiner (http://oncominer.utep.edu), was implemented in 2016 to help biomedical researchers analyze large genomic datasets from patients with cancer. However, the current version of OncoMiner can only accept input files with a highly specific format for sequence variant description. To handle data from a broader range of sequencing platforms, a data preprocessing tool is necessary. We have therefore implemented the OncoMiner Preprocessing (OP) program for parsing data files in the popular FastQ and BAM formats to generate an OncoMiner input file. OP uses the open source Bowtie2 and SAMtools software, followed by a Python script we developed for genetic sequence variant identification. To preprocess very large datasets efficiently, the OP program has been parallelized on two local computers and the Blue Waters system at the National Center for Supercomputing Applications using a multiprocessing approach. Although reasonable parallelization efficiency has been obtained on the local computers, the OP program's speedup on Blue Waters has been limited, possibly due to I/O issues and individual node memory constraints. Despite these limitations, Blue Waters has provided the necessary resources to process 35 datasets from patients with acute myeloid leukemia and demonstrated a significant correlation of OP runtimes with the BAM input size and chromosome diversity.