Introduction to Volume 15 Issue 2
David Joiner
Volume 15, Issue 2 (November 2024), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 15, Issue 2 (November 2024), pp. 2–4
https://doi.org/10.22369/issn.2153-4136/15/2/1
Data science continues to create opportunities in the technology and HPC industry, driven by growing data sets, the need for more insights, the necessity of automation, evolving roles and job descriptions, and a shortage of workforce talent. However, despite the growing demand, not enough students are learning the basic skills or being given opportunities for hands-on work. In the Northern California Community College system, many students return to school after having graduated with a bachelor's degree, or find the need to gain new skills to enhance their resume or to change careers altogether. Unfortunately, the community colleges do not have enough classes or instructors trained in data science to teach the material. At the four-year university, the program is usually waitlisted for transfer students from the community college. This paper is a continuation of the work begun after the National Energy Research Scientific Computing Center (NERSC) partnered with Laney College to start a Data Analytics program. After two years, the program does not have enough instructors for the number of interested students. Further, approximately 40% of students struggle to keep up with the rigorous material: many must work to support families and cannot put in 20–40 hours per week to earn a living as well as the 20–40 hours of study and homework that the program requires. Therefore, Laney partnered with Codefinity, an online education program that offers a track in Python Data Analysis and Visualization.
Volume 15, Issue 2 (November 2024), pp. 5–9
https://doi.org/10.22369/issn.2153-4136/15/2/2
As training on cyberinfrastructure resources becomes more common, we show the progression of metrics used to measure the effectiveness and impact of informal computational training courses provided by the Texas A&M University High Performance Research Computing facility. These courses were built to support researchers from research groups that have a background in computing practices. As such, the courses were structured as information-sharing sessions, with frequency of participation as the primary measure of course success. While these metrics indicate interest in the courses, they relied on researchers continuing the learning process in their laboratories. As computing becomes ubiquitous in research programs, researchers who have no peer-learning mechanisms now participate in these courses. Researchers are now participating in a continuum of courses that cover introductory to advanced topics and rely on them to build proficiency in research computing technologies. We report on a pilot program that pivots along the way to support these researchers.
Volume 15, Issue 2 (November 2024), pp. 10–15
https://doi.org/10.22369/issn.2153-4136/15/2/3
The new strategic framework of Wake Forest University seeks to build and strengthen signature areas of excellence in research, scholarship, and creative work that cross academic and institutional boundaries. To support this initiative, the High Performance Computing (HPC) Team has developed an Introduction to High Performance Computing undergraduate course that is accessible to students of all levels and of all academic domains. The objective of this course is to build a curriculum that presents HPC as an essential tool for research and scholarship, enables student-faculty collaboration across all disciplines, and promotes student participation in academic research during their undergraduate studies.
Volume 15, Issue 2 (November 2024), pp. 16–23
https://doi.org/10.22369/issn.2153-4136/15/2/4
High performance computing (HPC) is a crucial field in science and engineering. Although HPC is often viewed as a pure field of computer science or a subset of it, it actually serves as a tool that enables us to achieve exceptional results in science and engineering. Since early on, computers have been primarily utilized for extensive arithmetic computations. However, recent advancements in electronics have also made edge computing integral to high performance computing. Additionally, we have witnessed remarkable growth in computer architecture, leading to the development of powerful HPC machines, with supercomputers now reaching exaflop performance. Nevertheless, there are still challenges in utilizing these powerful machines due to the lack of knowledge in integrating physics and mathematics into HPC. Furthermore, complications with the software stack and common parallel programming models that target exascale computing (heterogeneous computing) persist. In this context, we present our effective course design for HPC training, focusing on CUDA, OpenACC, and OpenMP courses, which aim to equip STEM graduates with HPC knowledge. We also discuss how our training stands out in comparison to other NCC training frameworks in the EuroCC context and promotes lifelong learning.
Volume 15, Issue 2 (November 2024), pp. 24–28
https://doi.org/10.22369/issn.2153-4136/15/2/5
Computation is a significant part of the work done by many practicing scientists, yet it is not universally taught from a scientific perspective in undergraduate science departments. In response to the need to provide training in scientific computation to our students, we developed a suite of self-paced 'modules' in the form of Jupyter notebooks using Python. These modules introduce the basics of Python programming and present a wide variety of scientific applications of computing, ranging from numerical integration and differentiation to Fourier analysis, Monte Carlo methods, parallel processing, and machine learning. The modules contain multiple features to promote learning, including 'Breakpoint Questions,' recaps of key information, self-reflection prompts, and exercises.
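The published modules are not reproduced here, but the following minimal sketch illustrates the style of exercise such a notebook might contain, using a Monte Carlo estimate of pi as an example (the function name and the embedded "Breakpoint Question" are illustrative, not taken from the modules):

```python
import numpy as np

def monte_carlo_pi(n_samples=100_000, seed=0):
    """Estimate pi by sampling random points in the unit square and
    counting the fraction that fall inside the quarter circle."""
    rng = np.random.default_rng(seed)
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    inside = np.count_nonzero(x**2 + y**2 <= 1.0)
    return 4.0 * inside / n_samples

# Breakpoint Question (illustrative): how does the error shrink as n_samples grows?
for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10d} samples -> pi ~ {monte_carlo_pi(n):.5f}")
```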
Volume 15, Issue 2 (November 2024), pp. 29–39
https://doi.org/10.22369/issn.2153-4136/15/2/6
In the age of advanced open-source artificial intelligence (AI) and a growing demand for software tools, programming skills are as important as ever. For even the most experienced programmers, it can be challenging to determine which software libraries and packages are best suited to fit specific programming needs. To investigate the potential of AI-supported learning, this case study explores the use of OpenAI’s ChatGPT, powered by GPT-3.5 and GPT-4, by students to create an image annotation graphical user interface (GUI) in Python. This task was selected because good User Experience (UX) design is a deceptively complex task: it can be very easy to build a GUI but extremely hard to build one that is well designed. The approaches employed in this study included creating a program from scratch that integrates the listed features incrementally; compiling a list of essential features and requesting ChatGPT to modify existing code accordingly; collaborating on specific segments of a user-initiated program; and creating a program anew using GPT-4 for comparative analysis. The findings of this case study indicate that ChatGPT is optimally utilized for responding to precise queries rather than creating code from scratch. Effective use of ChatGPT requires a foundational understanding of programming languages. As a learning tool, ChatGPT can help a novice programmer create competent initial drafts, akin to what one might expect from an introductory programming course, yet these drafts require substantial modification before the tool can be deployed even as a prototype.
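The students' code is not included in this listing; as a hypothetical illustration only, the sketch below shows the kind of minimal tkinter starting point a novice might obtain for an image annotation GUI (open an image, drag to draw a bounding box). All class and method names are invented for this example and are not the study's code:

```python
import tkinter as tk
from tkinter import filedialog
from PIL import Image, ImageTk  # assumes Pillow is installed

class AnnotatorGUI:
    """Hypothetical minimal annotation GUI: load an image, drag to draw red boxes."""
    def __init__(self, root):
        self.canvas = tk.Canvas(root, width=800, height=600)
        self.canvas.pack()
        tk.Button(root, text="Open image", command=self.open_image).pack()
        self.canvas.bind("<ButtonPress-1>", self.on_press)
        self.canvas.bind("<B1-Motion>", self.on_drag)
        self.start = None
        self.rect = None
        self.photo = None  # keep a reference so the image is not garbage-collected

    def open_image(self):
        path = filedialog.askopenfilename()
        if path:
            self.photo = ImageTk.PhotoImage(Image.open(path))
            self.canvas.create_image(0, 0, image=self.photo, anchor="nw")

    def on_press(self, event):
        self.start = (event.x, event.y)
        self.rect = self.canvas.create_rectangle(
            event.x, event.y, event.x, event.y, outline="red", width=2)

    def on_drag(self, event):
        # Stretch the current rectangle as the mouse moves
        self.canvas.coords(self.rect, *self.start, event.x, event.y)

if __name__ == "__main__":
    root = tk.Tk()
    app = AnnotatorGUI(root)
    root.mainloop()
```

A draft of this kind illustrates the paper's point: it runs, but it lacks the labeling, saving, and UX refinements needed before it could be deployed even as a prototype.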
Volume 15, Issue 1 (March 2024), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 15, Issue 1 (March 2024), pp. 2–9
https://doi.org/10.22369/issn.2153-4136/15/1/1
The Data-Enabled Advanced Computational Training Program for Cybersecurity Research and Education (DeapSECURE) is a nondegree training consisting of six modules covering a broad range of cyberinfrastructure techniques, including high performance computing, big data, machine learning, and advanced cryptography, aimed at reducing the gap between current cybersecurity curricula and requirements needed for advanced research and industrial projects. Since 2020, these lesson modules have been updated and retooled to suit fully online delivery. Hands-on activities were reformatted to accommodate self-paced learning. In this paper, we summarize the four years of the project, comparing in-person and online-only instruction methods and outlining lessons learned. The module content and hands-on materials are being released as open-source educational resources. We also indicate our future direction to scale up and increase adoption of the DeapSECURE training program to benefit cybersecurity research everywhere.
Volume 15, Issue 1 (March 2024), pp. 10–12
https://doi.org/10.22369/issn.2153-4136/15/1/2
In this study, we investigate the performance of several regression models by utilizing a database of dielectric constants. First, the database is processed using the Matminer Python library to create features, and then divided into training, validation, and testing subsets. We evaluate several models: Linear Regression, Random Forest, Gradient Boosting, XGBoost, Support Vector Regression, and Feedforward Neural Network, with the objective of predicting the bandgap values. The results indicate superior performance of tree-based ensemble models over Linear Regression and Support Vector Regression. Additionally, a Feedforward Neural Network with two hidden layers demonstrates comparable proficiency in capturing the relationship between the features generated by Matminer and the bandgap target values.
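As a minimal sketch of the evaluation workflow described, assuming a feature matrix X (such as one produced by Matminer featurizers) and bandgap targets y are already in hand; the random placeholder data, split sizes, and model settings below are illustrative, not the authors':

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# X: Matminer-generated features, y: bandgap values (random placeholders here)
rng = np.random.default_rng(0)
X, y = rng.random((500, 20)), rng.random(500)

# Divide into training, validation, and testing subsets
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "validation MAE:", mean_absolute_error(y_val, model.predict(X_val)))
```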
Volume 15, Issue 1 (March 2024), pp. 13–14
https://doi.org/10.22369/issn.2153-4136/15/1/3
The convergence of quantum technologies and high-performance computing offers unique opportunities for research and algorithm development, demanding a skilled workforce to harness the quantum systems' potential. In this lightning talk, we address the growing need to train experts in quantum computing and explore the challenges in training these individuals, including the abstract nature of quantum theory and the focus on specific frameworks. To overcome these obstacles, we propose self-guided learning resources that offer interactive learning experiences and practical framework-independent experimentation for different target audiences.
Volume 15, Issue 1 (March 2024), pp. 15–22
https://doi.org/10.22369/issn.2153-4136/15/1/4
High-performance computing (HPC) is an important tool for research, development, and the industry. Moreover, with the recent expansion of machine learning applications, the need for HPC is increasing even further. However, in developing countries with limited access to the HPC ecosystem, the lack of infrastructure, expertise, and access to knowledge represents a major obstacle to the expansion of HPC. Under these constraints, the adoption of HPC by communities presents several challenges. The HPC Summer Schools are an initiative of CyberColombia that has taken place over the past 5 years. It aims to develop the critical skills, strategic planning, and networking required to make available, disseminate, and maintain the knowledge of high-performance computing and its applications in Colombia. Here we report the results of this series of Summer Schools. The events have proven to be successful, with over 200 participants from more than 20 institutions. Participants span different levels of expertise, including undergraduate and graduate students as well as professionals. We also describe successful use cases for HPC cloud solutions, namely Chameleon Cloud.
Volume 15, Issue 1 (March 2024), pp. 23–31
https://doi.org/10.22369/issn.2153-4136/15/1/5
Giving students a good understanding of how micro-architectural effects impact the achievable performance of HPC workloads is essential for their education. It enables them to find effective optimization strategies and to reason about sensible approaches towards better efficiency. This paper describes a lab course held in collaboration between LRZ, LMU, and TUM. The course was born with a dual motivation in mind: filling a gap in educating students to become HPC experts, as well as understanding the stability and usability of emerging HPC programming models for recent CPU and GPU architectures with the help of students. We describe the course structure used to achieve these goals, resources made available to attract students, and experiences and statistics from running the course for six semesters. We conclude with an assessment of how successfully the lab course met the initially set vision.
Volume 15, Issue 1 (March 2024), pp. 32–34
https://doi.org/10.22369/issn.2153-4136/15/1/6
The HPC Carpentry lesson program is a highly interactive, hands-on approach to getting users up to speed on HPC cluster systems. It is motivated by the increasing availability of cluster resources to a wide range of user groups, many of whom come from communities that have not traditionally used HPC systems. We adopt the Carpentries approach to pedagogy, which consists of a workshop setting where learners type along with instructors while working through the instructional steps, building up 'muscle memory' of the tasks, further reinforced by challenge exercises at critical points within the lesson. This paper reviews the development of the HPC Carpentry Lesson Program as it becomes the first entrant into phase 2 of The Carpentries Lesson Program Incubator. This incubator is the pathway for HPC Carpentry to become an official lesson program of The Carpentries.
Volume 15, Issue 1 (March 2024), pp. 35–40
https://doi.org/10.22369/issn.2153-4136/15/1/7
We have developed a series of course-based undergraduate research experiences, integrated into course curricula, centered on the use of 3D visualization and virtual reality for science visualization. One project involves the creation and use of a volumetric renderer for hyperstack images, paired with a biology project in confocal microscopy. Students have worked to develop and test VR-enabled tools for confocal microscopy visualization across headset-based and CAVE-based VR platforms. Two applications of the tool are presented: a rendering of Drosophila primordial germ cells coupled with automated detection and counting, and a database in development of 3D renderings of pollen grains. Another project involves the development and testing of point cloud renderers. Student work has focused on performance testing and enhancement across a range of 2D and 3D hardware, including native Quest apps. Through the process of developing these tools, students are introduced to scientific visualization concepts, while gaining practical experience with programming, software engineering, graphics, shader programming, and cross-platform design.
Volume 15, Issue 1 (March 2024), pp. 41–46
https://doi.org/10.22369/issn.2153-4136/15/1/8
Throughout the cyberinfrastructure community there is a wide range of resources available to train faculty and young scholars in the successful utilization of computational resources for research. The challenge that the community faces is that training materials abound, but they can be difficult to find and often have little information about the quality or relevance of offerings. Building on existing software technology, we propose to build a way for the community to better share and find training and education materials through a federated training repository. In this scenario, organizations and authors retain physical and legal ownership of their materials by sharing only catalog information, organizations can refine local portals to use the best and most appropriate materials from both local and remote sources, and learners can take advantage of materials that are reviewed and described more clearly. In this paper, we introduce the HPC ED pilot project, a federated training repository that is designed to allow resource providers, campus portals, schools, and other institutions to both incorporate training from multiple sources into their own familiar interfaces and to publish their local training materials.
Volume 15, Issue 1 (March 2024), pp. 47–48
https://doi.org/10.22369/issn.2153-4136/15/1/9
The 'Understanding the Skills and Pathways Behind Research Software Training' BoF session run at ISC'23 provided an opportunity to bring together a group of attendees interested in approaches to enhance skills within the Research Software Engineering community. This included looking at options for understanding and developing pathways that practitioners can follow to develop their skills and competencies in a structured manner from beginner to advanced level. Questions discussed included: How can we highlight the existence of different training opportunities and ensure awareness and uptake? What materials already exist and what’s missing? How do we navigate this largely undefined landscape? In short: how does one train to become an RSE? One of the interactive parts of this session was based around a live, anonymous survey. Participants were asked a number of questions ranging from their role in the training community to how easy they feel it is to find/access training content targeting different skill levels. They were also asked about challenges faced in accessing relevant content, combining it into a coherent pathway, and linking training content from different sources. Other questions focused on discoverability of material and skills that are most commonly overlooked. The number of respondents and responses varied between questions, with 24 to 50 participants engaging and providing 32 to 59 replies. The goal of this lightning talk is to present findings, within the context of the community-wide effort to make training materials more FAIR - findable, accessible, interoperable and reusable.
Volume 15, Issue 1 (March 2024), pp. 49–56
https://doi.org/10.22369/issn.2153-4136/15/1/10
The U.S. Department of Energy (DOE) is a long-standing leader in research and development of high-performance computing (HPC) in the pursuit of science. However, we face daunting challenges in fostering a robust and diverse HPC workforce. Basic HPC is not typically taught at early stages of students' academic careers, and the capacity and knowledge of HPC at many institutions are limited. Even so, such topics are prerequisites for advanced training programs, internships, graduate school, and ultimately for careers in HPC. To help address this challenge, as part of the DOE Exascale Computing Project's Broadening Participation Initiative, we recently launched the Introduction to HPC Training and Workforce Pipeline Program to provide accessible introductory material on HPC, scalable AI, and analytics. We describe the Intro to HPC Bootcamp, an immersive program designed to engage students from underrepresented groups as they learn foundational HPC skills. The program takes a novel approach to HPC training by turning the traditional curriculum upside down. Instead of focusing on technology and its applications, the bootcamp focuses on energy justice to motivate the training of HPC skills through project-based pedagogy and real-life science stories. Additionally, the bootcamp prepares students for internships and future careers at DOE labs. The first bootcamp, hosted by the advanced computing facilities at Argonne, Lawrence Berkeley, and Oak Ridge National Labs and organized by Sustainable Horizons Institute, took place in August 2023.
Volume 15, Issue 1 (March 2024), pp. 57–58
https://doi.org/10.22369/issn.2153-4136/15/1/11
The Cross-Institutional Research Engagement Network (CIREN) is a collaborative project between the University of Tennessee, Knoxville (UTK) and Arizona State University (ASU). This project's purpose is to fill critical gaps in the development and retention of cyberinfrastructure (CI) facilitators via training, mentorship, and research engagement. Research engagements include projects at the CI facilitator's local institution, between CIREN partner institutions, and through NSF's ACCESS program. This lightning talk will detail the training curriculum and mentorship activities the project has implemented in its first year as well as plans for its future research engagements. Feedback is welcome from the community with respect to project directions, best practices, and challenges experienced in implementing this or similar programs at academic institutions.
Volume 15, Issue 1 (March 2024), pp. 59–63
https://doi.org/10.22369/issn.2153-4136/15/1/12
Students in community colleges are interested in either a quick degree or a skill that allows them to move into a career while minimizing debt. Attending a four-year university can be a challenge for financial or academic reasons, and acceptance can be competitive. Today's job market is challenging for hiring and retaining diverse staff, even more so within High Performance Computing (HPC) or at a government laboratory. Industry offers higher salaries, potentially better benefits, or opportunities for remote work, factors that contribute to the challenge of attracting talent. At the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, site reliability engineers manage the HPC data center onsite 24x7. The facility is a unique and complex ecosystem that needs to be monitored in addition to the normal areas such as the computational systems, the three-tier storage, the supporting infrastructure, the network, and cybersecurity. Effective monitoring requires understanding data collected from the heterogeneous sources produced by the systems and facility. With so much data, it is much easier to view the data in graphical form, and NERSC uses Grafana to display its data. To encourage interest in HPC, NERSC partnered with Laney College to create a Data Analytics Program. Once Laney faculty learn how to teach the classes in the certificate program, they fill a need for their students to build data analytics skills toward a career or to continue toward a four-year degree as transfer students. This also fills a gap where the nearby four-year university has a long waitlist. This paper describes how NERSC partners with Laney College to create a pipeline toward a data analytics career.
Volume 15, Issue 1 (March 2024), pp. 64–71
https://doi.org/10.22369/issn.2153-4136/15/1/13
In this paper, we present an approach to hands-on High Performance Computing (HPC) System Administrator training that is not reliant on high performance computing infrastructure. We introduce a scalable, standalone virtual 3-node OpenHPC-based training lab designed for Resource Constrained Environments (RCEs) that runs on a participant's local computer. We describe the technical components and implementation of the virtual HPC training lab and address the principles and best practices considered throughout the design of the training material.
Volume 14, Issue 2 (November 2023), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 14, Issue 2 (November 2023), pp. 2–5
https://doi.org/10.22369/issn.2153-4136/14/2/1
Today's job market is challenging for attracting proficient staff, even more so in the High Performance Computing area and within a government lab. Competition from industry, in the form of perks, the ability to negotiate a higher salary, and opportunities for remote work, all plays a part in losing candidates. At the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (LBNL), a site reliability engineer manages the data center onsite 24x7. Further, the facility itself is a unique and complex ecosystem that uses evaporative cooling and recycling of hot air to keep the facility cool. This is in addition to the normal areas to be monitored, such as the computational systems, the three-tier storage, and the infrastructure and cybersecurity. To explore creating interest in HPC and STEM within the disadvantaged communities near the Laboratory, NERSC partnered with a community college during the pandemic to provide an educational foundation for high school seniors and college freshmen. In collaboration with the community college, they created a program of specific classes that students needed to take to prepare them for HPC and/or STEM internships. In certain demographics, students do not believe they can be successful in science or math and require support from the program, such as tutors, to help them through. With this type of support, students have successfully completed their classes with passing grades. As part of its recruitment process for site reliability engineers, and to continue to support diversity initiatives at the Laboratory, NERSC implemented an apprenticeship program. This paper describes the current work, in which NERSC partners with a community college program and then provides a summer internship for the student to gain hands-on experience. The first cohort of students graduated into their internship programs this summer. This paper presents early results from this partnership and how it has impacted the diverse pool of candidates at NERSC.
Volume 14, Issue 2 (November 2023), pp. 6–9
https://doi.org/10.22369/issn.2153-4136/14/2/2
Computing programs for secondary school students are rapidly becoming a staple at High Performance Computing (HPC) centers and Computer Science departments around the country. Developing curriculum that targets specific computing subfields with unmet needs remains a challenge. Here, we report on developments in the two-week Summer Computing Academy (SCA) to focus on two such subfields. The first week, 'Computing for a Better Tomorrow: Data Sciences,' introduced students to real-life applications of big data processing. A variety of topics were covered, including genomics and bioinformatics, cloud computing, and machine learning. The second week, 'Camp Secure: Cybersecurity,' focused on issues related to principles of cybersecurity. Students were taught online safety, cryptography, and internet structure. The two weeks are unified by a common thread of Python programming. Modules from the SCA program may be implemented at other institutions with relative ease and promote cybertraining efforts nationwide.
Volume 14, Issue 2 (November 2023), pp. 10–17
https://doi.org/10.22369/issn.2153-4136/14/2/3
End users of remote computing systems are frequently not aware of basic ways in which they could enhance protection against cyberthreats and attacks. In this paper, we discuss specific techniques to help and train users to improve cybersecurity when using such systems. To explain the rationale behind these techniques, we go into some depth explaining possible threats in the context of using remote, shared computing resources. Although some of the details of these prescriptions and recommendations apply to specific use cases when connecting to remote servers, such as a supercomputer, cluster, or Linux workstation, the main concepts and ideas can be applied to a wider spectrum of cases.
Volume 14, Issue 2 (November 2023), pp. 18–22
https://doi.org/10.22369/issn.2153-4136/14/2/4
This paper shares the results of a survey conducted October–November 2022. The survey's intent was to learn how the community both shares and discovers training and education materials, whether those needs were being met, and if there were interest in improving how materials are shared. The survey resulted in 112 responses primarily from content authors who are, or support, academics. While the majority of respondents considered themselves successful in finding materials, most also encountered barriers, such as finding materials, but not at the needed depth or level. Most respondents were both interested in, and able to, work toward community efforts to improve finding materials, with most citing lack of staff time as a barrier to doing so. Proposed efforts in community engagement to work toward these efforts are discussed.
Volume 14, Issue 2 (November 2023), pp. 23–27
https://doi.org/10.22369/issn.2153-4136/14/2/5
In response to an increasing demand for digital skills in industry and academia, a series of credentialed short courses that cover a variety of topics related to high performance computing were designed and implemented to enable university students and researchers to effectively utilize research computing resources and bridge the gap for users with educational backgrounds that do not include computational training. The courses cover a diverse array of topics, including subjects in programming, cybersecurity, artificial intelligence/machine learning, bioinformatics, and cloud computing. The courses are designed to enable the students to apply the skills they learn to their own research that incorporates use of large-scale computing systems. These courses offer advantages over generic online courses in that they teach computing skills relevant to academic research programs. Finally, the micro-credentials obtained from these courses are transcriptable, may be stacked with existing degree programs and credit-bearing courses to create a larger degree plan, and offer a meaningful mechanism of adding to a student's resume.
Volume 14, Issue 2 (November 2023), pp. 28–33
https://doi.org/10.22369/issn.2153-4136/14/2/6
The challenges of HPC education span a wide array of targeted applications, ranging from developing a new generation of administrators and facilitators to maintain and support cluster resources and their respective user communities, to broadening the impact of HPC workflows by reaching non-traditional disciplines and training researchers in the best-practice tools and approaches when using such systems. Furthermore, standard x86 and GPU architectures are becoming untenable to scale to the needs of computational research, necessitating software and hardware co-development on less-familiar processors. While platforms such as Cerebras and SambaNova have matured to include common frameworks such as TensorFlow and PyTorch as well as robust APIs, and thus are amenable to production use cases and instructional material, other systems may lack such infrastructure maturity, impeding all but the most technically inclined developers from being able to leverage the system. We present here our efforts and outcomes of providing a co-development and instructional platform for the Lucata Pathfinder thread-migratory system in the Rogues Gallery at Georgia Tech. Through a collection of user workflow management, co-development with the platform’s engineers, community tutorials, undergraduate coursework, and student hires, we have been able to explore multiple facets of HPC education in a unique way that can serve as a viable template for others seeking to develop similar efforts.
Volume 14, Issue 2 (November 2023), pp. 34–37
https://doi.org/10.22369/issn.2153-4136/14/2/7
A joint proof-of-concept project between Arizona State University and CR8DL, Inc., deployed a Jupyter-notebook based interface to datacenter resources for a computationally intensive, semester-length biochemistry course project. Facilitated for undergraduate biochemistry students with limited high-performance computing experience, the straightforward interface allowed for large-scale computations. As the project progressed, various enhancements were identified and implemented.
Volume 14, Issue 1 (July 2023), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 14, Issue 1 (July 2023), pp. 8–16
https://doi.org/10.22369/issn.2153-4136/14/1/2
The fate and transport of dissolved constituents in porous media has important applications in the earth and environmental sciences and many engineering disciplines. Mathematical models are commonly applied to simulate the movement of substances in porous media using the advection-dispersion equation. Whereas computer programs based on numerical solutions are commonly employed to solve the governing equations for these problems, analytical solutions also exist for some important one-dimensional cases. These solutions are often still quite complex to apply in practice, and therefore computational tools are still needed to apply them to determine the concentrations of dissolved substances as a function of space and time. The Python programming language provides a variety of tools that enable implementation of analytical solutions into useful tools and facilitate their application to experimental data. Python provides an important but underutilized tool in environmental modeling courses. This article highlights the development of a series of Python-based computing tools that can be used to numerically compute the values of an analytical solution to the one-dimensional advection-dispersion equation. These tools are targeted to graduate and advanced undergraduate courses that teach environmental modeling and the application of Python for computing.
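One well-known analytical solution of this kind is the Ogata-Banks solution for a continuous source at the inlet boundary; the sketch below evaluates it with NumPy and SciPy. This is a minimal illustration of the general approach, assuming Ogata-Banks boundary conditions; the paper's own tools, parameters, and solution form may differ:

```python
import numpy as np
from scipy.special import erfc

def ogata_banks(x, t, v, D, C0=1.0):
    """Ogata-Banks analytical solution of the 1-D advection-dispersion equation
    dC/dt = D d2C/dx2 - v dC/dx with a continuous source C(0, t) = C0:
    C(x, t) = (C0/2) [erfc((x - v t)/(2 sqrt(D t))) + exp(v x / D) erfc((x + v t)/(2 sqrt(D t)))]
    Units are assumed consistent (e.g., x in m, t in s, v in m/s, D in m^2/s)."""
    x, t = np.asarray(x, dtype=float), float(t)
    denom = 2.0 * np.sqrt(D * t)
    return 0.5 * C0 * (erfc((x - v * t) / denom)
                       + np.exp(v * x / D) * erfc((x + v * t) / denom))

# Example: relative concentration profile along a column after one day of transport
x = np.linspace(0.0, 2.0, 5)                       # distance from the inlet (m)
print(ogata_banks(x, t=86400.0, v=1e-5, D=1e-6))   # illustrative parameter values
```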
Volume 14, Issue 1 (July 2023), pp. 17–22
https://doi.org/10.22369/issn.2153-4136/14/1/3
Given the anticipated growth of the high-performance computing market, HPC is challenged with expanding the size, diversity, and skill of its workforce while also addressing post-pandemic distributed workforce protocols and an ever-expanding ecosystem of architectures, accelerators, and software stacks. As we move toward exascale computing, training approaches need to address how to best prepare future computational scientists and enable established domain researchers to stay current and master tools needed for exascale architectures. This paper explores adding hybrid and virtual Hackathons to the training mix to bridge traditional programming curricula and hands-on skills needed among diverse communities. We outline current learning and development programs available; explain the benefits and challenges in implementing hackathons for training using experience gained from the Open Hackathons program (formerly the GPU Hackathons program); discuss how to engage diverse communities—from early career researchers to veteran scientists; and recommend best practices for implementing these events.
Volume 14, Issue 1 (July 2023), pp. 23–30
https://doi.org/10.22369/issn.2153-4136/14/1/4
Researchers and developers in a variety of fields have benefited from the massively parallel processing paradigm. Numerous tasks are facilitated by the use of accelerated computing, such as graphics, simulations, visualisations, cryptography, data science, and machine learning. Over the past years, machine learning and in particular deep learning have received much attention. The development of such solutions requires a different level of expertise and insight than that required for traditional software engineering. Therefore, there is a need for novel approaches to teaching people about these topics. This paper outlines the primary challenges of accelerated computing and deep learning education, discusses the methodology and content of the NVIDIA Deep Learning Institute, presents the results of a quantitative survey conducted after full-day workshops, and demonstrates a sample adoption of DLI teaching kits for teaching heterogeneous parallel computing.
Volume 14, Issue 1 (July 2023), pp. 31–40
https://doi.org/10.22369/issn.2153-4136/14/1/5
We propose a modified MSA algorithm on quantum annealers with applications in areas of bioinformatics and genetic sequencing. To understand the human genome, researchers compare extensive sets of these genetic sequences – or their protein counterparts – to identify patterns. This comparison begins with the alignment of the set of (multiple) sequences. However, this alignment problem is considered NP-complete and, thus, current classical algorithms at best rely on brute force or heuristic methods to find solutions. Quantum annealing algorithms are able to bypass this need for sheer brute force due to their use of quantum mechanical properties. However, due to the novelty of these algorithms, many are rudimentary in nature and limited by hardware restrictions. We apply progressive alignment techniques to modify annealing algorithms, achieving a linear reduction in spin usage whilst introducing more complex heuristics to the algorithm. This opens the door for further exploration into quantum computing-based bioinformatics, potentially allowing for a deeper understanding of disease detection and monitoring.
Volume 14, Issue 1 (July 2023), pp. 41–45
https://doi.org/10.22369/issn.2153-4136/14/1/6
Delivering training and education on hybrid technologies (including AI, ML, GPU, Data and Visual Analytics including VR and Quantum Computing) integrated with HPC resources is key to enable individuals and businesses to take full advantage of digital technologies, hence enhancing processes within organisations and providing the enabling skills to thrive in a digital economy. Supercomputing centres focused on solving industry-led problems face the challenge of having a pool of users with little experience in executing simulations on large-scale facilities, as well as limited knowledge of advanced computational techniques and integrated technologies. We aim not only to educate them in using the available facilities but also to raise awareness of methods that have the potential to increase their productivity. In this paper, we provide our perspective on how to efficiently train industry users, and how to engage with them about wider digital technologies and how these, used efficiently together, can benefit their business.
Volume 14, Issue 1 (July 2023), pp. 46–52
https://doi.org/10.22369/issn.2153-4136/14/1/7
As more students want to pursue a career in big data analytics and data science, big data education has become a focal point in many colleges and universities' curricula. There are many challenges when it comes to teaching and learning big data in a classroom setting. One of the biggest challenges is to prepare big data infrastructure to provide meaningful hands-on experience to students. Setting up the necessary distributed computing resources is a delicate act for instructors and system administrators because there is no one-size-fits-all solution. In this paper, we propose an approach that facilitates the creation of the computing environment on both personal computers and public cloud resources. This combined approach meets different needs and can be used in an educational setting to facilitate different big data learning activities. We discuss and reflect on our experience using these systems in teaching undergraduate and graduate courses.
Volume 14, Issue 1 (July 2023), pp. 53–54
https://doi.org/10.22369/issn.2153-4136/14/1/8
This article gives an overview of ECP's Broadening Participation Initiative (https://www.exascaleproject.org/hpc-workforce/), which has the mission of establishing a sustainable plan to recruit and retain a diverse workforce in the DOE high-performance computing community by fostering a supportive and inclusive culture within the computing sciences at DOE national laboratories. We will describe key activities within three complementary thrusts: establishing an HPC Workforce Development and Retention Action Group, creating accessible 'Intro to HPC' training materials, and launching the Sustainable Research Pathways for High-Performance Computing (SRP-HPC) workforce development program. We are leveraging ECP's unique multilab partnership to work toward sustainable collaboration across the DOE community, with the long-term goal of changing the culture and demographic profile of DOE computing sciences.
Volume 14, Issue 1 (July 2023), pp. 55–59
https://doi.org/10.22369/issn.2153-4136/14/1/9
SARS-CoV-2, the coronavirus responsible for COVID-19, has recently emerged and impacted nearly every human on the planet. The nonstructural protein 12 (NSP 12) is an RNA-dependent RNA polymerase that replicates viral RNA within an infected cell. Interrupting this function should prohibit the virus from replicating within the body and would decrease the severity of the virus's effects in patients. The objective of this project is to identify potential inhibitors for NSP 12 that might be suitable as antiviral drugs. Thus, we obtained the structure of NSP 12 from RCSB's protein data bank. The protein structure was analyzed using computer software (Chimera and PyRx), and ligands obtained from the ZINC database and RCSB's protein data bank were docked to NSP 12. The resulting binding affinities were recorded, and binding geometries analyzed.
Volume 13, Issue 2 (December 2022), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 13, Issue 2 (December 2022), pp. 2–7
https://doi.org/10.22369/issn.2153-4136/13/2/1
In order to fulfill the needs of an evolving job market, formal academic programs are continuously expanding computational training in traditional discipline-specific courses. We developed an informal, twelve contact-hour course tailored for economics students entering a computationally rigorous graduate-level course to help mitigate disparities in computing knowledge between students and prepare them for more advanced instruction within the formal setting. The course was developed to teach the R programming language to students without assuming any prior knowledge or experience in programming or the R environment. In order to allow for ease of implementation across various training approaches, the course was modularized with each section containing distinct topics and learning objectives. These modules can be easily developed as independent lessons so that discipline-specific needs can be addressed through inclusion or exclusion of certain topics. This implementation used the R package 'learnr' to develop the course, which rendered a highly extensible and portable interactive Shiny document that can be deployed on any system on which RStudio is installed. The course is offered as a series of interactive sessions during which students are led through the Shiny notebook by an instructor. Owing to its structure, it can be offered as an asynchronous web-based set of tutorials as well.
Volume 13, Issue 2 (December 2022), pp. 8–11
https://doi.org/10.22369/issn.2153-4136/13/2/2
Many Research-1 (R1) universities create investments in High Performance Computing (HPC) centers to facilitate grant-funded computing projects, leading to student training and outreach on campus. However, creating an HPC workforce pipeline for undergraduates at non-research-intensive universities requires creative, zero-cost education and exposure to HPC. We describe our approach to providing HPC education and opportunities for students at California State University Channel Islands, a four-year university / Hispanic-Serving Institution (HSI) with a primarily first-generation-to-college student population. We describe how we educate our university population in HPC without a dedicated HPC training budget. We achieve this by (1) integrating HPC topics and projects into non-HPC coursework, (2) organizing a campus-wide data analysis and visualization student competition with corporate sponsorship, (3) fielding undergraduate teams in an external, equity-focused supercomputing competition, (4) welcoming undergraduates into faculty HPC research, and (5) integrating research data management principles and practices into coursework. The net effect of this multifaceted approach is that our graduates are equipped with core competencies in HPC and are excited about entering HPC careers.
Volume 13, Issue 2 (December 2022), pp. 12–16
https://doi.org/10.22369/issn.2153-4136/13/2/3
The Blue Waters Fellowship program supported by the National Science Foundation focused on supporting PhD candidates requiring access to high performance computing resources to advance their computational and data-enabled research. The program was designed to strengthen the workforce engaged in computational research. As the program developed, a number of modifications were made to improve the experience of the fellows and promote their success. We review the program, its evolution, and the impacts it had on the participants. We then discuss how the lessons learned from those efforts can be applied to future educational efforts.
Volume 13, Issue 2 (December 2022), pp. 17–20
https://doi.org/10.22369/issn.2153-4136/13/2/4
NSF-supported cyberinfrastructure (CI) has been highly successful in advancing science and engineering over the last few decades. During that time, there have been significant changes in the size and composition of the participating community, the architecture and capacity of compute, storage, and networking platforms, and the methods by which researchers and CI professionals communicate. These changes require rethinking the role of research support services and how they are delivered. To address these changes and support an expanding community, MATCH is implementing a model for research support services in ACCESS that comprises three major themes: 1) leverage modern information delivery systems and simplify user interfaces to provide cost-effective, scalable support to a broader community of researchers, 2) engage experts from the community to develop training materials and instructions that can dramatically reduce the learning curve, and 3) employ a matchmaking service that will maintain a database of specialist mentors and student mentees that can be matched with projects to provide the domain-specific expertise needed to leverage ACCESS resources. A new ACCESS Support Portal (ASP) will serve as the single front door for researchers to obtain guided support and assistance. The ASP will leverage emerging, curated tag taxonomies to identify and match inquiries with knowledge base content and expertise. Expert-monitored question and answer platforms will be created to ensure researcher questions are accurately answered and addressed in a timely fashion, and easy-to-use interfaces such as Open OnDemand and Pegasus will be enhanced to simplify CI use and provide context-aware directed help. The result will be a multi-level support infrastructure capable of scaling to serve a growing research community with increasingly specialized support needs, resulting in research discoveries previously hindered by researchers' inability to effectively utilize NSF CI resources. This paper will cover the components of the MATCH project and discuss how MATCH will engage and work with the ACCESS community.
Volume 13, Issue 2 (December 2022), pp. 21–30
https://doi.org/10.22369/issn.2153-4136/13/2/5
The Blue Waters proposal to NSF, entitled "Leadership-Class Scientific and Engineering Computing: Breaking Through the Limits," identified education and training as essential components for the computational and data analysis research and education communities. The Blue Waters project began in 2007, the petascale computing system began operations on March 28, 2013, and the system served the community longer than originally planned as it was decommissioned in January 2022. This paper contributes to the Blue Waters project's commitment to document the lessons learned and longitudinal impact of its activities. The Blue Waters project pursued a broad range of workforce development activities to recruit, engage, and support a diverse mix of students, educators, researchers, and developers across the U.S. The focus was on preparing the current and future workforce to contribute to advancing scholarship and discovery using computational and data analytics resources and services. Formative and summative evaluations were conducted to improve the activities and track the impact. Many of the lessons learned have been implemented by the National Center for Supercomputing Applications (NCSA) and the New Frontiers Initiative (NFI) at the University of Illinois, and by other organizations. We are committed to sharing our experiences with other organizations that are working to reproduce, scale up, and/or sustain activities to prepare the computational and data analysis workforce.
Volume 13, Issue 2 (December 2022), pp. 31–38
https://doi.org/10.22369/issn.2153-4136/13/2/6
Recent HPC education efforts have focused on maximizing the usage of traditional- and cloud-based computing infrastructures that primarily support CPU or GPU hardware. However, recent innovations in CPU architectures from Arm and RISC-V and the acquisition of Field-Programmable Gate Array (FPGA) companies by vendors like Intel and AMD mean that traditional HPC clusters are rapidly becoming more heterogeneous. This work investigates one such example deployed at Georgia Tech – a joint workflow for processor design and reconfigurable computing courses supported by both the HPC-focused Partnership for an Advanced Computing Environment (PACE) and GT's novel architecture center, CRNCH. This collaborative workflow of HPC nodes and 40 remotely accessible Pynq devices supported over 100 students in Spring 2022, and its deployment provides key lessons on sticking points and opportunities for combined HPC and novel architecture workflows.
Volume 13, Issue 1 (April 2022), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 13, Issue 1 (April 2022), pp. 2–16
https://doi.org/10.22369/issn.2153-4136/13/1/1
The computer science research workforce is characterized by a lack of demographic diversity. To address this, we designed and evaluated an end-to-end mentored undergraduate research intervention to nurture diverse cohorts' skills for research and develop their vision of themselves as scientists. We hypothesized that this intervention would (a) grow scientific skills, (b) increase science identity, and (c) stimulate students to view scientific careers in computer science as future viable options. The evaluation of the hypotheses addressed the limitations of self-evaluation with a multicomponent evaluation framework comprising five forms of evidence from faculty and students. Students engaged in team projects, with cohorts additionally participating in professional development programming. Results indicated that students gained in scientific skills and broadened their identity as scientists and, to some degree, strengthened their outlook on research careers. The introduced structured intervention and evaluation framework were part of a US National Science Foundation Research Experiences for Undergraduates (REU) computing-focused summer program at Rochester Institute of Technology and are applicable in other scientific disciplines and institutional settings.
Volume 13, Issue 1 (April 2022), pp. 17–20
https://doi.org/10.22369/issn.2153-4136/13/1/2
Heatmaps are used to visualize data to enable people to quickly understand them. While there are libraries that enable programmers to create heatmaps with their data, scientists who do not typically write programs need a way to quickly create heatmaps to understand their data and use those figures in their publications. One of the authors is not a programmer but needed a way to generate heatmaps for their research. For a summer undergraduate research experience, we created a program with a graphical user interface to allow non-programmers, including that author, to create heatmaps to visualize their data with just a few mouse clicks. The program allows the user to easily customize their heatmaps and export them as PNG or PDF files to use in their publications.
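The paper's GUI is not reproduced in this listing; as a minimal sketch of the underlying plotting and export step that such a tool wraps, assuming matplotlib and placeholder data standing in for a user's file (labels and colormap are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder array standing in for a user's uploaded data table
data = np.random.default_rng(0).random((8, 12))

fig, ax = plt.subplots()
im = ax.imshow(data, cmap="viridis", aspect="auto")  # draw the heatmap
fig.colorbar(im, ax=ax, label="value")
ax.set_xlabel("sample")
ax.set_ylabel("measurement")

# Export in the publication formats the GUI offers
fig.savefig("heatmap.png", dpi=300)
fig.savefig("heatmap.pdf")
```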
Volume 13, Issue 1 (April 2022), pp. 21–22
https://doi.org/10.22369/issn.2153-4136/13/1/3
The potential HPC community grows ever wider as methodologies such as AI and big data analytics push the computational needs of more and more researchers into the HPC space. As a result, requirements for training are exploding as HPC adoption continues to gather pace. However, the number of topics that can be thoroughly addressed without providing access to actual HPC resources is very limited, even at the introductory level. In cases where access to production HPC resources is available, security concerns and the typical overhead of arranging for account provision and training reservations make the scalability of this approach challenging.
Volume 13, Issue 1 (April 2022), pp. 23–26
https://doi.org/10.22369/issn.2153-4136/13/1/4
The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (LBNL) organizes approximately 20 training events per year for its 8,000 users from 800 projects, who have varying levels of High Performance Computing (HPC) knowledge and familiarity with NERSC's HPC resources. Due to the novel circumstances of the pandemic, NERSC began transforming our traditional smaller-scale, on-site training events to larger-scale, fully virtual sessions in March 2020. We treated this as an opportunity to try new approaches and improve our training best practices. This paper describes the key practices we have developed since the start of this transformation, including considerations for organizing events; collaboration with other HPC centers and the DOE ECP Program to increase reach and impact of events; targeted emails to users to increase attendance; efficient management of user accounts for computational resource access; strategies for preventing Zoombombing; streamlining the publication of professional-quality, closed-captioned videos on the NERSC YouTube channel for accessibility; effective communication channels for Q&A; tailoring training contents to NERSC user needs via close collaboration with vendors and presenters; standardized training procedures and publishing of training materials; and considerations for planning HPC training topics. Most of these practices will be continued after the pandemic as effective norms for training.
Volume 13, Issue 1 (April 2022), pp. 27–31
https://doi.org/10.22369/issn.2153-4136/13/1/5
Under-representation of minorities and women in the STEM workforce, especially in computing, is a contributing factor to the Computational and Data Science (CDS) workforce shortage. In 2019, 12 percent of the workforce was African American, while only 7 percent of STEM workers were African American with a bachelor's degree or higher. The Hispanic share of the workforce increased to 18 percent by 2019; Hispanics with a bachelor's degree or higher are only 8 percent of the STEM workforce [1]. Although some strides have been made in integrating CDS competencies into the university curriculum, the pace of change has been slow, resulting in a critical shortage of sufficiently qualified students at both the baccalaureate and graduate levels. The NSF Working Group on Realizing the Potential of Data Science final report recommends "strengthening curriculum at EPSCoR and Minority Serving Institutions (MSI) so students are prepared and competitive for employment opportunities in industry and academia" [2]. However, resource constraints and large teaching loads can impede the ability of MSIs and smaller institutions to quickly respond and make the necessary curriculum changes. Ohio Supercomputer Center (OSC) in collaboration with Bethune Cookman University (B-CU), Clark Atlanta University (CAU), Morgan State University (Morgan), Southeastern Universities Research Association (SURA), Southern University and A&M College (SUBR), and the University of Puerto Rico at Mayagüez (UPRM) are piloting a Computational and Data Science Curriculum Exchange (C2Exchange) to address the challenges associated with sustained access to computational and data science courses in institutions with high percentage enrollment of students from populations currently under-represented in STEM disciplines. The goal of the C2Exchange pilot is to create a network for resource-constrained institutions to share CDS courses and increase their capacity to offer CDS minors and certificate programs. Over the past three years we have found that the exchange model facilitates the sharing of curriculum and expertise across institutions for immediate implementation of some courses and long-term capacity building for new Computational and Data Science programs and minors.
Volume 13, Issue 1 (April 2022), pp. 32–37
https://doi.org/10.22369/issn.2153-4136/13/1/6Responding to the growing need for discipline-specific computing curricula in academic programs, we offer a template to help bridge the gap between informal and formal curricular support. Here, we report on a twenty-contact-hour computing course developed for economics majors at Texas A&M University. The course is built around thematic laboratories that each include learning objectives, learning outcomes, assignments, and assessments and is geared toward students with a high-school level knowledge of mathematics and statistics. Offered in an informal format, the course leverages the wide applicability of the Python programming language and the scaffolding offered by discipline-specific, hands-on activities to introduce a curriculum that covers introductory topics in programming while prioritizing approaches that are more relevant to the discipline. The design uses technology to offer classes in an interactive, Web-based format for both in-person and remote learners, ensuring easy access and scalability to other institutions as needed. To ease adoption among faculty and offer differentiated learning opportunities for students, lectures are modularized into 10-minute segments that are mapped to other concepts covered during the course. Class notes, lectures, and exercises are pre-staged and draw on aspects of flipped-classroom methods. The course concludes with a group project and follow-on engagements with instructors. In future iterations, the curriculum can be extended with a capstone in a Web-based asynchronous certification process.
Volume 13, Issue 1 (April 2022), pp. 38–43
https://doi.org/10.22369/issn.2153-4136/13/1/7Given the pivotal role of data and cyberinfrastructure (CI) in teaching and scientific discovery, it is essential that researchers at small and mid-sized institutions be empowered to fully exploit them. While access to physical infrastructure is essential, it is equally important to have access to people known as Research Computing Facilitators (RCFs), who possess a mix of technical knowledge and interpersonal skills that enables faculty to make the best use of available computing resources. Meeting this need is a significant challenge for small and mid-sized institutions that do not have the critical mass to build teams of RCFs on site. Launched in 2017, the National Science Foundation (NSF)-funded Northeast Cyberteam (NECT) built a program to address these challenges for researchers and educators at small and mid-sized institutions in four states — Maine, Massachusetts, New Hampshire, and Vermont — while simultaneously developing self-service tools that support the management and execution of RCF engagements. These tools are housed in a portal called Connect.cyberinfrastructure and have enabled adoption of the program's methods by the broader research computing community. Initiated in 2020, the NSF-funded Cyberteam to Advance Research and Education in Eastern Regional Schools (CAREERS) has leveraged the NECT methods and tools to jumpstart a program that supports researchers at small and mid-sized institutions in six states and lays the groundwork for an additional level of support via a distributed network of experts directly accessible by researchers in the region. This paper discusses findings from the first four years of NECT and the first year of CAREERS.
Volume 13, Issue 1 (April 2022), pp. 44–49
https://doi.org/10.22369/issn.2153-4136/13/1/8While artificial intelligence and machine learning (AI/ML) frameworks gain prominence in science and engineering, most researchers face significant challenges in adapting complex AI/ML workflows to campus and national cyberinfrastructure (CI) environments. Data from the Texas A&M High Performance Research Computing (HPRC) researcher training program indicate that researchers increasingly want to learn how to migrate and work with their pre-existing AI/ML frameworks on large-scale computing environments. Building on the continuing success of our work in developing innovative pedagogical approaches for CI training, we expand CI-infused pedagogy to teach technology-based AI and data sciences. We revisit the pedagogical approaches used in the decades-old tradition of laboratories in the physical sciences that taught concepts via experiential learning. Here, we structure a series of exercises on interactive computing environments that give researchers immediate hands-on experience with the AI/ML and data science technologies they will use as they move to larger CI resources. These exercises, called "tech-labs," assume that participating researchers are familiar with AI/ML approaches and focus on hands-on practice in applying those approaches on large-scale CI. The tech-labs offer four consecutive sessions, each introducing a learner to specific technologies offered in CI environments for AI/ML and data workflows. We report on our tech-lab for Python-based AI/ML approaches, during which learners are introduced to Jupyter Notebooks followed by exercises using Pandas, Matplotlib, Scikit-learn, and Keras. The program includes a series of enhancements, such as container support and easy launch of virtual environments in our Web-based computing interface, and the approach is scalable to programs using a command line interface (CLI) as well. In all, the program offers a shift in focus from teaching AI/ML toward increasing adoption of AI/ML on large-scale CI.
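As a concrete, hypothetical example of the kind of warm-up such a Pandas/Matplotlib/Scikit-learn tech-lab might open with, the following minimal Python sketch loads a small dataset, plots it, and fits a baseline classifier; the dataset and model choices are illustrative assumptions, not taken from the paper.

# Hypothetical tech-lab warm-up: explore a small dataset with Pandas and Matplotlib,
# then fit a baseline Scikit-learn model.
import matplotlib
matplotlib.use("Agg")                       # headless backend, suitable for batch/CI jobs
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a toy dataset into a DataFrame (stands in for researcher-supplied data).
iris = load_iris(as_frame=True)
df = iris.frame

# Quick exploratory plot saved to disk.
df.plot.scatter(x="sepal length (cm)", y="petal length (cm)", c="target", colormap="viridis")
plt.savefig("iris_scatter.png")

# Train/test split and a baseline classifier.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")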
Volume 13, Issue 1 (April 2022), pp. 50–54
https://doi.org/10.22369/issn.2153-4136/13/1/9Successful outreach to computational researchers about the benefits of switching to a different computing environment depends on the educator's ability to showcase practical research and development workflows in the new environment. Interactive, graphical computing environments are crucial for engaging learners in computing education and offer researchers easier ways to adopt new technologies. Interactive, graphical computing allows learners to see the results of their work in real time, which provides the feedback needed for learning and enables chunking of complex tasks. Moreover, there is a natural synergy between computing education and computing research; researchers who are exposed to new computing skills within an interactive and engaging environment are more likely to retain those skills and adopt the new computing environment in their research and development workflows. Support for interactive, graphical workflows with modern computing tools in containerized computing environments has yet to be fully incorporated on high performance computing systems. To begin to address this deficiency, we discuss our approach to teaching containerization technologies in the popular Jupyter Notebook environment. We report on our scheme for implementing containerized software environments for interactive, graphical computing within the Open OnDemand (OOD) framework for research computing workflows, providing an accessible on-ramp for researchers transitioning to containerized technologies. In addition, we introduce several quality-of-life improvements for researchers and educators that will encourage them to continue to use the platform.
Volume 12, Issue 3 (December 2021), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 12, Issue 3 (December 2021), pp. 2–12
https://doi.org/10.22369/issn.2153-4136/12/3/1Wildfire simulations are developed for interactive use in online geography classes offered under the course titled Disasters. Development of the local capability to design and offer computational activities in courses at a small, rural college is a long-term activity based on integrated scientific research and education efforts.
Volume 12, Issue 3 (December 2021), pp. 13–26
https://doi.org/10.22369/issn.2153-4136/12/3/2For a long time, high-performance computers and simulations were of interest only at universities and research institutes. In recent years, however, their application and relevance in a wider field have grown; not only do industry and small and medium-sized businesses benefit from these technologies, but their social and political impacts are also increasing significantly. Therefore, there is an increasing need for experts in this field as well as a better understanding of the importance of high-performance computing (HPC) and simulations among the general public. For this reason, the German National Supercomputing Center HLRS has broadened its academic training program to include courses for students and teachers as well as for professionals. Specifically, this expansion involves two projects: "Simulated Worlds," which offers a variety of educational programs for middle and high school students, and the "MoeWE" project with its "Supercomputing Academy" for professionals. These projects complement the center's academic educational focus by addressing the special needs of these new target groups, who have otherwise not been able to benefit from HLRS' academic training program. In this paper, we present background concepts, programmatic offerings, and exemplary content of the two projects; discuss the experiences involved in their development and implementation; and provide insights that may be useful for improving education and training in this area.
Volume 12, Issue 3 (December 2021), pp. 27–34
https://doi.org/10.22369/issn.2153-4136/12/3/3The growing need for a workforce that can analyze, model, and interpret real-world data strongly points to the importance of imparting fundamental concepts of computational and data science to the current student generation regardless of their intended majors. This paper describes the experiences in developing and implementing a course in computation, modeling, and simulation. The main goal of the course was to infuse fundamental competencies of computational science to the undergraduate curriculum. The course also aimed at making students aware that modeling and simulation have become an essential part of the research and development process in the sciences, social sciences, and engineering. The course was targeted to students of all majors.
Volume 12, Issue 2 (February 2021), pp. 1–2
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 12, Issue 2 (February 2021), pp. 3–10
https://doi.org/10.22369/issn.2153-4136/12/2/1DeapSECURE is a non-degree computational training program that provides a solid high-performance computing (HPC) and big-data foundation for cybersecurity students. DeapSECURE consists of six modules covering a broad spectrum of topics such as HPC platforms, big-data analytics, machine learning, privacy-preserving methods, and parallel programming. In the second year of this program, to improve the learning experience, we implemented a number of changes, such as grouping modules into two broad categories, "big-data" and "HPC"; creating a single cybersecurity storyline across the modules; and introducing post-workshop (optional) "hackshops." Two major goals of these changes are, firstly, to effectively engage students to maintain high interest and attendance in such a non-degree program, and, secondly, to increase knowledge and skill acquisition. To assess the program, and in particular the changes made in the second year, we evaluated and compared the execution and outcomes of the training in Year 1 and Year 2. The assessment data shows that the implemented changes have partially achieved our goals, while simultaneously providing indications where we can further improve. The development of a fully on-line training mode is planned for the next year, along with a reproducibility pilot study to broaden the subject domain from cybersecurity to other areas, such as computations with sensitive data.
Volume 12, Issue 2 (February 2021), pp. 11–17
https://doi.org/10.22369/issn.2153-4136/12/2/2The COVID-19 national health crisis forced a sudden and drastic move to online delivery of instruction across the nation. This almost instantaneous transition from a predominantly traditional "in-person" instruction model to a predominantly online model has forced programs to rethink instructional approaches. Before COVID-19 and mandatory social distancing, online training in research computing (RC) was typically limited to "live-streaming" informal in-person training sessions. These sessions were augmented with hands-on exercises on live notebooks for remote participants, with almost no assessment of student learning. Apart from select instances that focused on an international audience, local training curricula were designed with the in-person attendee in mind. Sustained training for RC became more important as several other avenues of research were diminished. Here we report on two educational approaches that were implemented in the informal program hosted by Texas A&M High Performance Research Computing (HPRC) in the Spring, Summer, and Fall semesters of 2020. These sessions were offered over Zoom, with the instructor assisted by moderators using the chat features. The first approach duplicated our traditional in-person sessions in an online setting. These sessions were taught by staff, and the focus was on conveying a large amount of information. A second approach focused on engaging learners via shorter pop-up courses in which participants chose the topic matter. This approach implemented a peer-learning environment, in which students taught and moderated the training sessions. These sessions were supplemented with YouTube videos and continued engagement over a community Slack workspace. An analysis of these approaches is presented.
Volume 12, Issue 2 (February 2021), pp. 18–20
https://doi.org/10.22369/issn.2153-4136/12/2/3Interaction is the key to making education more engaging. Effective interaction is difficult enough to achieve in a live classroom, and it is extremely challenging in a virtual environment. To keep instruction and learning at the levels our students have come to expect, additional effort is required on other facets that motivate learning, whether for students in our academic courses, student internship programs, Summer Institute Series, or NSF/TACC's Frontera Fellowship Program. We focus our efforts on lecturing less and interacting more.
Volume 12, Issue 2 (February 2021), pp. 21–21
https://doi.org/10.22369/issn.2153-4136/12/2/4The call for accelerated computing and data science skills is soaring, and classrooms are on the front lines of feeding the demand. The NVIDIA Deep Learning Institute (DLI) offers hands-on training in AI, accelerated computing, and accelerated data science. Developers, data scientists, educators, researchers, and students can get practical experience powered by GPUs in the cloud. DLI Teaching Kits are complete course solutions that lower the barrier to incorporating AI and GPU computing in the classroom. The DLI University Ambassador Program enables qualified educators to teach DLI workshops, at no cost, across campuses and academic conferences to faculty, students, and researchers. DLI workshops offer student certification that demonstrates subject matter competency and supports career growth. Join NVIDIA's higher education leadership and leading adopters from academia to learn how to get involved in these programs.
Volume 12, Issue 2 (February 2021), pp. 22–24
https://doi.org/10.22369/issn.2153-4136/12/2/5To address the need for a diverse and capable workforce in advanced digital services and resources, the Shodor Education Foundation has been coordinating an undergraduate student program for the Extreme Science and Engineering Discovery Environment (XSEDE). The name of the program is EMPOWER (Expert Mentoring Producing Opportunities for Work, Education, and Research). The goal of the program is to engage a diverse group of undergraduate students in the work of XSEDE, matching them with faculty and staff mentors who have projects that make use of XSEDE services and resources or that otherwise prepare students to use these types of services and resources. Mentors have coordinated projects in computational science and engineering research in many fields of study as well as in systems and user support. Students work for a semester, quarter, or summer at a time and can participate for up to a year supported by stipends from the program, at different levels depending on experience. The program has run for 11 iterations from summer 2017 through fall 2020. The 111 total student participants have been 28% female and 31% underrepresented minority, selected from a pool of 272 total applicants who were 31% female and 30% underrepresented minority. We are pleased that the selection process does not disadvantage women and minorities, but we would also like to see these proportions increase. At least one fourth of the students have presented their work in articles or at conferences, and several credit the program with moving them toward graduate study or otherwise advancing their careers.
Volume 12, Issue 2 (February 2021), pp. 25–30
https://doi.org/10.22369/issn.2153-4136/12/2/6The Pawsey Supercomputing Centre's training has evolved over the past decade, but never as rapidly as during the COVID-19 pandemic. The imperative to quickly reach learners facing travel restrictions and physical distancing requirements expedited our move to fully online training. We had planned to increase our online offerings, but not at this pace or to this extent. In this paper, we discuss the challenges we faced in making this transition, including how to creatively motivate and engage learners, build our virtual training delivery skills, and build communities across Australia. We share our experience in using different learning methods, tools, and techniques to address specific educational and training purposes. We share trials and successes we have had along the way. Our guiding premise is that there is no universal learning solution. Instead, we purposefully select various solutions and platforms for different groups of learners.
Volume 12, Issue 2 (February 2021), pp. 31–32
https://doi.org/10.22369/issn.2153-4136/12/2/7Supercomputers are moving toward exascale computing, high-performance computer systems are becoming larger and larger, and the scale and complexity of high-performance computing (HPC) applications are also increasing rapidly, which places high demands on the cultivation of HPC majors and the development of HPC courses. HPC majors are expected to solve practical problems in a specific area of high-performance computing, whether in system design or in a particular HPC application field. Regardless of the type of problem, its complexity and difficulty are often very high because HPC is interdisciplinary. HPC courses that aim to meet these talent-cultivation needs must therefore emphasize the cultivation of students' Generalized System-level Comprehensive Capabilities, so that students master the key elements within the limited scope of course learning. System-level Comprehensive Capability refers to the ability to apply knowledge of the computer system to solve practical problems; the ACM/IEEE Joint Computer Science Curricula 2013 (CS2013) likewise emphasizes a system-level perspective. System-level Comprehensive Capability is considered a crucial factor in improving students' system development ability and professional ability, and it is especially important for students majoring in high-performance computing. Furthermore, because the HPC field is interdisciplinary and highly complex, System-level Comprehensive Capability alone is not enough for HPC majors; students need Generalized System-level Comprehensive Capabilities. A knowledge system that spans the computer system "vertically" (from bottom to top: parallel computer architecture, operating systems/resource management systems, compilation, library optimization, etc.) is no longer sufficient; multiple high-performance computing application areas must also be covered "horizontally." Generalized System-level Comprehensive Capabilities that combine the vertical and horizontal dimensions can meet the needs of different types of high-performance computing talent.
Volume 12, Issue 2 (February 2021), pp. 33–36
https://doi.org/10.22369/issn.2153-4136/12/2/8Positions within High Performance Computing are difficult to fill, especially that of Site Reliability Engineer within an operational area. At the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (LBNL), the Operations team manages the HPC computational facility, with its complex cooling ecosystem, and also serves as the wide area network operations center. Therefore, this position requires skill sets in four specific areas: system administration, storage administration, facility management, and wide area networking. These skills are not taught in their entirety in any educational program; therefore, a new graduate requires extensive training before becoming proficient in all areas. The proximity to Silicon Valley adds another challenge in finding qualified candidates. NERSC has implemented a new approach patterned after the apprenticeship programs in the trades. This program requires an intern or apprentice to fulfill milestones during their internship or apprenticeship, with constant evaluation, feedback, mentorship, and hands-on work that allows candidates to demonstrate their growing skills and eventually earn a career position.
Volume 12, Issue 2 (February 2021), pp. 37–40
https://doi.org/10.22369/issn.2153-4136/12/2/9Ask.CI, the Q&A site for Research Computing, was launched at PEARC18 with the goal of aggregating answers to a broad spectrum of questions that are commonly asked by the research computing community. As researchers, facilitators, staff, students, and others ask and answer questions on Ask.CI, they create a shared knowledge base for the larger community. For smaller institutions, the knowledge base provided by Ask.CI provides a wealth of knowledge that was previously not readily available to scientists and educators in an easily searchable Q&A format. For larger institutions, this self-service model frees up time for facilitators and cyberinfrastructure engineers to focus on more advanced subject matter. Recognizing that answers evolve rapidly with new technology and discovery, Ask.CI has built in voting mechanisms that utilize crowdsourcing to ensure that information stays up to date. Establishing a Q&A site of this nature requires some tenacity. In partnership with the Campus Champions, Ask.CI has gained traction and continues to engage the broader community to establish the platform as a powerful tool for research computing. Since launch, Ask.CI has attracted over 250,000 page views (currently averaging nearly 5,000 per week), more than 400 contributors, hundreds of topics, and a broad audience that spans the US and parts of Europe and Asia. Ask.CI has shown steady growth in both contributions and audience since it was launched in 2018 and is still evolving. In the past year, we introduced Locales, which allow institutions to create subcategories on Ask.CI where they can experiment with posting institution-specific content and use of the site as a component of their user support strategy.
Volume 12, Issue 2 (February 2021), pp. 41–45
https://doi.org/10.22369/issn.2153-4136/12/2/10This paper presents a newly developed course for teaching parallel programming to undergraduates. The course uses a flipped classroom model and a "hands-on" approach to learning, with multiple real-world examples drawn from a wide range of science and engineering problems. The intention of this course is to prepare students from a variety of STEM backgrounds to take on supportive roles in research labs while they are still undergraduates. To this end, students are taught common programming paradigms and practices such as benchmarking, shared-memory parallelization (OpenMP), accelerators (CUDA), and distributed-memory parallelization (MPI). Students are also trained in practical skills, including the Linux command line, workflow and file management, installing software, discovering and using shared module systems (Lmod), and effectively submitting and monitoring jobs using a scheduler (SLURM).
Volume 12, Issue 2 (February 2021), pp. 46–57
https://doi.org/10.22369/issn.2153-4136/12/2/11A Master of Science (MSc) conversion degree is one that retrains students in a new subject area within a fast-tracked period of time. This type of programme opens new opportunities to students beyond those gained through their originally chosen degree. Students often enter a conversion degree to improve their career options, which may mean moving away from an initially chosen path to gain skills in a field that they now consider more attractive. With a core goal of improving future employability prospects, specific requirements are therefore placed on the learning outcomes achieved through the course content and delivery. In this paper, the learning outcomes of interest are the transferable skills intended to be gained as a result of the assessment design delivered to a cohort of students on the MSc in Professional Software Development at Ulster University, United Kingdom. The coursework submissions are explored to demonstrate how module learning has been applied, in a creative way, to meet the assessment requirements.
Volume 12, Issue 2 (February 2021), pp. 58–65
https://doi.org/10.22369/issn.2153-4136/12/2/12Graduates with high performance computing (HPC) skills are more in demand than ever before, most recently fueled by the rise of artificial intelligence and big data technologies. However, students often find it challenging to grasp key HPC issues such as parallel scalability. The increasing demand for processing large-scale scientific data makes mastering parallelism, with scalability often a crucial factor, even more essential. This is even more challenging when non-computing majors require HPC skills. This paper presents the design of a parallel computing course offered to atmospheric science majors. It discusses how the design addressed the challenges presented by non-computer-science majors who lack a background in fundamental computer architecture, systems, and algorithms. The content of the course focuses on the concepts and methods of parallelization, testing, and the analysis of scalability. Because all students confront many (non-HPC) scalability issues in the real world, and because real-world scalability and parallel computing scalability share similarities, the course design exploits this similarity in an effort to improve students' understanding of scalability issues in parallel computing. The authors present a set of assignments and projects that leverage the Tianhe-2A supercomputer, ranked #6 in the TOP500 list of supercomputers, for testing. We present pre- and post-questionnaires to explore the effectiveness of the class design and find an 11.7% improvement in correct answers and a decrease of 36.8% in obvious, but wrong, answers. The authors also find that students are in favor of this approach.
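As a hypothetical illustration of the scalability analysis such a course centers on, the Python sketch below computes speedup and parallel efficiency from measured runtimes and compares them with Amdahl's law; the timings and serial fraction are made-up numbers, not results from the paper.

# Hypothetical scalability exercise: compute speedup and parallel efficiency from
# measured runtimes and compare with Amdahl's law. Timings are illustrative only.

def amdahl_speedup(serial_fraction: float, p: int) -> float:
    """Predicted speedup on p processes if a fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

measured = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.5, 16: 11.0}   # seconds (illustrative)
t1 = measured[1]
serial_fraction = 0.05                                        # assumed 5% serial part

print(f"{'p':>3} {'T(p)':>7} {'speedup':>8} {'efficiency':>11} {'Amdahl':>7}")
for p, tp in measured.items():
    speedup = t1 / tp
    efficiency = speedup / p
    print(f"{p:>3} {tp:>7.1f} {speedup:>8.2f} {efficiency:>11.2f} "
          f"{amdahl_speedup(serial_fraction, p):>7.2f}")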
Volume 12, Issue 2 (February 2021), pp. 66–69
https://doi.org/10.22369/issn.2153-4136/12/2/13The performance of HPC applications depends on a wide range of factors, including algorithms, programming models, library and language implementations, and hardware. To make the problem even more complicated, many applications inherit different layers of legacy code, written and optimized for a different era of computing technologies. Due to this complexity, the task of understanding the performance bottlenecks of HPC applications and making improvements often ends up being a daunting trial-and-error process. Problematically, this process often starts without a quantitative understanding of the actual behavior of the HPC code. The Performance Optimisation and Productivity (POP) Centre of Excellence, funded by the EU under the Horizon 2020 Research and Innovation Programme, attempts to establish a quantitative methodology for the assessment of parallel codes. This methodology is based on a set of hierarchical metrics, where the metrics at the bottom of the hierarchy represent common causes of poor performance. These metrics provide a standard, objective way to characterize different aspects of the performance of parallel codes and therefore provide the necessary foundation for establishing a more systematic approach to performance optimization of HPC applications. As a consequence, the POP methodology also facilitates the training of new HPC performance analysts. In this paper, we illustrate these advantages by describing two real-world examples where we used the POP methodology to help HPC users understand the performance bottlenecks of their code.
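For readers unfamiliar with hierarchical efficiency metrics of this kind, the simplified Python sketch below shows how a parallel efficiency can be decomposed into load balance and communication efficiency; it is an illustration under assumed trace numbers, not the POP tooling or its exact metric definitions.

# Simplified illustration (not the POP tools) of hierarchical efficiency metrics.
# Per-rank computation times are made-up; in practice they come from traces.

comp_time = [9.0, 7.5, 8.2, 6.9]   # useful computation time per MPI rank (s), illustrative
runtime = 11.0                      # wall-clock time of the parallel region (s), illustrative

avg_comp = sum(comp_time) / len(comp_time)
max_comp = max(comp_time)

load_balance = avg_comp / max_comp                      # how evenly work is distributed
comm_efficiency = max_comp / runtime                    # time lost outside useful computation
parallel_efficiency = load_balance * comm_efficiency    # equals avg_comp / runtime

print(f"load balance:        {load_balance:.2f}")
print(f"comm. efficiency:    {comm_efficiency:.2f}")
print(f"parallel efficiency: {parallel_efficiency:.2f}")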
Volume 12, Issue 1 (January 2021), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 12, Issue 1 (January 2021), pp. 2–7
https://doi.org/10.22369/issn.2153-4136/12/1/1This article reports on the efforts of the Computer Science Education Collaborative during the period between 2018–2020 to develop and implement a new computer science licensure program for preservice teachers seeking a license to teach computer science in grades 7–12 in Vermont. We present a brief review of the literature related to computer science teacher education and describe the process of developing the computer science education minor and major concentration at the University of Vermont. As a form of reflection, we discuss the program development process and lessons learned by the collaborative that might be informative to other institutes of higher education involved in CS teacher education program design and implementation. Finally, we describe next steps for developing in-service licensure programs for teachers seeking computer science professional development or licensure in grades 7–12.
Volume 12, Issue 1 (January 2021), pp. 8–15
https://doi.org/10.22369/issn.2153-4136/12/1/2This paper provides a supervised machine learning example to identify laboratory glassware. This project was implemented in an Introduction to Scientific Computing course for first-year students at our institution. The goal of the exercise was to present a typical machine learning task in the context of a chemistry laboratory to engage students with computing and its applications to scientific projects. This is an end-to-end data science experience, with students creating the dataset, training a neural network, and analyzing the performance of the trained network. The students collected pictures of various glassware in a chemistry laboratory. Four pre-trained neural networks, Inception-V1, Inception-V3, ResNet-50, and ResNet-101, were trained to distinguish between the objects in the pictures. The Wolfram Language was used to train the neural networks and to test the performance of the classifier. The students received hands-on training in the Wolfram Language and an elementary introduction to image classification tasks in the machine learning domain. Students enjoyed the introduction to machine learning applications and the hands-on experience of building and testing an image classifier to identify laboratory equipment.
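The paper carried out the training in the Wolfram Language; purely as a rough analogue of the same idea, the Python/Keras sketch below fine-tunes a pre-trained ResNet-50 on a folder of labelled photos. The directory name and class count are hypothetical placeholders, not the paper's dataset.

# Rough Keras analogue of fine-tuning a pre-trained network on glassware photos.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 4   # hypothetical, e.g., beaker, Erlenmeyer flask, graduated cylinder, round-bottom flask

# Load labelled photos from a folder-per-class directory (hypothetical path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "glassware_photos", image_size=(224, 224), batch_size=16)

# Pre-trained ResNet-50 backbone with ImageNet weights, classification head removed.
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # freeze the backbone; train only the new head

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)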
Volume 12, Issue 1 (January 2021), pp. 16–23
https://doi.org/10.22369/issn.2153-4136/12/1/3Turbulent boundary layers that evolve along the flow direction are ubiquitous. Moreover, accounting for the effects of wall-curvature-driven pressure gradient and flow compressibility adds significant complexity to the problem. Consequently, hypersonic spatially-developing turbulent boundary layers (SDTBL) over curved walls are of crucial importance in aerospace applications, such as unmanned high-speed vehicles, scramjets, and advanced space aircraft. More importantly, hypersonic capabilities would provide faster responsiveness and longer range coverage to U.S. Air Force systems. Thus, an improved understanding of the physics behind high-speed boundary layers over curved wall-bounded flows can lead to more efficient flow control techniques (e.g., wave drag reduction) and better treatment of aerodynamic heating in hypersonic vehicle design. In this investigation, a series of numerical experiments is performed to evaluate the effects of strong concave curvature and supersonic/hypersonic speeds (Mach numbers of 2.86 and 5, respectively) on the thermal transport phenomena that take place inside the boundary layer. The flow solver used is based on a RANS approach. Two different turbulence models are compared: the SST (Shear Stress Transport) model by Menter and the standard k-ω model by Wilcox. Furthermore, numerical results are validated by means of experimental data from the literature (Donovan et al., J. Fluid Mech., 259, 1-24, 1994) for the moderate concave curvature case and a Mach number of 2.86. The present study provides a first insight into the flow physics for the forthcoming design of better 3D meshes and computational boxes, as part of a more ambitious project that involves Direct Numerical Simulation (DNS) of curved wall-bounded flows in the supersonic/hypersonic regime. The uniqueness of this RANS analysis of concave curved walls can be summarized as follows: (i) study of the compressibility effects on the time-averaged velocity and temperature, and (ii) analysis of the influence of different inflow boundary conditions.
Volume 12, Issue 1 (January 2021), pp. 24–31
https://doi.org/10.22369/issn.2153-4136/12/1/4The main objective of computer graphics is to effectively depict an image of a virtual scene in its realistic form within a reasonable amount of time. This paper discusses two different ray tracing techniques and evaluates the performance of serial and parallel implementations of ray tracing, which is known to be computationally intensive and was prohibitively costly on earlier computers. The parallel implementation was achieved using OpenMP with C++, and the maximum speedup was ten times over the serial implementation. The experiment in this paper can be used to teach high-performance computing students the benefits of multi-threading in computationally intensive algorithms and the benefits of parallel programming.
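The paper's implementation used C++ with OpenMP; purely as an illustration of the same idea in Python, the sketch below renders independent pixel rows in parallel with multiprocessing and measures the resulting speedup. The per-pixel "shading" function is a stand-in for real ray-intersection work, not the paper's renderer.

# Illustrative row-parallel rendering and speedup measurement (not the paper's code).
import time
from multiprocessing import Pool

WIDTH, HEIGHT = 640, 480

def render_row(y):
    # Stand-in for per-pixel ray casting: some arithmetic per pixel.
    return [((x * x + y * y) ** 0.5) % 1.0 for x in range(WIDTH)]

def render(pool=None):
    rows = range(HEIGHT)
    return pool.map(render_row, rows) if pool else [render_row(y) for y in rows]

if __name__ == "__main__":
    t0 = time.perf_counter()
    render()                                  # serial baseline
    t_serial = time.perf_counter() - t0

    with Pool() as pool:                      # one worker per core by default
        t0 = time.perf_counter()
        render(pool)
        t_parallel = time.perf_counter() - t0

    print(f"serial {t_serial:.2f}s, parallel {t_parallel:.2f}s, "
          f"speedup {t_serial / t_parallel:.1f}x")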
Volume 12, Issue 1 (January 2021), pp. 32–38
https://doi.org/10.22369/issn.2153-4136/12/1/5Machine learning offers an efficient and timely way to handle cascades of data, including serving as an alternative molecular calculator that replaces more expensive ab initio techniques. Neural networks (NN) are most predictive for new cases that are similar to examples in their training sets; however, it is sometimes necessary for the NN to accurately evaluate structures not in its training set. In this project, we quantify how clustering a training set into groups with similar geometric motifs can be used to train a NN so that it can accurately determine the energies of structures not in the training set. This was accomplished by generating over 800 C8H7N structures, relaxing them using DFTB+, and grouping them using agglomerative clustering. Some of these groups were assigned to the training set and used to train a NN using the pre-existing Atomistic Machine-learning Package (AMP). The remaining groups were evaluated using the trained NN and compared to the DFTB+ energy. These two energies were plotted and fitted to a straight line, where higher R² values correspond to the NN more accurately predicting the energies of structures not in its training set. This process was repeated systematically with different numbers of nodes and hidden layers. It was found that for limited NN architectures, the NN did a poor job of predicting structures outside of its training set. This was improved by adding hidden layers and nodes, as well as by increasing the size of the training set.
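A hypothetical sketch of the train/test protocol described above: cluster structures by a geometric fingerprint, hold whole clusters out of training, and judge generalization by R² between predicted and reference energies. The fingerprints, energies, and predictions below are random placeholders rather than DFTB+/AMP data.

# Illustrative cluster-based hold-out evaluation (placeholder data, not the paper's).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
fingerprints = rng.normal(size=(800, 30))      # one geometric descriptor per structure
energies = rng.normal(size=800)                # reference (e.g., DFTB+-like) energies

# Group structures with similar geometric motifs.
labels = AgglomerativeClustering(n_clusters=10).fit_predict(fingerprints)

# Hold out a few whole clusters as the "unseen motifs" test set.
test_mask = np.isin(labels, [0, 1])
reference_E = energies[test_mask]

# Placeholder for a trained NN's predictions on the held-out structures.
predicted_E = reference_E + rng.normal(scale=0.1, size=reference_E.size)

# Higher R^2 means the model generalizes better to motifs it was not trained on.
print(f"R^2 on held-out clusters: {r2_score(reference_E, predicted_E):.3f}")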
Volume 12, Issue 1 (January 2021), pp. 39–48
https://doi.org/10.22369/issn.2153-4136/12/1/6Making materials out of buckminsterfullerene is challenging, because it requires first dispersing the molecules in a solvent, and then getting the molecules to assemble in the desired arrangements. In this computational work, we focus on the dispersion challenge: How can we conveniently solubilize buckminsterfullerene? Water is a desirable solvent because of its ubiquity and biocompatibility, but its polarity makes the dispersion of nonpolar fullerenes challenging. We perform molecular dynamics simulations of fullerenes in the presence of fullerene oxides in implicit water to elucidate the role of interactions (van der Waals and Coulombic) on the self-assembly and structure of these aqueous mixtures. Seven coarse-grained fullerene models are characterized over a range of temperatures and interaction strengths using HOOMD-Blue on high performance computing clusters. We find that dispersions of fullerenes stabilized by fullerene oxides are observable in models where the net attraction among fullerenes is about 1.5 times larger than the attractions between oxide molecules. We demonstrate that simplified models are sufficient for qualitatively modeling micellization of these fullerenes and provide an efficient starting point for investigating how structural details and phase behavior depend upon the inclusion of more detailed physics.
Volume 12, Issue 1 (January 2021), pp. 49–58
https://doi.org/10.22369/issn.2153-4136/12/1/7Understanding turbulence and mixing due to hydrodynamic instabilities plays an important role in a wide range of science and engineering applications. Numerical simulations of three-dimensional turbulent mixing help us to predict the dynamics of two fluids of different densities, one lying over the other. The focus of this work is to optimize and improve the computational performance of numerical simulations of compressible turbulent mixing on Blue Waters, the petascale supercomputer at the National Center for Supercomputing Applications. In this paper, we study the effect of the programming model on time to solution. The hybrid programming model, which combines parallel programming models, has become a dominant approach. The preferred hybrid models are those that involve the Message Passing Interface (MPI), such as MPI + Pthreads, MPI + OpenMP, MPI + MPI-3 shared memory programming, and others with accelerator support. Among these choices, we choose the hybrid programming model based on MPI + OpenMP. We extend the purely MPI-parallelized code with OpenMP parallelism and develop a hybrid version of the code. This new hybrid implementation is set up so that multiple MPI processes handle the interface propagation, whereas multiple OpenMP threads handle the high-order weighted essentially non-oscillatory numerical scheme.
Volume 11, Issue 2 (April 2020), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 11, Issue 2 (April 2020), pp. 2–6
https://doi.org/10.22369/issn.2153-4136/11/2/1The central dogma is a key foundational concept in biochemistry. The idea that DNA mutations cause change at the protein level can be abstract for students. To provide a real-world example of the effect of mutation on protein function, a molecular visualization module was developed and incorporated into two biochemistry courses. This inquiry-based activity explored the molecular basis and cultural relevance of sickle cell anemia. Hemoglobin structural changes from the disease were examined. Participants used free tools including NCBI, RCSB PDB, LALIGN, and the Swiss-PdbViewer (DeepView) protein visualization software from ExPASy. This module was an active, engaging exercise which exposed students to protein visualization and increased cultural awareness.
Volume 11, Issue 2 (April 2020), pp. 7–11
https://doi.org/10.22369/issn.2153-4136/11/2/2A number of efforts have been made to introduce computational science in the undergraduate curriculum. We describe a survey of the undergraduate computational science programs in the U.S. The programs face several challenges including student recruitment and limited faculty participation in the programs. We describe the current state of the programs, discuss the problems they face, and discuss potential short- and long-range strategies that might address those challenges.
Volume 11, Issue 2 (April 2020), pp. 12–22
https://doi.org/10.22369/issn.2153-4136/11/2/3Medical micropumps that utilize Magnetic Shape Memory (MSM) alloys are small, powerful alternatives to conventional pumps because of their unique pumping mechanism. This mechanism—the transfer of fluid through the emulation of peristaltic contractions—is enabled by the magneto-mechanical properties of a shape memory alloy and a sealant material. Because the adhesion between the sealant and the alloy determines the performance of the pump and because the nature of this interface is not well characterized, an understanding of sealant-alloy interactions represents a fundamental component of engineering better solid state micropumps in particular, and metal-polymer interfaces in general. In this work we develop computational modeling techniques for investigating how the properties of sealant materials determine their adhesive properties with alloys. Specifically, we develop a molecular model of the sealant material polydimethylsiloxane (PDMS) and characterize its behavior with a model Ni-Mn-Ga surface. We perform equilibrium molecular dynamics simulations of the PDMS/Ni-Mn-Ga interface to iteratively improve the reliability, numerical stability, and accuracy of our models and the associated data workflow. To this end, we develop the first model for simulating PDMS/Ni-Mn-Ga interfaces by combining the Optimized Potentials for Liquid Simulations (OPLS) [21] force field with the Universal Force Field [5], and show promise for informing the design of more reliable MSM micropumps. We also reflect on the experiences of Blue Waters Supercomputing intern Guevara (the first author) to identify key learning moments during the one-year internship that can help guide future molecular simulation training efforts.
Volume 11, Issue 2 (April 2020), pp. 23–28
https://doi.org/10.22369/issn.2153-4136/11/2/4Severe weather outbreaks bring many different hazards; among the most widely recognized and identifiable are outbreaks that involve tornadoes. There has been some prior research on these events with respect to lead time, but shifts in model uncertainty by lead time have yet to be quantified formally. As such, in this study we assess tornado outbreak model uncertainty by lead time by assessing ensemble model precision for outbreak forecasts. This assessment was completed by first identifying five major tornado outbreak events and simulating the events using the Weather Research and Forecasting (WRF) model at 24, 48, 72, 96, and 120 hours of lead time. A 10-member stochastically perturbed initial condition ensemble was generated for each lead time to quantify uncertainty associated with initialization errors at the varied lead times. Severe weather diagnostic variables derived from ensemble output were used to quantify ensemble uncertainty by lead time. After comparing moment statistics of several convective indices, the Energy Helicity Index (EHI), Significant Tornado Parameter (STP), and Supercell Composite Parameter (SCP) did the best job of characterizing the tornadic outbreaks at all lead times. There was good consistency between the cases utilizing these three indices at all five lead times, suggesting that confidence in outbreak model forecasts may extend up to 5 days for major outbreak events. These results will be useful to operational forecasters in assessing the predictability of tornadic events.
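For readers unfamiliar with the composite indices named above, the Python sketch below evaluates two of them using common textbook formulations; exact definitions vary slightly between studies, and the input values are illustrative rather than taken from the simulated outbreaks.

# Worked example (not from the paper) of two composite severe-weather indices.

def energy_helicity_index(cape, srh):
    """EHI ~ (CAPE [J/kg] * storm-relative helicity [m^2/s^2]) / 160000 (common form)."""
    return cape * srh / 160000.0

def supercell_composite(mucape, srh, bulk_shear):
    """SCP ~ (MUCAPE/1000) * (SRH/50) * (bulk wind difference/20), one common form."""
    return (mucape / 1000.0) * (srh / 50.0) * (bulk_shear / 20.0)

cape, srh, shear = 2500.0, 300.0, 25.0   # illustrative environment values
print(f"EHI = {energy_helicity_index(cape, srh):.2f}")
print(f"SCP = {supercell_composite(cape, srh, shear):.2f}")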
Volume 11, Issue 2 (April 2020), pp. 29–35
https://doi.org/10.22369/issn.2153-4136/11/2/5The evolutionary algorithm (EA) in the Atomic Simulation Environment (ASE) provides a means to find the lowest-energy conformation of a molecule with a given stoichiometry. In this study we examine the ways in which the initial population of molecules affects the success of the EA. We have added a set of rules to the way the molecules are created that uses chemical intuition to produce more chemically relevant structures. We have also implemented a clustering program that selects molecules that differ from each other from a large pool of molecules to form the initial population. Testing EA runs with and without clustering and with and without intuitive population creation gave the following success rates: 28±3% with no intuition and no clustering, 31±4% with clustering but no intuition, 49±5% with fixed intuition but no clustering, 49±4% with fixed intuition and clustering, 47±4% with variable intuition but no clustering, and 50±3% with variable intuition and clustering. A significant increase in success rate was found when implementing intuitive population creation, while clustering the initial population seems to help marginally as the population becomes more diverse.
Volume 11, Issue 1 (January 2020), pp. 1–2
A brief introduction to this issue of the Journal of Computational Science Education from the guest editor.
Volume 11, Issue 1 (January 2020), pp. 3–7
https://doi.org/10.22369/issn.2153-4136/11/1/1From 2013 to 2018 the University of Virginia operated a summer school and internship program in partnership with NASA. The goal was to improve the software skills of students in environmental and earth sciences and to introduce them to high-performance computing. In this paper, we describe the program and discuss its evolution in response to student needs and changes in the high-performance computing landscape. The future direction for the summer school and plans for the materials developed are also discussed.
Volume 11, Issue 1 (January 2020), pp. 8–11
https://doi.org/10.22369/issn.2153-4136/11/1/2Cyberinfrastructure is as important for research in the 21st century as test tubes and microscopes were in the 20th century. Familiarity with and effective use of cyberinfrastructure at small and mid-sized institutions is essential if their faculty and students are to remain competitive. The Northeast Cyberteam Program is a 3-year NSF-funded regional initiative to increase effective use of cyberinfrastructure by researchers and educators at small and mid-sized institutions in northern New England by making it easier to obtain support from Research Computing Facilitators. Research Computing Facilitators combine technical knowledge and strong interpersonal skills with a service mindset, and use their connections with cyberinfrastructure providers to ensure that researchers and educators have access to the best available resources. It is widely recognized that Research Computing Facilitators are critical to successful utilization of cyberinfrastructure, but in very short supply. The Northeast Cyberteam aims to build a pool of Research Computing Facilitators in the region and a process to share them across institutional boundaries. Concurrently, we are providing experiential learning opportunities for students interested in becoming Research Computing Facilitators, and developing a self-service learning toolkit to provide timely access to information when it is needed.
Volume 11, Issue 1 (January 2020), pp. 12–20
https://doi.org/10.22369/issn.2153-4136/11/1/3Summer computing camps for high school students are rapidly becoming a staple at High Performance Computing (HPC) centers and Computer Science departments around the country. Developing complexity in education in these camps remains a challenge. Here, we present a report on the implementation of such a program. The Summer Computing Academy (SCA) is a week-long cybertraining program offered to high school students by High Performance Research Computing (HPRC) at Texas A&M University (Texas A&M; TAMU). The Summer Computing Academy effectively uses cloud computing paradigms and artificial intelligence technologies coupled with Raspberry Pi single-board computers and sensors to demonstrate "computational thinking". The program is steeped in well-reviewed pedagogy; the refinement of the educational methods based on constant assessment is a critical factor that has contributed to its success. The hands-on exercises included in the program have received rave reviews from parents and students alike. The camp program is financially self-sufficient and has successfully broadened participation of underrepresented groups in computing by including diverse groups of students. Modules from the SCA program may be implemented at other institutions with relative ease and promote cybertraining efforts nationwide.
Volume 11, Issue 1 (January 2020), pp. 21–25
https://doi.org/10.22369/issn.2153-4136/11/1/4Adoption of HPC as a research tool and industrial resource is a priority in many countries. The use of data analytics and machine learning approaches in many areas also attracts non-traditional HPC user communities to the hardware capabilities provided by supercomputing facilities. As a result, HPC at all scales is experiencing rapid growth of the demand for training, with much of this at the introductory level. To address the growth in demand, we need both a scalable and sustainable training model as well as a method to ensure the consistency of the training being offered. Adopting the successful training model of The Carpentries (https://carpentries.org/) for the HPC space provides a pathway to collaboratively created training content which can be delivered in a scalable way (serving everything from university or industrial HPC systems to national facilities). We describe the ongoing efforts of HPC Carpentry to create training material to address this need and form the collaborative network required to sustain it. We outline the history of the effort and the practices adopted from The Carpentries that enable it. The lessons being created as a result are under active development and being evaluated in practice at sites in Europe, the US and Canada.
Volume 11, Issue 1 (January 2020), pp. 26–28
https://doi.org/10.22369/issn.2153-4136/11/1/5There are numerous reports documenting the critical need for high performance computing infrastructure to advance discovery in all fields of study. The Blue Waters project was funded by the National Science Foundation to address this need and provide leading edge petascale computing resources to advance research and scholarship. There are also numerous reports that identify the lack of an adequate workforce capable of utilizing and advancing petascale class computing infrastructure well into the future. From the outset, the Blue Waters project has responded to this critical need by conducting national scale workforce development activities to prepare a larger and more diverse workforce. This paper describes those activities as exemplars for adoption and replication by the community.
Volume 11, Issue 1 (January 2020), pp. 29–35
https://doi.org/10.22369/issn.2153-4136/11/1/6The ever-changing nature of HPC has always compelled the HPC community to focus considerable effort on training new and existing practitioners. Historically, these efforts were tailored around a typical group of users possessing, due to their background, a certain set of programming skills. However, as HPC has become more diverse in terms of hardware, software, and user background, the traditional training approaches have become insufficient to address the training needs of our community. This increasingly complicated HPC landscape makes the development and delivery of new training materials challenging. How should we develop training for users, often coming from disciplines with no HPC tradition, who are only interested in learning a particular set of skills? How can we satisfy their training needs if we do not really understand what those needs are? It is clear that HPC centres struggle to identify and overcome the gaps in users' knowledge, while users struggle to identify the skills required to perform their tasks. With the HPC Certification Forum, we aim to clearly categorise, define, and examine the competencies expected from proficient HPC practitioners. In this article, we report the status and progress this independent body has made during the first year of its existence. The drafted processes and prototypes are expected to mature into a holistic ecosystem beneficial for all stakeholders in HPC education.
Volume 11, Issue 1 (January 2020), pp. 36–44
https://doi.org/10.22369/issn.2153-4136/11/1/7This paper describes a hands-on, project-based Research Experiences for Computational Science, Engineering, and Mathematics (RECSEM) program in high-performance data sciences, data analytics, and machine learning on emerging computer architectures. RECSEM is a Research Experiences for Undergraduates (REU) site program supported by the USA National Science Foundation. This site program at the University of Tennessee (UTK) directs a group of ten undergraduate students to explore, as well as contribute to, emergent interdisciplinary computational science models and state-of-the-art HPC techniques via a number of cohesive compute- and data-intensive applications in which numerical linear algebra is the fundamental building block. The RECSEM program complements the growing importance of computational sciences in many advanced degree programs and provides scientific understanding and discovery to undergraduates with an intellectual focus on research projects using HPC. It aims to deliver a real-world research experience to the students by partnering with teams of scientists who are at the forefront of scientific computing research at the Innovative Computing Laboratory (ICL) and the Joint Institute for Computational Sciences (JICS) at UTK and Oak Ridge National Laboratory (ORNL). The program also receives collaborative support from universities in Hong Kong and Changsha, China. The program focuses on scientific domains in engineering applications, image processing, machine learning, and numerical parallel solvers on supercomputers and emergent accelerator platforms, particularly their implementation on GPUs, and enjoys close affiliations with researchers at ORNL. Because of the diversity of research topics and participant backgrounds in this project, in this paper we discuss the experiences and resolutions in managing and coordinating the program, delivering cohesive tutorial materials, directing mentorship of individual projects, lessons learned, and improvements made over the course of the program, particularly from the perspective of the mentors.
Volume 11, Issue 1 (January 2020), pp. 45–52
https://doi.org/10.22369/issn.2153-4136/11/1/8High-performance computing (HPC) and parallel and distributed computing (PDC) are widely discussed topics in computer science (CS) and computer engineering (CE) education. In the past decade, high-performance computing has also contributed significantly to addressing complex problems in bio-engineering, healthcare, and systems biology. Therefore, computational biology applications provide several compelling examples that can be potent pedagogical tools in teaching high-performance computing. In this paper, we introduce a novel course curriculum to teach high-performance, parallel, and distributed computing to senior graduate (PhD) students in a hands-on setup, through examples drawn from a wealth of areas in computational biology. We introduce the concepts of parallel programming, algorithms, architectures, and implementations via carefully chosen examples from computational biology. We believe that this course curriculum will provide students an engaging and refreshing introduction to this well-established domain.
Volume 11, Issue 1 (January 2020), pp. 53–60
https://doi.org/10.22369/issn.2153-4136/11/1/9OpenMP is one of the most popular programming models for exploiting node-level parallelism on supercomputers. Many researchers are interested in developing OpenMP compilers or extending the existing standard with new capabilities. However, there is a lack of training resources for researchers involved in compiler and language development around OpenMP, making the learning curve in this area steep. In this paper, we introduce an ongoing effort, FreeCompilerCamp.org, a free and open online learning platform aimed at training researchers to quickly develop OpenMP compilers. The platform is built on top of Play-With-Docker, a Docker playground for users to conduct experiments in an online terminal sandbox. It provides a live training website, hosted in the cloud, so that anyone with internet access and a web browser can take the training. It also enables developers with relevant skills to contribute new tutorials. The entire training system is open source and can be deployed on a private server, workstation, or even a laptop for personal use. We have created some initial tutorials to train users in extending the Clang/LLVM and ROSE compilers to support new OpenMP features. We welcome anyone to try out our system, give us feedback, contribute new training courses, or enhance the training platform to make it an effective learning resource for the HPC community.
Volume 11, Issue 1 (January 2020), pp. 61–67
https://doi.org/10.22369/issn.2153-4136/11/1/10In a software lab, groups of students develop parallel code using modern tools, document the results, and present their solutions. The learning objectives include the foundations of High-Performance Computing (HPC), such as an understanding of modern architectures and the development of parallel programming skills, as well as course-specific topics like accelerator programming or cluster setup. In order to execute the labs successfully with limited personnel resources and still provide students with access to world-class HPC architectures, we developed a set of concepts to motivate students and to track their progress. These include the learning status survey and the developer diary, which are presented in this work. We also report on our experiences with using innovative teaching concepts, such as competition among the groups, to incentivize students to optimize their codes. Our concepts enable us to track the effectiveness of our labs and to steer them for increasingly large and diverse groups of students. We conclude that software labs are effective in adding practical experience to HPC education. Our approach of handing out open tasks and leaving creative freedom in implementing the solutions enables the students to self-pace their learning process and to vary their investment of effort during the semester. Our effort and progress tracking ensures that the extensive learning objectives are achieved and enables our research on HPC programming productivity.
Volume 11, Issue 1 (January 2020), pp. 68–72
https://doi.org/10.22369/issn.2153-4136/11/1/11The Computational Mathematics, Science and Engineering (CMSE) department is one of the newest units at Michigan State University (MSU). Founded in 2015, CMSE recognizes computation as the "triple junction" of algorithm development and analysis, high performance computing, and applications to scientific and engineering modeling and data science (as illustrated in Figure 1). This approach is designed to engage with computation as a new integrated discipline, rather than a series of decentralized, isolated sub-specialties. In the four years since its inception, the department has grown and flourished; however, the pathway was sometimes arduous. This paper shares lessons learned during the department's development and the initiatives it has taken on to support computational research and education across the university. By sharing these lessons, we hope to encourage and support the establishment of similar departments at other universities and grow this integrated approach to scientific computation as a discipline.
Volume 11, Issue 1 (January 2020), pp. 73–80
https://doi.org/10.22369/issn.2153-4136/11/1/12For the past thirteen years, Los Alamos National Laboratory HPC Division has hosted the Computer System, Cluster and Networking Summer Institute summer internship program (recently renamed "The Supercomputer Institute") to provide a basis in cluster computing for undergraduate and graduate students. The institute invites 12 students each year to participate in a 10-week internship program. This program has been a strong educational experience for many students over this time and has been an important recruitment tool for HPC Division. In this paper, we describe the institute as a whole and dive into individual components that were changed this year to keep the program up to date. We also provide qualitative and quantitative results indicating that these changes have improved the program over recent years.
Volume 11, Issue 1 (January 2020), pp. 81–87
https://doi.org/10.22369/issn.2153-4136/11/1/13While strides have been made to improve science and math readiness at a college-preparatory level, some key fundamentals have been left unaddressed that can cause students to turn away from the STEM disciplines before they find their niche [10], [11], [12], [13]. Introducing collegiate-level research and project-based, group-centered learning at the high school level has a multi-faceted effect; in addition to elevated learning outcomes in science and math, students exhibit improved critical thinking and communication skills, leading to improved preparedness for subsequent academic endeavors [1]. The work presented here outlines the development of a STEM ecosystem in which both the science department and the math department have implemented an interdisciplinary approach to introduce a spectrum of laboratory and computing research skills. This takes the form of both "in situ" micro-curricular elements and stand-alone research and computer science classes that integrate the language-independent concepts of abstraction and object-oriented programming, distributed and high-performance computing, and high- and low-level language control applications. This pipeline has been an effective tool that has allowed several driven and interested students to participate in collegiate-level and joint-collegiate projects involving virtual reality, robotics and systems controls, and modeling. The willingness of the departments to cross-pollinate, hire faculty well-versed in research, and support students and faculty with the proper resources are critical factors in readying the next generation of computing leaders.
Volume 11, Issue 1 (January 2020), pp. 88–92
https://doi.org/10.22369/issn.2153-4136/11/1/14HPC and Scientific Computing are integral tools for sustaining the growth of scientific research. Additionally, educating future domain scientists and research-focused IT staff about the use of computation to support research is as important as capital expenditures on new resources. The aim of this paper is to describe the parallel computing portion of Purdue University's HPC seminar series, which is used as a tool to introduce students from many non-traditional disciplines to scientific, parallel, and high-performance computing.
Volume 11, Issue 1 (January 2020), pp. 93–99
https://doi.org/10.22369/issn.2153-4136/11/1/15Developments in large-scale computing environments have led to the design of workflows that rely on containers and analytics platforms that are well supported by the commercial cloud. The National Science Foundation also envisions a future in science and engineering that includes commercial cloud service providers (CSPs) such as Amazon Web Services, Azure, and Google Cloud. These twin forces have made researchers consider the commercial cloud as an alternative to current high performance computing (HPC) environments. Training and knowledge on how to migrate workflows, cost control, data management, and system administration remain some of the commonly listed concerns about the adoption of cloud computing. In an effort to ameliorate this situation, CSPs have developed online and in-person training platforms to help address this problem. Scalability, ability to impart knowledge, evaluating knowledge gain, and accreditation are the core concepts that have driven this approach. Here, we present a review of our experience using Google's Qwiklabs online platform for remote and in-person training from the perspective of an HPC user. For this study, we completed over 50 online courses, earned five badges, and attended a one-day session. We identify the strengths of the approach, identify avenues to refine them, and consider means to further community engagement. We further evaluate the readiness of these resources for a cloud-curious researcher who is familiar with HPC. Finally, we present recommendations on how the large-scale computing community can leverage these opportunities to work with CSPs to assist researchers nationally and at their home institutions.
Volume 11, Issue 1 (January 2020), pp. 100–105
https://doi.org/10.22369/issn.2153-4136/11/1/16The ability to grow and teach systems professionals relies on having the capacity to let students interact with supercomputers at levels not given to normal users. In this paper we describe the teaching methods and hardware platforms used by Purdue Research Computing to train undergraduates for HPC systems-facing roles. From Raspberry Pi clusters to the LittleFe project, previous work has focused on providing miniature hardware platforms and developing curricula for teaching. Recently, we have developed and employed a method using virtual machines to reach wider audiences, created best practices, and removed barriers to approaching the coursework. This paper outlines the system we have designed, expands on its benefits and drawbacks relative to hardware systems, and discusses the failures and successes we have had teaching HPC system administrators.
Volume 11, Issue 1 (January 2020), pp. 106–107
https://doi.org/10.22369/issn.2153-4136/11/1/17The International HPC Certification Program was officially launched over a year ago at ISC'18 and has since made significant progress in categorising and defining the skills required to proficiently use a variety of HPC systems. The program has reached the stage at which support and input from the HPC community are essential. For the certification to be recognised widely, it needs to capture the skills required by the majority of HPC users, regardless of their level. This cannot be achieved without contributions from the community. This extended abstract briefly presents the current state of the developed Skill Tree and explains how contributors can extend it. In the talk, we focus on the contribution aspects.
Volume 10, Issue 1 (January 2019), pp. 1–3
A brief introduction to this issue of the Journal of Computational Science Education from the guest editor.
Volume 10, Issue 1 (January 2019), pp. 4–11
https://doi.org/10.22369/issn.2153-4136/10/1/1In this paper we describe our experience in developing curricular courses aimed at graduate students in emerging computational fields, including biology and medical science. We focus primarily on computational data analysis and statistical analysis, while at the same time teaching students best practices in coding and software development. Our approach combines theoretical background with practical applications of concepts. The outcomes and feedback we have obtained so far have revealed several issues: students in these particular areas lack instruction of this kind even though they would benefit tremendously from it, and we have detected several weaknesses in students' preparation, in particular in their statistical foundations but also in their analytical thinking skills. We present here the tools, techniques, and methodology we employ while teaching and developing these types of courses. We also show several outcomes from this initiative, including potential pathways for fruitful multidisciplinary collaborations.
Volume 10, Issue 1 (January 2019), pp. 12–15
https://doi.org/10.22369/issn.2153-4136/10/1/2Advanced computational inorganic methods were introduced as course-based undergraduate research experiences (CUREs) through use of the National Science Foundation's Extreme Science and Engineering Discovery Environment (NSF XSEDE). The ORCA ab initio quantum chemistry program allowed students to conduct independent research projects following in-class lectures and tutorials. Students wrote publication-style papers and conducted peer review of classmates' papers to learn about the full scientific process.
Volume 10, Issue 1 (January 2019), pp. 16–20
https://doi.org/10.22369/issn.2153-4136/10/1/3XSEDE Service Providers (SPs) and resources have the benefit of years of testing and implementation, tuning and configuration, and the development of specific tools to help users and systems make the best use of these resources. Cyberinfrastructure professionals at the campus level are often charged with building computing resources that are compared to these national-level resources. While organizations and companies exist that guide cyberinfrastructure configuration choices down certain paths, there is no easy way to distribute the long-term knowledge of the XSEDE project to campus CI professionals. The XSEDE Cyberinfrastructure Resource Integration team has created a variety of toolkits to enable easy knowledge and best-practice transfer from XSEDE SPs to campus CI professionals. The XSEDE National Integration Toolkit (XNIT) provides the software used on most XSEDE systems in an effort to propagate the best practices and knowledge of XSEDE resources. XNIT includes basic tools and configuration that make it simpler for a campus cluster to have the same software set and many of the advantages an XSEDE SP resource affords. In this paper, we detail the steps taken to build such a library of software and discuss the challenges involved in disseminating awareness of the toolkits among cyberinfrastructure professionals. We also describe our experiences in updating XNIT to be compatible with the OpenHPC project, which forms the basis of many new HPC systems and appears positioned to become the de facto choice of management software for many HPC centers.
Volume 10, Issue 1 (January 2019), pp. 21–23
https://doi.org/10.22369/issn.2153-4136/10/1/4Students in a course on high performance computing were assigned the task of parallelizing an algorithm for recursive matrix multiplication. The objectives of the assignment were to: (1) design a basic approach for incorporating parallel programming into a recursive algorithm, and (2) optimize the speedup. Pseudocode was provided for recursive matrix multiplication, and students were required to first implement a serial version before implementing a parallel version. The parallel version had the following requirements: (1) use OpenMP to perform multithreading, and (2) use exactly 4 threads, where each thread computes one quadrant of the array product. In a class of 23 undergraduate and graduate students, approximately 70% of the students designed valid parallel solutions, and 13% achieved the optimal speedup of 4x. Common errors included recursively creating excessive threads, failing to parallelize all possible mathematical operations, and poor use of OpenMP compiler directives.
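To make the target strategy concrete, the sketch below shows one way the four-thread quadrant decomposition could be expressed with OpenMP. It is an illustrative, flat (non-recursive) version written for this overview, not the assignment's reference solution; the matrix size and variable names are assumptions.

    /* Hypothetical sketch of the assignment's parallel strategy: four OpenMP
     * threads, each computing one quadrant of the product C = A * B.
     * Matrix size, names, and layout are illustrative, not from the paper. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N 512                      /* matrix dimension, assumed even */

    static double A[N][N], B[N][N], C[N][N];

    /* Compute the quadrant of C whose top-left corner is (row0, col0). */
    static void multiply_quadrant(int row0, int col0)
    {
        for (int i = row0; i < row0 + N / 2; i++)
            for (int j = col0; j < col0 + N / 2; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = (double)rand() / RAND_MAX;
                B[i][j] = (double)rand() / RAND_MAX;
            }

        double start = omp_get_wtime();

        /* Exactly four threads; thread t owns one quadrant of C. */
        #pragma omp parallel num_threads(4)
        {
            int t = omp_get_thread_num();
            int row0 = (t / 2) * (N / 2);
            int col0 = (t % 2) * (N / 2);
            multiply_quadrant(row0, col0);
        }

        printf("C[0][0] = %f, elapsed = %f s\n", C[0][0], omp_get_wtime() - start);
        return 0;
    }

Because each quadrant is an independent block of work of equal size, this decomposition avoids the excessive thread creation the abstract lists as a common student error.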
Volume 10, Issue 1 (January 2019), pp. 24–31
https://doi.org/10.22369/issn.2153-4136/10/1/5We present an overview of current academic curricula for Scientific Computing, High-Performance Computing and Data Science. After a survey of current academic and non-academic programs across the globe, we focus on Canadian programs and specifically on the education program of the SciNet HPC Consortium, using its detailed enrollment and course statistics for the past six to seven years. Not only do these data display a steady and rapid increase in the demand for research-computing instruction, they also show a clear shift from traditional (high performance) computing to data-oriented methods. It is argued that this growing demand warrants specialized research computing degrees.
Volume 10, Issue 1 (January 2019), pp. 32–39
https://doi.org/10.22369/issn.2153-4136/10/1/6The external evaluation activities in the first three years of the Blue Waters Community Engagement program for graduate fellows and undergraduate interns are described in this study. Evaluators conducted formative and summative evaluations to acquire data from the participants at various stages during this period. Details regarding the evaluation methodology, implementation, results, information feedback process, and the overall program impact based on these evaluation findings are outlined here. Participants in both groups were selected from a variety of different scientific backgrounds and their high performance computing expertise also varied at the outset of the program. Implementation challenges stemming from these issues were identified through the evaluation, and accommodations were made in the initial phases of the program. As a result, both the graduate fellowship and undergraduate internship programs were able to successfully overcome many of the identified problems by the end of the third year. The evaluation results also show the significant impact the program was able to make on the future careers of the participants.
Volume 10, Issue 1 (January 2019), pp. 40–47
https://doi.org/10.22369/issn.2153-4136/10/1/7Short courses offered by High Performance Computing (HPC) centers provide an avenue for aspiring Cyberinfrastructure (CI) professionals to learn much-needed skills in research computing. Such courses are a staple at universities and HPC sites around the country. These short courses follow an informal curricular model of short, intensive, and applied micro-courses that address generalizable competencies in computing as opposed to content expertise. The level of knowledge sophistication taught is below that of a minor, and the burden of application to domain content is on the learner. Since the Spring 2017 semester, Texas A&M University High Performance Research Computing (TAMU HPRC) has introduced a series of interventions in its short courses program that has led to a 300% growth in participation. Here, we present the strategies and best practices employed by TAMU HPRC in teaching short course modules. We present a longitudinal report that assesses the success of these strategies since the Spring semester of 2017. These data suggest that changes to student learning and a reimagining of the tiered instruction model widely adopted at institutions could be beneficial to student outcomes.
Volume 10, Issue 1 (January 2019), pp. 48–52
https://doi.org/10.22369/issn.2153-4136/10/1/8The Pawsey Supercomputing Centre has been running a variety of education, training and outreach activities aimed at all Australian researchers for a number of years. Based on experience and user feedback, we have developed a mix of on-site and online training, roadshows, user forums and hackathon-type events. We have also developed an open repository of materials covering different aspects of HPC systems usage, parallel programming techniques, and cloud and data resources usage. In this paper, we share our experience in using different learning methods and tools to address specific educational and training purposes. The overall goal is to emphasise that there is no universal learning solution; instead, various solutions and platforms need to be carefully selected for different groups of interest.
Volume 10, Issue 1 (January 2019), pp. 53–60
https://doi.org/10.22369/issn.2153-4136/10/1/9We analyze the changes in the training and educational efforts of the SciNet HPC Consortium, a Canadian academic High Performance Computing center, in the areas of Scientific Computing and High-Performance Computing, over the last six years. Initially, SciNet offered isolated training events on how to use HPC systems and write parallel code, but the training program now consists of a broad range of workshops and courses that users can take toward certificates in scientific computing, data science, or high-performance computing. Using data on enrollment, attendance, and certificate numbers from SciNet's education website, used by almost 1800 users so far, we extract trends on the growth, demand, and breadth of SciNet's training program. Among the results are a steady overall growth, a sharp and steady increase in the demand for data science training, and a wider participation of 'non-traditional' computing disciplines, which has motivated an increasingly broad spectrum of training offerings. Of interest is also that many of the training initiatives have evolved into courses that can be taken as part of the graduate curriculum at the University of Toronto.
Volume 10, Issue 1 (January 2019), pp. 61–66
https://doi.org/10.22369/issn.2153-4136/10/1/10There is a growing need to provide intermediate programming classes to STEM students early in their undergraduate careers. These efforts face significant challenges due to the varied computing skill-sets of learners, the requirements of degree programs, and the absence of a common programming standard. Instructional scaffolding and active learning methods that use Python offer avenues to support students with varied learning needs. Here, we report on quantitative and qualitative outcomes from three distinct models of programming education: (i) connecting coding to hands-on "maker" activities; (ii) incremental learning of computational thinking elements through guided exercises that use Jupyter Notebooks; and (iii) problem-based learning with step-wise code fragments leading to algorithmic implementation. Performance in class activities, capstone projects, in-person interviews, and participant surveys informed us about the effectiveness of these approaches on student learning. We find that students with previous coding experience were able to rely on broader skills and grasp concepts faster than students who had recently attended an introductory programming session. We find that, while makerspace activities were engaging and explained basic programming concepts, they lost their appeal in complex programming scenarios. Students grasped coding concepts fastest using the Jupyter Notebooks, while the problem-based learning approach was best at having students understand the core problem and create inventive means to address it.
Volume 10, Issue 1 (January 2019), pp. 67–73
https://doi.org/10.22369/issn.2153-4136/10/1/11This work explores the applicability of Massive Open Online Courses (MOOCs) for scaling High Performance Computing (HPC) training and education. Most HPC centers recognize the need to provide their users with HPC training; however, the current educational structure and accessibility prevent many scientists and engineers who need HPC knowledge and skills from becoming HPC practitioners. To provide more accessible and scalable learning paths toward HPC expertise, the authors explore MOOCs and their related technologies and teaching approaches. In this paper the authors outline how MOOC courses differ from face-to-face training, video-capturing of live events, webinars, and other established teaching methods with respect to pedagogical design, development issues, and deployment concerns. The work proceeds to explore two MOOC case studies, including their design decisions, pedagogy, and delivery. The MOOC development methods discussed are universal and easily replicated by educators and trainers in any field; however, HPC has specific technical needs and concerns not encountered in other online courses. Strategies for addressing these HPC concerns are discussed throughout the work.
Volume 10, Issue 1 (January 2019), pp. 74–80
https://doi.org/10.22369/issn.2153-4136/10/1/12High performance computing training and education typically emphasizes the first principles of scientific programming, such as numerical algorithms and parallel programming techniques. However, many computational scientists need to know how to compile and link to applications built by others. Likewise, those who create the libraries and applications need to understand how to organize their code to make it as portable as possible and package it so that it is straightforward for others to use. These topics are not addressed by the current HPC education and training curriculum, and users are typically left to develop their own approaches. This work discusses observations made by the author over the last 20 years regarding the common problems encountered in the scientific community when researchers develop their own codes and build codes written by other computational scientists. Recommendations are provided for a training curriculum to address these shortcomings.
Volume 10, Issue 1 (January 2019), pp. 81–87
https://doi.org/10.22369/issn.2153-4136/10/1/13The Cyberinfrastructure Security Education for Professionals and Students (CiSE-ProS) virtual reality environment is an exploratory project that uses engaging approaches to evaluate the impact of learning environments produced by augmented reality (AR) and virtual reality (VR) technologies for teaching cybersecurity concepts. The program is steeped in well-reviewed pedagogy; the refinement of the educational methods based on constant assessment is a critical factor that has contributed to its success. In its current implementation, the program supports undergraduate student education. The overarching goal is to develop the CiSE-ProS VR program for implementation at institutions with low cyberinfrastructure adoption where students may not have access to a physical data center to learn about the physical aspects of cybersecurity.
Volume 10, Issue 1 (January 2019), pp. 88–89
https://doi.org/10.22369/issn.2153-4136/10/1/14The HPC community has always considered the training of new and existing HPC practitioners to be of high importance to its growth. The growing diversification of HPC practitioners challenges traditional training approaches, which are not able to satisfy the specific needs of users who often come from non-traditional HPC disciplines and are only interested in learning a particular set of competences. The challenge for HPC centres is to identify and overcome the gaps in users' knowledge, while users struggle to identify relevant skills. We have developed a first version of an HPC certification program that clearly categorizes, defines, and examines competences. Making clear which skills are required of, or recommended for, a competent HPC user would benefit both HPC service providers and practitioners. Moreover, it would allow centres to bundle together skills that are most beneficial for specific user roles and scientific domains. From the perspective of content providers, existing training material can be mapped to competences, allowing users to quickly identify and learn the skills they require. Finally, certificates recognized by the whole HPC community would simplify the comparison of independently offered courses and provide an additional incentive for participation.
Volume 10, Issue 1 (January 2019), pp. 90–92
https://doi.org/10.22369/issn.2153-4136/10/1/15A course on high performance computing (HPC) at Case Western Reserve University included students with a range of technical and academic experience. We consider these experiential differences with regard to student performance and perceptions. The course relied heavily on C programming and multithreading, but one third of the students had no prior experience with these techniques. Academic experience also varied, as the class included 3rd and 4th year undergraduates, master's students, PhD students, and a non-degree student. Results indicate that student performance did not depend on technical experience. However, average overall performance was slightly higher for graduate students. Additionally, we report on students' perceptions of the course and the assigned work.
Volume 10, Issue 1 (January 2019), pp. 93–99
https://doi.org/10.22369/issn.2153-4136/10/1/16Over the past two decades, High-Performance Computing (HPC) communities have developed many models for delivering education aiming to help students understand and harness the power of parallel and distributed computing. Most of these courses either lack a hands-on component or heavily focus on theoretical characterization behind complex algorithms. To bridge the gap between application and scientific theory, NVIDIA Deep Learning Institute (DLI) (nvidia.com/dli) has designed an on-line education and training platform that helps students, developers, and engineers solve real-world problems in a wide range of domains using deep learning and accelerated computing. DLI's accelerated computing course content starts with the fundamentals of accelerating applications with CUDA and OpenACC in addition to other courses in training and deploying neural networks for deep learning. Advanced and domain-specific courses in deep learning are also available. The online platform enables students to use the latest AI frameworks, SDKs, and GPU-accelerated technologies on fully-configured GPU servers in the cloud so the focus is more on learning and less on environment setup. Students are offered project-based assessment and certification at the end of some courses. To support academics and university researchers teaching accelerated computing and deep learning, the DLI University Ambassador Program enables educators to teach free DLI courses to university students, faculty, and researchers.
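For readers unfamiliar with directive-based acceleration, the short C example below illustrates the kind of introductory exercise such fundamentals courses typically begin with. It is a generic OpenACC SAXPY sketch written for this overview, not material taken from the DLI curriculum.

    /* A minimal OpenACC example of an introductory accelerated-computing
     * exercise (illustrative only): offloading a SAXPY loop with one directive. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 20)

    int main(void)
    {
        float *x = malloc(N * sizeof *x);
        float *y = malloc(N * sizeof *y);
        const float a = 2.0f;

        for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        /* With an OpenACC compiler (e.g. compiled with -acc) this loop can run
         * on a GPU; otherwise the pragma is ignored and the loop runs serially. */
        #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];

        printf("y[0] = %f\n", y[0]);   /* expect 4.0 */
        free(x);
        free(y);
        return 0;
    }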
Volume 10, Issue 1 (January 2019), pp. 100–106
https://doi.org/10.22369/issn.2153-4136/10/1/17A significant challenge in teaching cluster computing, an advanced topic in the parallel and distributed computing body of knowledge, is to provide students with an adequate environment where they can become familiar with real-world infrastructures that embody the conceptual principles taught in lectures. In this paper, we describe our experience setting up such an environment by leveraging CloudLab, a national experimentation platform for advanced computing research. We explored two approaches in using CloudLab to teach advanced concepts in cluster computing: direct deployment of virtual machines (VMs) on bare-metal nodes and indirect deployment of VMs inside a CloudLab-based cloud.
Volume 10, Issue 1 (January 2019), pp. 107–107
https://doi.org/10.22369/issn.2153-4136/10/1/18Cloud computing is a growing area for educating students and performing meaningful scientific research. The challenge for many educators and researchers is knowing how to use some of the unique aspects of computing in the cloud. One key feature is true elastic computing: resources on demand. The elasticity and programmability of cloud resources make them an excellent tool for educators who require access to a wide range of computing environments. In the field of HPC education, such environments are an absolute necessity, and getting access to them can create a large burden on educators above and beyond designing content. While cloud resources won't replace traditional HPC environments for large research projects, they are an excellent option for providing both user and administrator education on HPC environments. The highly configurable nature of cloud environments allows educators to tailor the educational resource to the needs of their attendees and provide a wide range of hands-on experiences. In this demo, we show how the Jetstream cloud environment can be used to provide training for both new HPC administrators and users by walking through a ground-up build of a simple HPC system. While this approach uses the Jetstream cloud, it is generalizable to any cloud provider. We show how this allows an educator to tackle everything from basic command-line concepts and scheduler use to advanced cluster-management concepts such as elasticity and management of scientific software.
Volume 10, Issue 1 (January 2019), pp. 108–110
https://doi.org/10.22369/issn.2153-4136/10/1/19In this contribution, we discuss our experiences organizing the Best Practices for HPC Software Developers (HPC-BP) webinar series, an effort for the dissemination of software development methodologies, tools and experiences to improve developer productivity and software sustainability. HPC-BP is an outreach component of the IDEAS Productivity Project [4] and has been designed to support the IDEAS mission to work with scientific software development teams to enhance their productivity and the sustainability of their codes. The series, which was launched in 2016, has just presented its 22nd webinar. We summarize and distill our experiences with these webinars, including what we consider to be "best practices" in the execution of both individual webinars and a long-running series like HPC-BP. We also discuss future opportunities and challenges in continuing the series.
Volume 9, Issue 2 (December 2018), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 9, Issue 2 (December 2018), pp. 2–13
https://doi.org/10.22369/issn.2153-4136/9/2/1Students face many difficulties dealing with physics principles and concepts during physics problem solving. For example, they lack an understanding of the components of formulas, as well as of the physical relationships between the two sides of a formula. To overcome these difficulties, some educators have suggested integrating simulation design into physics learning. They claim that the programming process necessarily fosters understanding of the physics underlying the simulations. We investigated physics learning in a high-school course on computational science. The course focused on the development of computational models of physics phenomena and the programming of corresponding simulations. The study described in this paper deals with the development of students' conceptual physics knowledge throughout the course. Employing a qualitative approach, we used concept maps to evaluate students' conceptual physics knowledge at the beginning and the end of the model development process, and at different stages in between. We found that the students gained physics knowledge that has been reported to be difficult for high-school and even undergraduate students. We use two case studies to demonstrate our method of analysis and its outcomes, presenting a detailed analysis of two projects in which computational models and simulations of physics phenomena were developed.
Volume 9, Issue 2 (December 2018), pp. 14–22
https://doi.org/10.22369/issn.2153-4136/9/2/2Markov State Models (MSMs) are a powerful framework for reproducing the long-time conformational dynamics of biomolecules using a set of short Molecular Dynamics (MD) simulations. However, precise kinetics predictions from MSMs rely heavily on the features selected to describe the system. Despite the importance of feature selection for large systems, determining an optimal set of features remains a difficult, unsolved problem. Here, we introduce an automatic approach to optimize feature selection based on genetic algorithms (GA), which adaptively evolve the fittest solution according to the laws of natural selection. The power of the GA-based method is illustrated on long atomistic folding simulations of four proteins, varying in length from 28 to 80 residues. Due to the diversity of the tested proteins, we expect that our method will be extensible to other proteins and drive MSM building toward a more objective protocol.
Volume 9, Issue 2 (December 2018), pp. 23–29
https://doi.org/10.22369/issn.2153-4136/9/2/3Graph algorithms have many applications, and many real-world problems can be solved with them. Graph algorithms are commonly taught in data structures, algorithms, and discrete mathematics courses. We have created two animations to visually demonstrate graph algorithms. The first animation covers depth-first search, breadth-first search, shortest paths, connected components, finding bipartite sets, and Hamiltonian paths/cycles on unweighted graphs. The second animation covers minimum spanning trees, shortest paths, and the travelling salesman problem on weighted graphs. The animations are developed using HTML, CSS, and JavaScript and are platform independent. They can be viewed from a browser on any device. The animations are useful tools for teaching and learning graph algorithms. This paper presents these animations.
Volume 9, Issue 2 (December 2018), pp. 30–36
https://doi.org/10.22369/issn.2153-4136/9/2/4In this project we designed an Artificial Neural Network (ANN) computational model to predict the activity of short oligonucleotide sequences (octamers) with an important biological role as exonic splicing enhancer (ESE) motifs recognized by the human SR protein SC35. Since only active sequences were available from the literature as our initial data set, we generated an additional set of sequences complementary to the original set. We used a back-propagation neural network (BPNN) with the MATLAB® Neural Network Toolbox™ on our research-designated computer. In Stage I of the project we trained, validated, and tested the BPNN prototype. We started with 20 samples in the training set and 8 samples in the validation set. The trained and validated BPNN prototype was then used to test a unique set of 10 octamer sequences comprising 5 active samples and their 5 complementary sequences. The test showed 2 classification errors, one false positive and the other false negative. We used the test data and moved into Stage II of the project. First, we analyzed the initial DNA numerical representation (DNR) and changed the scheme to achieve a greater difference between the subsets of active and complementary sequences. We compared the BPNN results with different numbers of nodes in the second hidden layer to optimize model accuracy. To estimate future model performance, we needed to test the classifier on newly collected data from another paper. This practical application included the testing of 41 published, non-repeating SC35 ESE motif octamers, together with 41 complementary sequences. The test showed high BPNN accuracy and predictive power for both (active and inactive) categories. This study shows the potential for using a BPNN to screen SC35 ESE motif candidates.
Volume 9, Issue 2 (December 2018), pp. 37–45
https://doi.org/10.22369/issn.2153-4136/9/2/5With the recent advances in next-generation sequencing technology, analysis of prevalent DNA sequence variants from patients with a particular disease has become an important tool for understanding the associations between the disease and genetic mutations. A publicly accessible bioinformatics pipeline, called OncoMiner (http://oncominer.utep.edu), was implemented in 2016 to help biomedical researchers analyze large genomic datasets from patients with cancer. However, the current version of OncoMiner can only accept input files with a highly specific format for sequence variant description. In order to handle data from a broader range of sequencing platforms, a data preprocessing tool is necessary. We have therefore implemented the OncoMiner Preprocessing (OP) program for parsing data files in the popular FastQ and BAM formats to generate an OncoMiner input file. OP uses the open source Bowtie2 and SAMtools software, followed by a Python script we developed for genetic sequence variant identification. To preprocess very large datasets efficiently, the OP program has been parallelized on two local computers and the Blue Waters system at the National Center for Supercomputing Applications using a multiprocessing approach. Although reasonable parallelization efficiency was obtained on the local computers, the OP program's speedup on Blue Waters has been limited, possibly due to I/O issues and individual node memory constraints. Despite these limitations, Blue Waters has provided the resources needed to process 35 datasets from patients with acute myeloid leukemia and demonstrated significant correlation of OP runtimes with BAM input size and chromosome diversity.
Volume 9, Issue 1 (May 2018), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 9, Issue 1 (May 2018), pp. 2–12
https://doi.org/10.22369/issn.2153-4136/9/1/1The alias feature of the Berkeley Madonna platform allowed the author to create a chemical kinetics project manual in which students build flow charts with rate equations consistent with what they learn from physical chemistry textbooks. Used in this way, the platform becomes versatile and powerful, allowing students to explore chemical kinetics problems from simple (e.g., 1st- or 2nd-order kinetics) to complex (e.g., stratospheric ozone depletion, the Lotka-Volterra mechanism) while bypassing the complicated syntax required by most powerful mathematical programs. This kinetics manual was successfully implemented at UW-Green Bay in the fall semester of 2017 with a student success rate greater than 80%.
Volume 9, Issue 1 (May 2018), pp. 13–18
https://doi.org/10.22369/issn.2153-4136/9/1/2This paper describes introducing rate of change and systems modeling paradigms and software as tools to increase appreciation for computational science. A similar approach was used with three different audiences: freshman liberal arts majors, junior math education majors, and college faculty teaching introductory science courses. A description of the implementation used with each audience and their reactions to the material is discussed, along with some example problems that could be used in a variety of courses.
Volume 9, Issue 1 (May 2018), pp. 19–28
https://doi.org/10.22369/issn.2153-4136/9/1/3A final project assignment is described for an interdisciplinary applied numerical computing elective for upper-division and graduate students in which students develop a GUI for defining and solving a system of ordinary differential equations (initial value problems) and the associated explicit algebraic equations, such as values for parameters. The primary task is to develop a GUI for MATLAB using GUIDE that takes a user-specified number of differential equations and explicit algebraic equations as input, solves the system of ODEs using ode45, returns the solution vector, and plots the solution vector components vs. the independent variable. The code for the GUI must be verified by showing that it returns the same results and the same figures as a system of ODEs with a known solution. The purpose of the final project assignment is threefold: (1) to practice GUI design and construction in MATLAB, (2) to verify code implementation, and (3) to review content covered throughout the course. The manuscript first introduces the course and the context and motivation for the project. Then the project assignment is detailed. Two student project submissions are described, and the verification case study is also provided.
Volume 9, Issue 1 (May 2018), pp. 29–38
https://doi.org/10.22369/issn.2153-4136/9/1/4We implemented two new models for star formation and supernova feedback into the astrophysical code Enzo. These models are designed to efficiently capture the bulk properties of galaxies and the influence of the circumgalactic medium (CGM). Unlike Enzo's existing models, these do not track stellar populations over time with computationally expensive particle objects. Instead, supernova explosions immediately follow stellar birth and their feedback is deposited in a volumetric manner. Our models were tested using simulations of Milky Way-like isolated galaxies, and we found that neither model was able to produce a realistic, metal-enriched CGM. Our work suggests that volumetric feedback models are not sufficient replacements for particle-based star formation and feedback models.
Volume 8, Issue 3 (December 2017), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 8, Issue 3 (December 2017), pp. 2–10
https://doi.org/10.22369/issn.2153-4136/8/3/1With parallel and distributed computing (PDC) now in the core CS curriculum, CS educators are building new pedagogical tools to teach their students about this cutting-edge area of computing. In this paper, we present an innovative approach we call microclusters - personal, portable Beowulf clusters - that provide students with hands-on PDC learning experiences. We present several different microclusters, each built using a different combination of single board computers (SBCs) as its compute nodes, including various ODROID models, Nvidia's Jetson TK1, Adapteva's Parallella, and the Raspberry Pi. We explore different ways that CS educators are using these systems in their teaching, and describe specific courses in which CS educators have used microclusters. Finally, we present an overview of sources of free PDC pedagogical materials that can be used with microclusters.
Volume 8, Issue 3 (December 2017), pp. 11–18
https://doi.org/10.22369/issn.2153-4136/8/3/2This paper presents an innovative hybrid learning model, as well as the tools, resources, and learning environment, to promote active learning for both face-to-face and online students. Most small universities in the United States lack adequate resources and cost-justifiable enrollments to offer Computational Science and Engineering (CSE) courses. The goal of the project was to find an effective and affordable model for small universities to prepare underserved students with marketable analytical skills in CSE. As the primary outcome, the project created a cluster of collaborating institutions that combines students into common classes and uses cyberlearning tools to deliver and manage instruction. The instrumental educational technologies included a Smart Podium, digital projectors, a teleconference system such as AdobeConnect, auto-tracking cameras, and high-quality audio in both local and remote classrooms. As an innovative active learning environment, an R&D process was used to provide a coherent framework for designing instruction and assessing learning. Course design centered on model-based learning, which proposes that students learn complex content by elaborating on their mental model, developing a conceptual model, refining a mathematical model, and conducting experiments to validate and revise their conceptual and mathematical models. A wave lab and an underwater robotics lab were used to facilitate the experimental components of hands-on research projects. Course delivery included interactive live online help sessions, immediate feedback to students, peer support, and teamwork, which were crucial for student success. Another key instructional feature of the project was the use of emerging technologies such as HIMATT [8] to evaluate how students think through and model complex, ill-defined and ill-structured realistic problems.
Volume 8, Issue 3 (December 2017), pp. 19–24
https://doi.org/10.22369/issn.2153-4136/8/3/3Mie theory is used to model scattering off wavelength-sized microspheres. It has numerous applications for many different geometries of spheres. The calculations of the electromagnetic fields involve large sums over vector spherical harmonics. Thus, even the simple task of calculating the fields, along with additional analytical tools such as cross sections and intensities, requires large summations that are conducive to high performance computing. In this paper, we derive Mie theory from first principles and detail the process and results of programming Mie theory physics in Fortran 95. We describe the theoretical background specific to the microspheres in our system and the procedure of translating functions to Fortran. We then outline the process of optimizing the code and parallelizing various functions, comparing efficiencies and runtimes. The shorter runtimes of the Fortran functions are then compared to those of their corresponding functions in Wolfram Mathematica. Fortran has shorter runtimes than Mathematica by between one and four orders of magnitude for our code. Parallelization further reduces the runtimes of the Fortran code for large jobs. Finally, various plots and data related to scattering by dielectric spheres are presented.
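As an illustration of why such partial-wave sums parallelize well, the sketch below (written in C rather than the paper's Fortran 95) accumulates a Mie-type scattering efficiency with an OpenMP reduction. The coefficient values are placeholders; a real code would first compute the Mie coefficients a_n and b_n from Riccati-Bessel functions.

    /* Illustrative parallel reduction over a Mie-type partial-wave series:
     * Q_sca = (2/x^2) * sum_n (2n+1)(|a_n|^2 + |b_n|^2).
     * The |a_n|^2, |b_n|^2 below are placeholder decaying terms, not the
     * true Mie coefficients. */
    #include <stdio.h>
    #include <math.h>
    #include <omp.h>

    #define NMAX 2000000L            /* exaggerated term count for timing */

    int main(void)
    {
        const double x = 50.0;       /* size parameter (illustrative)     */
        double qsca = 0.0;

        /* Each term is independent, so the sum is a textbook OpenMP reduction. */
        #pragma omp parallel for reduction(+:qsca)
        for (long n = 1; n <= NMAX; n++) {
            double an2 = exp(-(double)n / x) / (double)n;        /* placeholder */
            double bn2 = exp(-(double)n / x) / ((double)n + 1.0);/* placeholder */
            qsca += (2.0 * (double)n + 1.0) * (an2 + bn2);
        }
        qsca *= 2.0 / (x * x);

        printf("Q_sca (placeholder coefficients) = %.6f\n", qsca);
        return 0;
    }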
Volume 8, Issue 3 (December 2017), pp. 25–29
https://doi.org/10.22369/issn.2153-4136/8/3/4We present results and lessons learned from a 2015-2016 Blue Waters Student Internship. The project was to perform preliminary simulations of an astrophysics application, Black Widow binary systems, with the adaptive-mesh simulation code Castro. The process involved updating the code as needed to run on Blue Waters, constructing initial conditions, and performing scaling tests exploring Castro's hybrid message passing/threaded architecture.
Volume 8, Issue 3 (December 2017), pp. 30–35
https://doi.org/10.22369/issn.2153-4136/8/3/5Derechos are a dangerous, primarily non-tornadic severe weather outbreak type responsible for a variety of atmospheric hazards. However, the exact predictability of these events by lead time is unknown, yet such knowledge would likely be invaluable to forecasters responsible for predicting these events. As such, the predictability of non-tornadic outbreaks by lead time was assessed. Five derecho events spanning 1979 to 2012 were selected and simulated using the Weather Research and Forecasting (WRF) model at 24-, 48-, 72-, 96-, and 120-hour lead times. Nine stochastically perturbed initial conditions were generated for each case and each lead time, yielding an ensemble of derecho simulations. Moment statistics of the derecho composite parameter (DCP), a good proxy for derecho environments, were used to assess variability in forecast quality and precision by lead time. Overall, results showed that 24- and 48-hour simulations had similar variability characteristics, as did 96- and 120-hour simulations. This suggests the existence of a change point, or statistically notable drop-off in forecast performance, at 72 hours' lead time that should be more fully explored in future work. These results are useful for forecasters, as they give a first guess as to forecast skill and precision prior to initiating predictions at lead times of up to 5 days.
Volume 8, Issue 3 (December 2017), pp. 36–43
https://doi.org/10.22369/issn.2153-4136/8/3/6Long-term atmospheric forecasting remains a significant challenge in the field of operational meteorology. These long-term forecasts are typically completed through the use of climatological variability patterns in the geopotential height fields, known in the field of meteorology as teleconnections. Despite heavy reliance on teleconnections for long-term forecasts, the characterization of these patterns in operational weather models remains inadequate. The purpose of this study is to diagnose the ability of an operational forecast model to render well-known teleconnection patterns. The Weather Research and Forecasting (WRF) model, a commonly employed regional operational forecast model, was used to simulate the major 500 mb Northern Hemisphere midlatitude teleconnection patterns. These patterns were formulated using rotated principal component analysis on the 500 mb geopotential height fields. The resulting simulated teleconnection patterns were directly compared to observed teleconnection fields derived from the National Center for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis 500 mb geopotential height database, a commonly utilized observational dataset in climate research. Results were quite poor, as the simulated teleconnection patterns only somewhat resembled those constructed from the observed dataset, suggesting a limited capability of the WRF in resolving the underlying variability structure of the hemispheric midlatitude atmosphere. Additionally, configuring the regional model to complete this simulation was met with a series of computational challenges, some of which were not successfully overcome. These results suggest future needs for improvement of the WRF model in reconstructing teleconnection fields and for use in climate modeling.
Volume 8, Issue 2 (August 2017), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 8, Issue 2 (August 2017), pp. 2–9
https://doi.org/10.22369/issn.2153-4136/8/2/1Computing is ubiquitous and perhaps the most common element of our shared experience. However, many students do not seem to recognize the serious applications and implications of computing to the sciences. Wagner College, like many liberal arts colleges, requires a semester of a Computer Science affiliated course to provide students with an exposure to "technological skills". Sadly, such courses typically do not delve into high-level computational skills or computational thinking and generally provide instruction in using Microsoft Office products and rudimentary worldwide web concepts. These courses and approaches were probably valuable a decade ago when computing devices were not quite as prevalent. However, in today's world these courses appear outdated and do not provide relevant skills to the modern undergraduate student. We have created a course called "Introduction to Scientific Computing" to remedy this problem and to provide students with state-of-the-art technological tools. The course provides students with hands-on training on typical workflows in scientific data analysis and data visualization. Students are trained in the symbolic computing platform Wolfram Mathematica to apply functional programming to develop data analysis and problem solving skills. The course presents computational thinking examples in the framework of various scientific disciplines. This exposure helps students to understand the advantages of technical computing and its direct relevance to their educational goals. The students are also trained to perform molecular visualization using open source software packages to understand secondary and tertiary protein structures, construct molecular animations, and analyze computer simulation data. These experiences stimulate students to apply these skills across multiple courses and their research endeavors. Student self-assessment data suggest that the course satisfies a unique niche in undergraduate education and enriches the training of future STEM graduates.
Volume 8, Issue 2 (August 2017), pp. 10–16
https://doi.org/10.22369/issn.2153-4136/8/2/2Research experience has been identified as a high-impact intervention for increasing student engagement and retention in STEM. However, authentic undergraduate research leading to primary-authorship peer-reviewed publications is a challenge due to the relatively short time students work on their capstone projects and their insufficient preparation as researchers. The challenge is further magnified in the field of computer science, where the absence of "traditional" labs limits the opportunities for undergraduate students to participate in research. Here we present a novel approach to authentic computer science undergraduate research, based on interdisciplinary computational science and student ownership of their research projects. Instead of taking the traditional role of undergraduate research assistant, students select their own research topic based on their personal interests and, with the assistance of a faculty member, complete all stages of their research project. The uniqueness of the approach is its ability to lead to scientific discoveries and peer-reviewed publications in which the primary author is the student, while allowing the student to experience the entire research process, from defining the research question through analysis of the experimental results. In three years the model led to a dramatic increase in the number of undergraduate students who publish primary-author peer-reviewed scientific papers. The intervention increased the share of students with peer-reviewed, student-authored publications from none to about one third, in many cases publishing in the top outlets in their field.
Volume 8, Issue 2 (August 2017), pp. 17–23
https://doi.org/10.22369/issn.2153-4136/8/2/3We attempted to find a more sustainable solution for performing virtual screening with AutoDock Vina which uses less electricity than computers using typical x64 CPUs. We tested a cluster of ODROID-XU3 Lite computers with ARM CPUs and compared its performance to a server with x64 CPUs. In order to be a viable solution, our cluster needed to perform the screen without sacrificing speed or increasing hardware costs. The cluster completed the virtual screen in a little less time than our comparison server while using just over half the electricity that the server used. Additionally, the hardware for the cluster cost about 38% less than the server, making it a viable solution.
Volume 8, Issue 2 (August 2017), pp. 24–28
https://doi.org/10.22369/issn.2153-4136/8/2/4Bayesian networks may be utilized to infer genetic relationships among genes. This has proven useful in providing information about how gene interactions influence life. However, Bayesian network learning is slow due to the nature of the algorithm. K2, a search-space reduction algorithm, helps speed up the learning process but may introduce bias. To eliminate this bias, multiple Bayesian networks must be computed. This paper evaluates and implements the parallelization of network generation and explains the reasoning behind the choices made. Methods are developed and tested to evaluate the results of the implemented accelerations. Generating networks across multiple cores results in a linear speed-up with negligible overhead. Distributing the generation of networks across multiple machines also yields a linear speed-up, but introduces additional overhead.
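Because each candidate network can be learned independently, the multi-core case amounts to an embarrassingly parallel loop. The C/OpenMP sketch below, written for this overview with a hypothetical learn_network() stand-in rather than the authors' actual K2 implementation, shows the basic pattern and why the per-core overhead stays negligible.

    /* Hedged illustration of multi-core network generation: each of NUM_NETS
     * Bayesian networks is learned independently (e.g., K2 run on a different
     * random node ordering), so an OpenMP loop distributes them across cores
     * with essentially no shared state.  learn_network() is a hypothetical
     * stand-in for a real K2 learner, not code from the paper. */
    #include <stdio.h>
    #include <omp.h>

    #define NUM_NETS  64
    #define NUM_GENES 32

    /* Placeholder: pretend to learn one network from a seeded random ordering
     * and return its score.  A real implementation would run K2 here. */
    static double learn_network(unsigned int seed)
    {
        double score = 0.0;
        for (int i = 0; i < NUM_GENES; i++) {
            seed = seed * 1664525u + 1013904223u;   /* simple thread-safe LCG */
            score += (double)seed / 4294967295.0;
        }
        return score;
    }

    int main(void)
    {
        double scores[NUM_NETS];
        double start = omp_get_wtime();

        /* Independent iterations -> near-linear speed-up with core count. */
        #pragma omp parallel for schedule(dynamic)
        for (int n = 0; n < NUM_NETS; n++)
            scores[n] = learn_network(1234u + (unsigned int)n);

        double best = scores[0];
        for (int n = 1; n < NUM_NETS; n++)
            if (scores[n] > best)
                best = scores[n];

        printf("best score %.3f in %.3f s\n", best, omp_get_wtime() - start);
        return 0;
    }

Distributing the same loop across machines (e.g., with MPI) follows the same pattern but adds the communication overhead the abstract mentions.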
Volume 8, Issue 2 (August 2017), pp. 29–36
https://doi.org/10.22369/issn.2153-4136/8/2/5Several microbial genome databases provide collections of thousands of genome annotation files in formats suitable for the performance of complex cognitive activities such as decision making, sense making, and analytical reasoning. The goal of the research reported in this article was to develop interactive analytics resources to support the performance of complex cognitive activities on a collection of publicly available genome information spaces. A supercomputing infrastructure (the Blue Waters Supercomputer) provided computational tools to construct information spaces, while visual analytics software and online bioinformatics resources provided tools to interact with the constructed information spaces. The Rhizobiales order of bacteria, which includes the Brucella genus, was the use case for performing the complex cognitive activities. An interesting finding among the genomes of the dolphin pathogen, Brucella ceti, was a cluster of genes with evidence for function in conditions of limited nitrogen availability.
Volume 8, Issue 2 (August 2017), pp. 37–45
https://doi.org/10.22369/issn.2153-4136/8/2/6Communicating and transferring computational science knowledge and literacy is a tremendously important concept for students at all levels of education to understand. Computational knowledge is especially important due to the tremendous impact that computer programming has had on all scientific and engineering disciplines. As technology evolves, so must our educational system in order for society to evolve as a whole. We undertook direct instruction of a computational science course and have developed a curriculum that can be expanded upon to provide students entering technical disciplines with the background that they need to be successful. The course would provide insight into the C programming language as well as how computers function at a more basic level. Students would undertake projects that explore how to program simple tasks and operations, ultimately ending in a final project aimed at assessing the knowledge accumulated from the course.
Volume 8, Issue 2 (August 2017), pp. 46–53
https://doi.org/10.22369/issn.2153-4136/8/2/7As a Blue Waters Student Internship Program project, we have developed a model of interplanetary low-thrust trajectories from Earth to Mars for spacecraft supplying necessary cargo for future human-crewed missions. Since these cargo missions use ionic propulsion that causes a gradual change in the spacecraft's velocity, the modeling is more computationally expensive than for conventional trajectories assuming instantaneous spacecraft velocity changes. This model calculates the spacecraft's time of flight and swept angle at different payload masses with other parameters kept constant and correlates them with known locations of the planets. With parallelization using OpenMP on Blue Waters, its runtime has decreased from 10.55 to 1.53 hours. The program takes a user-selected Mars arrival date and outputs a given range of dates with maximum payload capabilities. This parallelized model will greatly reduce the time required for future mission design projects when other factors like spacecraft solar panel power output may vary with new mission specifications. The internship experience has enhanced the intern's ability to manage a project and will have a positive impact on his future graduate studies or research career.
Volume 8, Issue 1 (February 2017), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 8, Issue 1 (February 2017), pp. 2–6
https://doi.org/10.22369/issn.2153-4136/8/1/1The Diels-Alder reaction is one of the most well-known organic reactions and is widely used for six-membered ring formation. Regio- and stereo-selective Diels-Alder reactions have been emphasized in various areas including the pharmaceutical and polymer industries. However, covering the theoretical background of such reactions in an undergraduate class is challenging because the interactions between molecular orbitals are poorly visualized for students. Especially when dealing with polycyclic aromatic hydrocarbons (PAHs) and asymmetric compounds, the complexity of regio- and stereo-selectivity becomes more pronounced. Herein we utilized web-based computational tools (WebMO) to visualize the HOMO-LUMO of each reaction component and their interaction to form chemical bonds. In this study we demonstrated that the incorporation of computational aids into a Diels-Alder laboratory class dramatically facilitates students' understanding of several important concepts, including frontier orbital theory, thermodynamics of the reaction, and three-dimensional visualization. The assessment of teaching effectiveness prior to and after implementation of computational aids into Diels-Alder reactions is also discussed in this manuscript.
Volume 8, Issue 1 (February 2017), pp. 7–11
https://doi.org/10.22369/issn.2153-4136/8/1/2"Aligning Sequences Sequentially and Concurrently," an educational computational science module by the authors and available online, develops a sequential algorithm to determine the highest similarity score and the alignments that yield this score for two DNA sequences. Moreover, the module considers several approaches to parallelization and speedup. Besides a serial implementation in C, a parallel program in C/MPI is available. This paper describes the module and details experiences using the material in a bioinformatics course at the University "Magna Graecia" of Catanzaro, Italy. Besides being appropriate for such a course, the module can provide a meaningful application for a high performance computing or a data structures class.
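The module's central computation is the dynamic-programming recurrence for the highest similarity score of two sequences. The sketch below shows one common form of that recurrence in Python; the scoring scheme (match +1, mismatch -1, gap -2) is an illustrative assumption, not necessarily the scheme used in the module or its C/MPI programs.

```python
# Hedged sketch of a global-alignment similarity score via dynamic programming.
# The scoring values are assumptions for illustration only.
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap            # a prefix of a aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap            # a prefix of b aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

print(alignment_score("GATTACA", "GCATGCU"))
```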
Volume 8, Issue 1 (February 2017), pp. 12–15
https://doi.org/10.22369/issn.2153-4136/8/1/3In this paper, we describe and detail the creation and use of our project that allows for augmented reality visualization of data produced using the Blue Waters supercomputer or other high performance computers. While molecular structures have been displayed using augmented reality before [1,6], we created a pipeline for using information from the Protein Data Bank and automatically loading it into an augmented reality scene for further display and interaction. We find it important to create an easy way for students, scientists, and anyone else to be able to visualize molecular structures using augmented reality because it offers an interactive three-dimensional perspective that is typically not available in the classroom. Learning about molecular structures in 2D is much less comprehensive, and our technique for visualization will be free for the end user and offer a great deal of aid to the learning and teaching process. There is no separate purchase required as long as a user has a smartphone or tablet. This is a helpful addition to scientific papers, which, if they contain the right target image, can be used as the visualization "anchor". The Protein Data Bank (PDB) houses information about proteins, nucleic acids, and more to help scientists and students understand concepts and ideas in biology and chemistry [5]. Our project goal is to open the PDB up to students and people who are not familiar with augmented reality visualization and allow people to learn using the PDB by visualizing molecular structures in different representations, annotating and interacting with the structures, and offering learning modules for common molecular structures. We created a prototype mobile application allowing for molecular visualization of PDB structures, and we are continuing to tweak our project for an eventual release to the public.
Volume 8, Issue 1 (February 2017), pp. 16–19
https://doi.org/10.22369/issn.2153-4136/8/1/4The problem of interconnecting nets with multi-port terminals in VLSI circuits is a direct generalization of the Group Steiner Problem (GSP). The GSP is a combinatorial optimization problem which arises in the routing phase of VLSI circuit design. The problem is intractable, making it impractical to use in real-world VLSI applications. This paper presents our work on designing and implementing a parallel approximation algorithm for the GSP, based on an existing heuristic, on a distributed architecture. Our implementation uses the CUDA-aware MPI approach to compute the approximate minimum-cost Group Steiner tree for several industry-standard VLSI graphs. Our implementation achieves up to 103x speedup compared to the best known serial work for the same graph. We present the speedup results for graphs up to 3k vertices. We also investigate some performance bottleneck issues by analyzing and interpreting the program performance data.
Volume 8, Issue 1 (February 2017), pp. 20–26
https://doi.org/10.22369/issn.2153-4136/8/1/5General-purpose GPUs are powerful hardware with a number of applications in the realm of relational databases. We further extended a database framework designed to allow for GPU execution of queries. Our technique is novel in that it implements Dynamic Parallelism, a new feature in recent hardware, to accelerate SQL JOINs. Query execution results in a 1.25X speedup on average with respect to a previous method, also accelerated by GPUs, which employs a multi-dimensional CUDA Grid. More importantly, we divided the queries to run on multiple Blue Waters nodes to investigate the scalability of both SELECT and JOIN.
Volume 7, Issue 1 (April 2016), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 7, Issue 1 (April 2016), pp. 2–14
https://doi.org/10.22369/issn.2153-4136/7/1/1We discuss cognitive aspects of modeling and simulation in an efficacy study of computational pedagogical content knowledge (CPACK) professional development of K-12 STEM teachers. Evidence includes data from a wide range of educational settings over the past ten years. We present a computational model of the mind based on an iterative cycle of deductive and inductive cognitive processes. The model is aligned with empirical research from cognitive psychology and neuroscience, and it opens the door to a whole series of future studies on computational thinking.
Volume 7, Issue 1 (April 2016), pp. 15–20
https://doi.org/10.22369/issn.2153-4136/7/1/2Geoscience educators in K-12 have limited experience with the quantitative methods used by professionals as part of their everyday work. Many science teachers at this level have backgrounds in other science fields. Even those with geoscience or environmental science backgrounds have limited experience with applying modeling and simulation tools to introduce real-world activities into their classrooms. This article summarizes a project aimed at introducing K-12 geoscience teachers to project-based exercises using urban hydrology models that can be integrated into their classroom teaching. The impact of teacher workshops on teachers' confidence and willingness to utilize computer modeling in their classes is also reported.
Volume 7, Issue 1 (April 2016), pp. 21–30
https://doi.org/10.22369/issn.2153-4136/7/1/3This study proposes a research and learning framework for developing and assessing computational thinking under the lens of representational fluency. Representational fluency refers to individuals' ability to (a) comprehend the equivalence of different modes of representation and (b) make transformations from one representation to another. Representational fluency was used in this study to guide the design of a robotics lab. This lab experience consisted of a multiple-step process in which students were provided with a learning strategy so they could familiarize themselves with representational techniques for algorithm design and the robot programming language. The guiding research question for this exploratory study was: Can we design a learning experience to effectively support individuals' computing representational fluency? We employed representational fluency as a framework for the design of computing learning experiences as well as for the investigation of student computational thinking. Findings from the implementation of this framework in the design of robotics tasks suggest that the learning experiences might have helped students increase their computing representational fluency. Moreover, several participants indicated that the robotics activities were engaging and that the activities also increased their interest both in algorithm design and robotics. Implications of these findings relate to the use of representational fluency coupled with robotics to integrate computing skills in diverse disciplines.
Volume 7, Issue 1 (April 2016), pp. 31–39
https://doi.org/10.22369/issn.2153-4136/7/1/4The party problem is a mathematical problem in the discipline of Ramsey Theory. Because of the problem's embarrassingly parallel nature, its extreme computational requirements, and the relative ease of understanding and implementing a naïve algorithm for it, it is well suited to serve as an example problem for teaching parallel computing. Years ago, a curriculum module for Blue Waters was developed using this problem. However, delays in the delivery of Blue Waters resulted in the module being released before Blue Waters was accessible. Therefore, performance data and compilation instructions for Blue Waters were not available. We have revised the module to provide source code for new versions of the programs to demonstrate more parallel computing libraries. We have also added performance data and compilation instructions for the code in the old version of the module and for the new implementations, which take advantage of the capabilities of the Blue Waters supercomputer now that it is available.
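For readers unfamiliar with the problem, the naïve approach simply enumerates every 2-coloring of the edges of the complete graph K_n and tests each for a monochromatic clique. The Python sketch below does this for the small R(3, 3) instance, whose answer, 6, is well known; R(5, 5) applies the same test at a vastly larger, embarrassingly parallel scale. This is an illustration only, not code from the module.

```python
# Hedged sketch of the naive party-problem check, shown for R(3, 3).
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    edges = list(combinations(range(n), 2))
    color = dict(zip(edges, coloring))
    return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_coloring_has_mono_triangle(n):
    m = n * (n - 1) // 2                       # number of edges in K_n
    return all(has_mono_triangle(n, c) for c in product((0, 1), repeat=m))

print(every_coloring_has_mono_triangle(5))     # False: K_5 has a coloring with no mono triangle
print(every_coloring_has_mono_triangle(6))     # True: hence R(3, 3) = 6
```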
Volume 7, Issue 1 (April 2016), pp. 39–45
https://doi.org/10.22369/issn.2153-4136/7/1/5Optical fields in metamaterial nanostructures can be separated into bright modes, whose dispersion is typically described by effective medium parameters, and dark fluctuating fields. Such a combination of propagating and evanescent modes poses a serious numerical complication due to poorly conditioned systems of equations for the amplitudes of the modes. We propose a numerical scheme based on a transfer matrix approach, which resolves this issue for a parallel-plate metal-dielectric metamaterial, and demonstrate its effectiveness.
Volume 6, Issue 1 (July 2015), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 6, Issue 1 (July 2015), pp. 2–15
https://doi.org/10.22369/issn.2153-4136/6/1/1In this paper we present an iterative research process to integrate worked examples into introductory programming learning activities. Learning how to program involves many cognitive processes that may result in a high cognitive load. The use of worked examples has been described as a relevant approach to reduce student cognitive load in complex tasks. Learning materials were designed based on instructional principles of worked examples and were used for a freshman programming course. Moreover, the learning materials were refined after each iteration based on student feedback. The results showed that novice students benefited more than experienced students when exposed to the worked examples. In addition, encouraging students to carry out an elaborated self-explanation of their coded solutions may be a relevant learning strategy when implementing the worked examples pedagogy.
Volume 6, Issue 1 (July 2015), pp. 16–24
https://doi.org/10.22369/issn.2153-4136/6/1/2In the authors' experience the languages available for teaching introductory computer programming courses are lacking. In practice, they violate some of the fundamentals taught in an introductory course. This is often the case, for example, with I/O. Picky is a new open source programming language created specifically for education that enables the students to program according to the principles laid down in class. It solves a number of issues the authors had to face while teaching introductory courses for several years in other languages. The language is small, simple and very strict regarding what is a legal program. It has a terse syntax and it is strongly typed and very restrictive. Both the compiler and the runtime include extra checks to provide safety features. The compiler generates byte-code for compatibility and the programming tools are freely available for Linux, MacOSX, Plan 9 from Bell Labs and Windows. This paper describes the language and discusses the motivation to implement it and its main educational features.
Volume 6, Issue 1 (July 2015), pp. 25–31
https://doi.org/10.22369/issn.2153-4136/6/1/3Antibiotic-resistant strains of Mycobacterium tuberculosis have rendered some of the current treatments for tuberculosis ineffective, creating a need for new treatments. Today, the most efficient way to find new drugs to treat tuberculosis and other diseases is to use virtual screening to quickly consider millions of potential drug candidates and filter out all but the ones most likely to inhibit the disease. These top hits can then be tested in a traditional wet lab to determine their potential effectiveness. Using supercomputers, we screened over 4 million potential drug molecules against each of two enzymes that are critical to the survival of Mycobacterium tuberculosis. During this process, we determined the top candidate molecules to test in the wet lab.
Volume 5, Issue 1 (August 2014), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 5, Issue 1 (August 2014), pp. 2–9
https://doi.org/10.22369/issn.2153-4136/5/1/1A computational module has been developed in which students examine the binding interactions between indinavir and HIV-1 protease. The project is a component of the Medicinal Chemistry course offered to upper level chemistry, biochemistry, and biology majors. Students work with modeling and informatics tools utilized in drug development research while evaluating wild-type and mutated forms of the HIV-1 protease in complex with the inhibitor indinavir. By quantifying the molecular interactions within protease-inhibitor complexes, students can characterize the structural basis for reduced efficacy of indinavir.
Volume 5, Issue 1 (August 2014), pp. 10–22
https://doi.org/10.22369/issn.2153-4136/5/1/2In this paper, we present GalaxSeeHPC, a new cluster-enabled gravitational N-Body program designed for educational use, along with two potential student experiences that illustrate what students might be able to investigate at larger N than available with earlier versions of GalaxSee. GalaxSeeHPC adds additional force calculation algorithms and input options to the previous cluster-enabled version. GalaxSeeHPC lessons have been developed focusing on two key studies, the structure of rotating galaxies and the large-scale structure of the universe. At large N, visualizing the results becomes a significant challenge, and tools for visualization are presented. The canonical lesson in the original version of GalaxSee is the rotation and flattening of a cluster with angular momentum. Model discrepancies that are not obvious at the range of N available in previous versions become quite obvious at large N, and changes to the initial mass and velocity distribution can be seen more readily. For the large-scale structure models, while basic clearing and clustering can be seen at around N=5,000, N=50,000 allows for a much clearer visualization of the filamentary structure at large scale, and N=500,000 allows for a more detailed geometry of the knots formed as the filaments combine to form superclusters. For the galactic dynamics simulations, we found that while a flattening due to overall angular momentum can be explored with N=1,000 or smaller, formation of spiral structure requires not only a larger number of objects, typically on the order of 10,000, but also modifications to the default initial mass and velocity distributions used in older versions of GalaxSee.
Volume 5, Issue 1 (August 2014), pp. 23–27
https://doi.org/10.22369/issn.2153-4136/5/1/3A typical upper-level undergraduate or first-year graduate regression course syllabus treats model selection with various stepwise regression methods. Here we implement evolutionary computing for subset model selection and accomplish two goals: i) introduce students to the powerful optimization method of genetic algorithms, and ii) transform a regression analysis course into a regression and modeling course without requiring any additional time or software commitment. Furthermore, we employed the Akaike Information Criterion (AIC) as a measure of model fitness instead of the more commonly used R-square. The model selection tool uses Excel, which makes the procedure accessible to a very wide spectrum of interdisciplinary students with no specialized software requirement. An Excel macro, to be used as an instructional tool, is freely available through the author's website.
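To make the AIC-based fitness concrete, the sketch below scores candidate predictor subsets with the least-squares form AIC = n·ln(RSS/n) + 2k and keeps the best one. It is written in Python rather than the paper's Excel macro, the exhaustive search merely stands in for the genetic algorithm, and the exact AIC variant used by the tool is an assumption.

```python
# Hedged sketch: AIC-scored subset selection for linear regression.
# A genetic algorithm would search this subset space; enumeration is used here
# only because the example has just five candidate predictors.
from itertools import combinations
import numpy as np

def aic(y, X):
    X1 = np.column_stack([np.ones(len(y)), X])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = float(np.sum((y - X1 @ beta) ** 2))
    k = X1.shape[1]                                    # number of fitted parameters
    return len(y) * np.log(rss / len(y)) + 2 * k

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=100)   # only predictors 0 and 2 matter

subsets = [s for r in range(1, 6) for s in combinations(range(5), r)]
best = min(subsets, key=lambda s: aic(y, X[:, list(s)]))
print("best subset by AIC:", best)                      # expected: (0, 2)
```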
Volume 5, Issue 1 (August 2014), pp. 28–43
https://doi.org/10.22369/issn.2153-4136/5/1/4When learning to program, students are typically exposed to either a visual or a command-line environment. Visual environments are usually adopted to help engage students with programming due to their user-friendly features. This article explores the effect of using visual environments such as Integrated Development Environments and syntax-free tools to teach students how to program. Prior studies have shown that some visual environments can have a productive impact on a student's ability to learn and become engaged with programming. However, the functional behavior of visual environments may cause a student to develop a faulty mental model for programming. One possible reason is the fixed set of skills that a student acquires upon initial exposure to programming while using a visual environment. Two systematic studies were conducted for exposing students to programming in introductory courses using both visual and command-line environments. From the first study, it was found that visual environments can initially impose a lower learning curve for students. However, the second study revealed that visual environments may present a challenge for students to directly transfer their acquired skills to other programming environments after initial exposure.
Volume 4, Issue 1 (October 2013), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 4, Issue 1 (October 2013), pp. 2–10
https://doi.org/10.22369/issn.2153-4136/4/1/1In this paper, we present a computational approach to teaching general education courses that expose students to science and computing principles in engaging contexts, including modeling and simulation, games, and history. The courses use scalable curriculum modules organized in layers of increasing difficulty in order to balance learning challenges and student abilities. We describe the computational pedagogy followed in these modules and courses, with particular attention to the simulation-based course, namely introduction to computational science, to present a case study for those considering similar initiatives.
Volume 4, Issue 1 (October 2013), pp. 11–15
https://doi.org/10.22369/issn.2153-4136/4/1/2The Blue Waters Undergraduate Petascale Education Program (NSF) sponsors the development of educational modules that help students understand computational science and the importance of high performance computing. As part of this materials development initiative, we developed two modules, "Time after Time: Age- and Stage-Structured Models" and "Probable Cause: Modeling with Markov Chains," which develop application problems involving transition matrices and provide accompanying programs in a variety of systems (C/MPI, C, MATLAB, Mathematica). Age- and stage-structured models incorporate the probability of an animal passing from one age or stage to the next as well as the animal's average reproduction at each age or stage. Markov chain models are based on the probability of passing from one state to another. These educational materials follow naturally from another Blue Waters module, "Living Links: Applications of Matrix Operations to Population Studies," which provides a foundation for the use of matrix operations. This paper describes the two modules and details experiences using the resources in classes.
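As a flavor of what the transition-matrix models in these modules compute, the sketch below projects a small stage-structured population forward in time by repeated matrix-vector multiplication. The stages, survival probabilities, and fecundities are invented for illustration and are not taken from the modules or their accompanying programs.

```python
# Hedged sketch of a stage-structured (Leslie-type) projection with an assumed matrix.
import numpy as np

# Stages: juvenile, subadult, adult. Entries are illustrative assumptions.
L = np.array([
    [0.0, 1.2, 3.0],   # average offspring per individual in each stage
    [0.4, 0.0, 0.0],   # probability a juvenile becomes a subadult
    [0.0, 0.6, 0.8],   # probability of reaching or remaining in the adult stage
])

pop = np.array([100.0, 20.0, 10.0])   # initial counts per stage
for year in range(1, 6):
    pop = L @ pop                     # one projection step
    print(f"year {year}: {np.round(pop, 1)}")
```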
Volume 4, Issue 1 (October 2013), pp. 16–23
https://doi.org/10.22369/issn.2153-4136/4/1/3This paper explores the landscape of computing educational resources found on the web, together with teaching and learning materials that can facilitate the integration of computational thinking into the classroom. Specifically, this paper focuses on finding and describing existing learning environments that integrate computational thinking into a STEM discipline. This study provides initial steps toward the goal of providing a comprehensive list of STEM-based computational resources on the web that also offers guiding information, which can help teachers and parents evaluate and integrate these resources easily for educational purposes.
Volume 4, Issue 1 (October 2013), pp. 24–29
https://doi.org/10.22369/issn.2153-4136/4/1/4Undergraduate teaching that focuses on student-driven research, mentored by research active faculty, can have a powerful effect in bringing relevance and cohesiveness to a department's programs. We describe and discuss such a program in computational mathematics, and the effects this program has had on the students, the faculty, the department and the university.
Volume 4, Issue 1 (October 2013), pp. 30–34
https://doi.org/10.22369/issn.2153-4136/4/1/5Massively Parallel Monte Carlo, an in-house computer code available at http://code.google.com/p/mpmc/, has been successfully utilized to simulate interactions between gas phase sorbates and various metal-organic materials. In this regard, calculations involving polarizability were found to be critical, and computationally expensive. Although GPGPU routines have increased the speed of these calculations immensely, in its original state the program was only able to leverage a GPU's power on small systems. In order to study larger and ever more complex systems, the program model was modified such that limitations related to system size were relaxed while performance was either increased or maintained. In this project, parallel programming techniques learned from the Blue Waters Undergraduate Petascale Education Program were employed to increase the efficiency and expand the utility of this code.
Volume 4, Issue 1 (October 2013), pp. 35–39
https://doi.org/10.22369/issn.2153-4136/4/1/6As part of a parallel computing course in which undergraduate students learned parallel computing techniques and ran their programs on a supercomputer, one student designed and implemented a sequential algorithm and two versions of a parallel algorithm to solve the knapsack problem. Performance tests of the programs were conducted on the Ranger supercomputer. The performance of the sequential and parallel implementations was compared to determine speedup and efficiency. We observed 82%-86% efficiency for the MPI version and 89% efficiency for the OpenMP version for sufficiently large inputs to the problem. Additionally, we discuss both the student's and the faculty member's reflections on the experience.
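The efficiency figures quoted follow from the standard definitions speedup = T_serial / T_parallel and efficiency = speedup / p, where p is the number of processes. The timings in the sketch below are invented solely to show the arithmetic; they are not measurements from the paper.

```python
# Hedged sketch of the speedup and efficiency arithmetic with made-up timings.
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    return speedup(t_serial, t_parallel) / p

t1, t16, p = 640.0, 46.5, 16          # hypothetical runtimes (seconds) on 1 and 16 cores
print(f"speedup    = {speedup(t1, t16):.1f}x")
print(f"efficiency = {efficiency(t1, t16, p):.0%}")   # ~86%, comparable to the reported range
```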
Volume 3, Issue 2 (December 2012), pp. 1–1
A brief introduction to this issue of the Journal of Computational Science Education from the editor.
Volume 3, Issue 2 (December 2012), pp. 2–10
https://doi.org/10.22369/issn.2153-4136/3/2/1Educators from across the educational spectrum are faced with challenges in delivering curricula that address sustainability issues. This article introduces a cyber-based interactive e-learning platform, entitled the Sustainable Product Development Collaboratory, which is focused on addressing this need. This collaboratory aims to educate a wide spectrum of learners in the concepts of sustainable design and manufacturing by demonstrating the effects of product design on supply chain costs and environmental impacts. In this paper, we discuss the overall conceptual framework of this collaboratory along with pedagogical and instructional methodologies related to collaboratory-based sustainable design education. Finally, a sample learning module is presented along with methods for assessment of student learning and experiences with the collaboratory.
Volume 3, Issue 2 (December 2012), pp. 11–17
https://doi.org/10.22369/issn.2153-4136/3/2/2Cyber-Physical Systems (CPS) are the conjoining of an entity's physical and computational elements. The development of a typical CPS system follows a sequence from conceptual modeling, testing in simulated (virtual) worlds, and testing in controlled (possibly laboratory) environments to final deployment. Throughout each (repeatable) stage, the behavior of the physical entities, the sensing and situation assessment, and the computation and control options have to be understood and carefully represented through abstraction. The CPS Group at the Ohio State University, as part of an NSF-funded CPS project on "Autonomous Driving in Mixed Environments", has been developing CPS-related educational activities at the K-12, undergraduate, and graduate levels. The aim of these educational activities is to train students in the principles and design issues of CPS and to broaden participation in science and engineering. The project team has a strong commitment to impact STEM education across the entire K-20 community. In this paper, we focus on the K-12 community and present a two-week Summer Program for high school juniors and seniors that introduces them to the principles of CPS design and walks them through several of the design steps. We also provide an online repository that aids CPS researchers in providing a similar educational experience.
Volume 3, Issue 2 (December 2012), pp. 18–25
https://doi.org/10.22369/issn.2153-4136/3/2/3The ever-increasing amount of computational power available has made it possible to use docking programs to screen large numbers of compounds to search for molecules that inhibit proteins. This technique can be used not only by pharmaceutical companies with large research and development budgets and large research universities, but also at small liberal arts colleges with no special computing equipment beyond the desktop PCs in any campus computer laboratory. However, despite the significant quantities of compute time available to small colleges to conduct these virtual screens, such as supercomputing time available through grants, we are unaware of any small colleges that do this. We describe the experiences of an interdisciplinary research collaboration between faculty in the Chemistry and Computer Science Departments in a chemistry course where chemistry and biology students were shown how to conduct virtual screens. This project began when the authors, who had been collaborating on drug discovery research using virtual screening, decided that the virtual screening process they were using in their research could be adapted to fit in a couple of lab periods and would complement one of the instructors' courses on medicinal chemistry. The resulting labs would introduce students to the virtual screening portion of the drug discovery process.
Volume 3, Issue 2 (December 2012), pp. 26–33
https://doi.org/10.22369/issn.2153-4136/3/2/4Complex scientific codes and the datasets they generate are in need of a sophisticated categorization environment that allows the community to store, search, and enhance metadata in an open, dynamic system. Currently, data is often presented in a read-only format, distilled and curated by a select group of researchers. We envision a more open and dynamic system, where authors can publish their data in a writeable format, allowing users to annotate the datasets with their own comments and data. This would enable the scientific community to collaborate on a higher level than before, where researchers could for example annotate a published dataset with their citations. Such a system would require a complete set of permissions to ensure that any individual's data cannot be altered by others unless they specifically allow it. For this reason datasets and codes are generally presented read-only, to protect the author's data; however, this also prevents the type of social revolutions that the private sector has seen with Facebook and Twitter. In this paper, we present an alternative method of publishing codes and datasets, based on Fluidinfo, which is an openly writeable and social metadata engine. We will use the specific example of the Einstein Toolkit, a part of the Cactus Framework, to illustrate how the code's metadata may be published in writeable form via Fluidinfo.
Volume 3, Issue 2 (December 2012), pp. 34–40
https://doi.org/10.22369/issn.2153-4136/3/2/5An ab initio density functional theory based method that has a long history of dealing with large complex systems is the Orthogonalized Linear Combination of Atomic Orbitals (OLCAO) method, but it does not operate in parallel and, while the program is empirically observed to be fast, many components of its source code have not been analyzed for efficiency. This paper describes the beginnings of a concerted effort to modernize, parallelize, and functionally extend the OLCAO program so that it can be better applied to the complex and challenging problems of materials design. Specifically, profiling data were collected and analyzed using the popular performance monitoring tools TAU and PAPI as well as standard UNIX time commands. Each of the major components of the program was studied so that parallel algorithms that either modified or replaced the serial algorithm could be suggested. The program was run for a collection of different input parameters to observe trends in compute time. Additionally, the algorithm for computing interatomic interaction integrals was restructured and its performance was measured. The results indicate that a fair degree of speed-up of even the serial version of the program could be achieved rather easily, but that implementation of a parallel version of the program will require more substantial consideration.
Volume 3, Issue 2 (December 2012), pp. 41–48
https://doi.org/10.22369/issn.2153-4136/3/2/6The R(m, n) instance of the party problem asks how many people must attend a party to guarantee that at the party, there is a group of m people who all know each other or a group of n people who are all complete strangers. GPUs have been shown to significantly decrease the running time of some mathematical and scientific applications that have embarrassingly parallel portions. A brute-force algorithm to solve the R(5, 5) instance of the party problem can be parallelized to run on a number of processing cores many orders of magnitude greater than the number of cores in the fastest supercomputer today. Therefore, we believed that this currently unsolved problem is so computationally intensive that GPUs could significantly reduce the time needed to solve it. In this work, we compare the running time of a naive algorithm that helps make progress toward solving the R(5, 5) instance of the party problem on a CPU and on five different GPUs ranging from low-end consumer GPUs to a high-end GPU. Using just the GPUs' computational capabilities, we observed speedups ranging from 1.9 to over 21 in comparison to our quad-core CPU system.
Volume 3, Issue 1 (June 2012), pp. 1–1
A brief introduction to this special CI-TEAM issue of the Journal of Computational Science Education from the editor.
Volume 3, Issue 1 (June 2012), pp. 2–10
https://doi.org/10.22369/issn.2153-4136/3/1/1This paper describes the application of findings from the National Science Foundation's project on Computational Thinking (CT) in America's Workplace to program assessment. It presents the process used to define the primary job functions and work tasks of CT-enabled STEM professionals in today's scientific enterprise. The authors describe three programs developing CT skills among learners in secondary and postsecondary programs and how the resulting occupational analysis was used to review these programs. The article presents ways this analysis can be used as a framework to guide the development of STEM learning outcomes and activities, and outlines directions for future work.
Volume 3, Issue 1 (June 2012), pp. 11–18
https://doi.org/10.22369/issn.2153-4136/3/1/2Shodor, a national resource for computational science education, has successfully developed a model for middle and high school students to gain authentic and appropriate experiences in computational science. As we prepare students for the 21st century workforce, three of the most important skills for advancing modern mathematics and science are quantitative reasoning, computational thinking, and multi-scale modeling. Shodor's Computing MATTERS: Pathways to Cyberinfrastructure program, funded in part by the National Science Foundation Cyberinfrastructure Training, Education, Advancement, and Mentoring (CI-TEAM) program, provides opportunities for middle and high school students to explore all three of these areas. One of the wide range of programs offered through Computing MATTERS is the SUCCEED Apprenticeship Program. The overall goal of the SUCCEED Apprenticeship Program is to provide students with authentic and appropriate experiences in the use of technologies, techniques and tools of Information Technology (IT) with a particular focus on computational science and to produce evidence that students become proficient in these IT technologies, techniques and skills. The program combines appropriate structure (classroom-style training and project-based work experience) with meaningful work content, giving students a wide variety of technical and communication skills. The program uses innovative approaches to get students excited about computational science and enables students to grow from excitement to expertise in science, technology, engineering, and mathematics (STEM). Since its beginning in 2005, the SUCCEED Apprenticeship Program has proven to be a successful model for enabling middle and high school students of both genders and of ethnically and economically diverse backgrounds to gain proficiency in STEM while learning, experiencing, and using information technologies.
Volume 3, Issue 1 (June 2012), pp. 19–27
https://doi.org/10.22369/issn.2153-4136/3/1/3W3C standardized Web Services are becoming an increasingly popular middleware technology used to facilitate the open exchange of data and perform distributed computation. In this paper we propose a modern alternative to commonly used software applications such as STANJAN and NASA CEA for performing chemical equilibrium analysis in a platform-independent manner in combustion, heat transfer, and fluid dynamics research. Our approach is based on the next generation style of computational software development that relies on loosely-coupled network accessible software components called Web Services. While several projects in existence use Web Services to wrap existing commercial and open-source tools to mine thermodynamic data, no Web Service infrastructure has yet been developed to provide the thermal science community with a collection of publicly accessible remote functions for performing complex computations involving reacting flows. This work represents the first effort to provide such an infrastructure where we have developed a remotely accessible software service that allows developers of thermodynamics and combustion software to perform complex, multiphase chemical equilibrium computation with relative ease. Coupled with the data service that we have already built, we show how the use of this service can be integrated into any numerical application and invoked within commonly used commercial applications such as Microsoft Excel and MATLAB® for use in computational work. A rich internet application (RIA) is presented in this work to demonstrate some of the features of these newly created Web Services.
Volume 3, Issue 1 (June 2012), pp. 28–33
https://doi.org/10.22369/issn.2153-4136/3/1/4We present a cloud-enabled comprehensive platform (Pop!World) for experiential learning, education, training and research in population genetics and evolutionary biology. The major goal of Pop!World is to leverage the advances in cyber-infrastructure to improve accessibility of important biological concepts to students at all levels. It is designed to empower a broad spectrum of users with access to cyber-enabled scientific resources, tools and platforms, thus, preparing the next generation of scientists. Pop!World offers a highly engaging alternative to currently prevalent textual environments that fail to captivate net-generation audiences. It is also more mathematically focused than currently available tools, allowing it to be used as a basic teaching tool and expanded to higher education levels and collaborative research platforms. The project is a synergistic inter-disciplinary collaboration among investigators from Computer Science & Engineering and Biological Sciences. In this paper we share our invaluable multi-disciplinary experience (CSE and BIO) in the design and deployment of the Pop!World platform and its successful integration into the introductory biological sciences course offerings over the past two years. We expect our project to serve as a model for creative use of advances in cyber-infrastructure for engaging the cyber-savvy net-generation [11] students and invigorating STEM education.
Volume 3, Issue 1 (June 2012), pp. 34–46
https://doi.org/10.22369/issn.2153-4136/3/1/5This project involves both the development of a community of scholars committed to cross-institution, interdisciplinary and cross-linguistic collaboration (a Virtual Center for Language Acquisition, VCLA) and the creation of a web-based infrastructure through which a new generation of scholars can learn concepts and technologies empowered through this CI environment. These technologies, constituting a Virtual Linguistic Lab (VLL), provide the student with the structure for data creation, data management and data analysis as well as the tools for collaborative data sharing. This infrastructure, informed and executed through computational science, involves the coherent integration of an open web-based gateway (The VCLA website), linked to a specialized web-based VLL portal which includes not only real world examples and visualizations of data creation and analyses, but several cybertools by which these data can be managed and analyzed. This infrastructure subserves both the beginning student and the researcher pursuing calibrated methods and structured data sharing for collaborative purposes. Students continually engage in the development of the cybertools involved and in the scientific method involved in primary research. In this paper we summarize our objectives, the challenges we face and the solutions we have developed to these challenges. At this point, the project is just completing an implementation stage and is being readied to move to a diffusion stage.
Volume 3, Issue 1 (June 2012), pp. 47–56
https://doi.org/10.22369/issn.2153-4136/3/1/6Many contemporary scientific endeavors now rely on the collaborative efforts of researchers across multiple institutions. As a result of this increase in the scale of scientific collaboration, sharing and reuse of data using private and public repositories has increased. At the same time, data sharing practices and capabilities appear to vary widely across disciplines and even within some disciplines. This research sought to develop an understanding of this variation through the lens of theories that account for individual choices within institutional contexts. We conducted a total of 25 individual semi-structured interviews to understand researchers' current data sharing practices. The main focus of our interviews was: (1) to explore domain specific data sharing practices in diverse disciplines, and (2) to investigate the factors motivating and preventing the researchers' current data sharing practices. Results showed support for an institutional perspective on data sharing as well as a need for better understanding of scientists' altruistic motives for participating in data sharing and reuse.
Volume 3, Issue 1 (June 2012), pp. 57–65
https://doi.org/10.22369/issn.2153-4136/3/1/7The emergence of transformative technological advances in science and engineering practice has necessitated the integration of these advances in engineering classrooms. In this paper, we present the design and implementation of a virtual reality game system that infuses cyberinfrastructure (CI) learning experiences into the Project-Lead-The-Way (PLTW) pre-engineering classrooms to promote metacognition for science and engineering design in context. The CI features, metacognitive strategies, context-oriented approaches as well as their seamless integration in the game system are elaborated in detail through two game modules, Power Ville and Stability. Both games involve students in the process of decision-making that contributes to different aspects of city infrastructures (energy and transportation). The evaluation of Power Ville deployment in a PLTW classroom is also presented. The preliminary assessment confirms the usability of CI and metacognitive tools in science and engineering design.
Volume 2, Issue 1 (December 2011), pp. 1–8
https://doi.org/10.22369/issn.2153-4136/2/1/1In this paper, we model the growth of a virus in an infected person, taking into account the effect of antibiotics and the immunity of the person. We use discrete dynamical systems, or difference equations, to model the situation, and Excel to obtain the numerical solutions and visualize them using Excel's graphing capabilities.
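The paper's model is built entirely in Excel; the sketch below shows, in Python, the same kind of step-by-step difference-equation iteration. The particular recurrence and parameter values here are illustrative assumptions, not the paper's equations.

```python
# Hedged sketch of iterating a discrete dynamical system for viral load.
# The recurrence form and all parameter values are assumptions for illustration.
r, k, c = 0.9, 1e6, 0.3   # assumed growth rate, carrying capacity, clearance (treatment/immunity)
v = 1000.0                # assumed initial viral load

for day in range(1, 11):
    v = v + r * v * (1 - v / k) - c * v   # next value depends only on the current one
    print(f"day {day}: viral load = {v:,.0f}")
```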
Volume 2, Issue 1 (December 2011), pp. 9–14
https://doi.org/10.22369/issn.2153-4136/2/1/2Establishing consistent use of computer models and simulations in K-12 classrooms has been a challenge for the computational science education community. Scaling successful local efforts has been particularly difficult. In this article we describe how a training model from one place and time can be translated into a training model for another very different place and time if critical factors such as school system culture, professional development organization, local learning standards and goals, and collaboration between STEM disciplines are taken into account.
Volume 2, Issue 1 (December 2011), pp. 15–20
https://doi.org/10.22369/issn.2153-4136/2/1/3For the Blue Waters Undergraduate Petascale Education Program (NSF), we developed a computational science module, "Living Links: Applications of Matrix Operations to Population Studies," which introduces matrix operations using applications to population studies and provides accompanying programs in a variety of systems (C/MPI, MATLAB, Mathematica). The module provides a foundation for the use of matrix operations that are essential to modeling numerous computational science applications from population studies to social networks. This paper describes the module; details experiences using the material in two undergraduate courses (High Performance Computing and Linear Algebra) in 2010 and 2011 at Wofford College and two workshops for Ph.D. students at Monash University in Melbourne, Australia, in 2011; and describes refinements to the module based on suggestions in student and instructor evaluations.
Volume 2, Issue 1 (December 2011), pp. 21–27
https://doi.org/10.22369/issn.2153-4136/2/1/4CitcomS, a finite element code that models convection in the Earth's mantle, is used by many computational geophysicists to study the Earth's interior. In order to allow faster experiments and greater simulation capability, there is a push to increase the performance of the code to allow more computations to complete in the same amount of time. To accomplish this we leverage the massively parallel capabilities of graphics processors (GPUs), specifically those using NVIDIA's CUDA framework. We translated existing functions to run in parallel on the GPU, starting with the functions where the most computing time is spent. Running on NVIDIA Tesla GPUs, initial results show an average speedup of 1.8 that stays constant with increasing problem sizes and scales with increasing numbers of MPI processes. As more of the CitcomS code is successfully translated to CUDA, and as newer general-purpose GPU architectures like Fermi are released, we should continue to see further speedups in the future.
Volume 2, Issue 1 (December 2011), pp. 28–34
https://doi.org/10.22369/issn.2153-4136/2/1/5The Human Immunodeficiency Virus type 1 protease (HIV-1 PR) performs a vital role in the lifecycle of the virus, specifically in the maturation of new viral particles. Therefore, delaying the onset of AIDS, the primary goal of HIV treatment, can be achieved by inhibiting this protease.[1] However, the rapidly mutating virus quickly develops drug resistance to current inhibitors; thus, novel protease inhibitors are needed. Here, 100 ns molecular dynamics (MD) simulations were done for the wild-type and two mutant proteases to gain insight into the mechanisms by which the mutations confer drug resistance. Several different metrics were used to search for differences between the wild-type and mutant proteases, including flap tip distance and root-mean-square deviation (RMSD), mutual information, and Kullback-Leibler divergence. Finally, it is found that at the 100 ns timescale there are not large differences in the structure, flexibility, and motions of the wild-type protease relative to the mutants, and longer simulations may be needed to identify how the structural changes imparted by the mutations affect the protease's functionality.
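Of the comparison metrics listed, the Kullback-Leibler divergence is the least visual; the sketch below computes it for two discretized distributions. The histogram values are invented for illustration, and the binning is an assumption; in the study such distributions would come from wild-type versus mutant simulation data.

```python
# Hedged sketch of a discrete Kullback-Leibler divergence between two histograms.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    p = np.asarray(p, dtype=float) + eps   # small epsilon avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()        # normalize to probability distributions
    return float(np.sum(p * np.log(p / q)))

wild_type = [5, 20, 40, 25, 10]            # e.g., binned flap-distance samples (invented)
mutant    = [8, 25, 35, 22, 10]
print(f"D_KL(wild type || mutant) = {kl_divergence(wild_type, mutant):.4f}")
```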
Volume 1, Issue 1 (December 2010), pp. 1–1
It is with great pleasure that we release the first issue of the Journal of Computational Science Education. The journal is intended as an outlet for those teaching or learning computational science to share their best practices and experiences with the community. Included are examples of programs and exercises that have been used effectively in the classroom to teach computational science concepts and practices, assessments of the impact of computational science education on learning outcomes in science and engineering fields, and the experiences of students who have completed significant computational science projects. With a peer-reviewed journal, we hope to provide a compendium of the best practices in computational science education along with links to shareable educational materials and assessments.
Volume 1, Issue 1 (December 2010), pp. 2–7
https://doi.org/10.22369/issn.2153-4136/1/1/1This paper presents a new mathematics elective for an undergraduate Computational Science program. Algebraic Geometry is a theoretical area of mathematics with a long history, often highlighted by extreme abstraction and difficulty. This changed in the 1960s when Bruno Buchberger created an algorithm that allowed Algebraic Geometers to compute examples for many of their theoretical results and gave birth to a subfield called Computational Algebraic Geometry (CAG). Moreover, it introduced many rich applications to biology, chemistry, economics, robotics, recreational mathematics, etc. Computational Algebraic Geometry is usually taught at the graduate or advanced undergraduate level. However, with a bit of work, it can be an extremely valuable course to anyone with decent algebra skills. This manuscript describes Math 380: Computational Algebraic Geometry and shows the usefulness of the class as an elective to a Computational Science program. In addition, a module that gives students a high-level introduction to this valuable computational method was constructed for our Introductory Computational Science course.
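Buchberger's algorithm, the computational centerpiece of the course, is available in general-purpose tools such as SymPy's groebner() routine. The example below is a sketch of the kind of computation a student might run, not material taken from the course itself.

```python
# Hedged sketch: a Groebner basis computation with SymPy.
from sympy import symbols, groebner

x, y = symbols('x y')
# Intersect the unit circle with the line y = x; lexicographic order eliminates
# variables, reducing the system to a univariate condition on y.
G = groebner([x**2 + y**2 - 1, x - y], x, y, order='lex')
print(G)   # basis contains x - y and 2*y**2 - 1, so y = ±1/sqrt(2) and x = y
```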
Volume 1, Issue 1 (December 2010), pp. 8–12
https://doi.org/10.22369/issn.2153-4136/1/1/2With computers gaining more powerful processors, computational modeling can be introduced gradually to secondary students, allowing them to visualize complex topics and gather data in the different scientific fields. In this study, students from four rural high schools used computational tools to investigate attributes of the ingredients that might cause fluorescence in energy drinks. In the activity, students used the computational tools of WebMO to model several ingredients in energy drinks and gather data on them, such as molecular geometry and ultraviolet-visible absorption spectra (UV-Vis spectra). Using the data they collected, students analyzed and compared their ingredient molecules and then compared them to molecules that are known to fluoresce to determine any patterns. After students participated in this activity, data from testing suggest they were more aware of fluorescence, but not more aware of how to read a UV-Vis spectrum.
Volume 1, Issue 1 (December 2010), pp. 13–27
https://doi.org/10.22369/issn.2153-4136/1/1/3In the Indiana University system, as well as many other schools, finite mathematics is a prerequisite for most majors, especially business, public administration, social sciences, and some life science areas. Statisticians Moore, Peck, and Rossman (2002) articulate a set of goals for mathematics prerequisites: instilling an appreciation of the power of technology and developing skills necessary to use appropriate technology to solve problems, developing understanding, and exploring concepts. The paper describes the use of Excel spreadsheets in the teaching and learning of finite mathematics concepts in the linked courses Mathematics in Action: Social and Industrial Problems and Introduction to Computing, taught for business, liberal arts, science, nursing, education, and public administration students. The goal of the linked courses is to encourage an appreciation of mathematics and promote writing as students see an immediate use for it in completing actual real-world projects. The courses emphasize learning and writing about mathematics and the practice of computer technology applications through completion of actual industrial group projects. Through demonstration of mathematical concepts using Excel spreadsheets, we stress synergies between mathematics, technology, and real-world applications. These synergies emphasize learning goals such as quantitative skill development, analytical and critical thinking, information technology and technological issues, innovative and creative reasoning, and writing across the curriculum.
Volume 1, Issue 1 (December 2010), pp. 28–32
https://doi.org/10.22369/issn.2153-4136/1/1/4In this paper we describe an ongoing project where the goal is to develop competence and confidence among chemistry faculty so they are able to utilize computational chemistry as an effective teaching tool. Advances in hardware and software have made research-grade tools readily available to the academic community. Training is required so that faculty can take full advantage of this technology, begin to transform the educational landscape, and attract more students to the study of science.
Volume 1, Issue 1 (December 2010), pp. 33–37
https://doi.org/10.22369/issn.2153-4136/1/1/5For the Blue Waters Undergraduate Petascale Education Program (NSF), we developed two computational science modules, "Biofilms: United They Stand, Divided They Colonize" and "Getting the 'Edge' on the Next Flu Pandemic: We Should'a 'Node' Better." This paper describes the modules and details our experiences using them in three courses during the 2009-2010 academic year at Wofford College. These courses, from three programs, included students from several majors: biology, chemistry, computer science, mathematics, physics, and undecided. Each course was evaluated by the students and instructors, and many of their suggestions have already been incorporated into the modules.
Volume 1, Issue 1 (December 2010), pp. 38–43
https://doi.org/10.22369/issn.2153-4136/1/1/6The N-Body problem has become an integral part of the computational sciences, and many methods have arisen to solve and approximate the problem. The solution potentially requires on the order of N² calculations each time step; therefore, efficient performance of these N-Body algorithms is very significant [5]. This work describes the parallelization and optimization of the Particle-Particle, Particle-Mesh (P3M) algorithm within GalaxSeeHPC, an open-source N-Body simulation code. Upon successful profiling, MPI (Message Passing Interface) routines were implemented into the population of the density grid in the P3M method in GalaxSeeHPC. Each problem size recorded different results, and for a problem set dealing with 10,000 celestial bodies, speedups of up to 10x were achieved. However, in accordance with Amdahl's Law, the maximum speedup for the code should have been closer to 16x. In order to achieve maximum optimization, additional research is needed, and parallelization of the Fourier Transform routines could prove to be rewarding. In conclusion, the GalaxSeeHPC simulation was successfully parallelized and obtained very respectable results, while further optimization remains possible.
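Amdahl's Law, referenced above, bounds the achievable speedup by 1 / ((1 - f) + f/p), where f is the parallelizable fraction of the runtime and p is the number of processes. The sketch below evaluates that bound for an assumed fraction; the value of f and the process counts are illustrative, not taken from the paper.

```python
# Hedged sketch of Amdahl's-law speedup bounds for an assumed parallel fraction.
def amdahl_speedup(f, p):
    return 1.0 / ((1.0 - f) + f / p)

f = 0.95                                   # assumed parallelizable fraction of runtime
for p in (4, 16, 64, 256):
    print(f"p = {p:3d}: bound = {amdahl_speedup(f, p):5.2f}x")
```

The bound makes the gap noted above concrete: even a small serial fraction caps the attainable speedup no matter how many processes are added, which is why parallelizing the remaining serial portions, such as the Fourier Transform routines, matters.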
Volume 1, Issue 1 (December 2010), pp. 44–50
https://doi.org/10.22369/issn.2153-4136/1/1/7High performance computing raises the bar for benchmarking. Existing benchmarking applications such as Linpack measure the raw power of a computer in one dimension, but among the myriad architectures of high performance cluster computing, an algorithm may show excellent performance on one cluster while the same benchmark performs poorly on another. For a year, a group of Earlham student researchers worked through the Undergraduate Petascale Education Program (UPEP) on an improved, multidimensional benchmarking technique that would more precisely capture the appropriateness of a cluster resource for a given algorithm. We planned to measure cluster effectiveness according to the thirteen dwarfs of computing published in Berkeley's parallel computing research paper. To accomplish this we created PetaKit, a software stack for building and running programs on cluster computers.