Arts NC State is the collective organization for six of NC State University's performing and visual arts programs: The Crafts Center, the Department of Performing Arts & Technology (Dance and Music), the Gregg Museum of Art & Design, the NC State LIVE performing artist series, and University Theatre. Arts NC State provides development, marketing, outreach & engagement, and ticketing support to these units and serves the entire NC State campus.
The Curricular Connections Guide is one of the key programs Arts NC State (ANCS) offers through the outreach and engagement office; it links NC State faculty to arts programs that thematically connect with their courses. It has evolved from an undesigned PDF/paper guide to a designed digital and print guide, and then became online-only during the pandemic.
Each semester, the manager of outreach and engagement, Amy Sawyers-Williams, and her team of student interns analyze the thematic content of the upcoming arts programs offered by the six programs ANCS serves. They then go through the course catalog by department and copy relevant class information into a spreadsheet (for example, here is the spring 2023 spreadsheet). If University Theatre were going to produce Hamilton, for instance, we would list thematic connections such as history/the American Revolution, war, political theory, psychology, and dance, and then search the catalog for courses that connect to these themes.

Once the spreadsheet is complete and all class connections have been made, we reach out to the faculty teaching these courses to let them know and to encourage them to engage with the art, for example by offering extra credit to students who see the show or by having a guest artist visit their class.

The problem is that manually reviewing the course catalog for thematic connections to our programs is time-consuming, and the results are subjective, depending on the researcher's experience and their knowledge of both the program and the class. We are certainly missing class connections that a computer program could detect.
We want a software system to facilitate making the connections between written artistic themes and courses in the course catalog. For example, what if faculty could go to a website and type in the class they teach or a concept like “feminism” and then receive a list of the programs that may connect to it?
This app should provide a way for administrators to manage shows and their connections to courses in the NC State course catalog.
There are a few examples of websites that do this or something similar for other organizations:
Registration and Records is unable to provide programmatic access to the course catalog. Instead, the system should provide an administrative interface that will allow an admin to upload and maintain a catalog file requested from Registration and Records each semester with up-to-date course information. With this file, the app will be able to identify courses that have been added, deleted, or modified. Students working on this project will have access to this file for the Fall 2023 course catalog. There should also be an interface to curate these records and establish/modify the associations between courses and shows that will ultimately be displayed on the public-facing side of the website.
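To make the change-detection step concrete, here is a minimal TypeScript sketch of a semester-over-semester catalog diff. The record shape (keyed by course code) is an illustrative assumption; the actual fields will depend on the file format Registration and Records provides.

```typescript
// Hypothetical catalog record; real fields depend on the Registration and Records file.
interface CatalogCourse {
  code: string;        // e.g., "CSC 316"
  title: string;
  description: string;
}

// Compare the previously uploaded catalog with a new upload and report
// courses that have been added, deleted, or modified.
function diffCatalogs(previous: CatalogCourse[], current: CatalogCourse[]) {
  const prev = new Map(previous.map(c => [c.code, c] as [string, CatalogCourse]));
  const curr = new Map(current.map(c => [c.code, c] as [string, CatalogCourse]));
  const added = current.filter(c => !prev.has(c.code));
  const deleted = previous.filter(c => !curr.has(c.code));
  const modified = current.filter(c => {
    const old = prev.get(c.code);
    return !!old && (old.title !== c.title || old.description !== c.description);
  });
  return { added, deleted, modified };
}
```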
We would like this to ultimately be a tool on a webpage that could be accessible via desktop or mobile device.
Because this tool will be hosted and supported by the University, students should confirm with us that any technology choices are compatible with what the University can support. This is typically either a project built on the LAMP stack (PHP, MySQL) or custom WordPress plugins that augment both the public display of information and the administrative side of WordPress with the features needed to manage courses, shows, and the relationships between them.

We also hope the students can design something that Arts NC State staff can be trained on, so that it is sustainable for the future. For example, we would like to know how to load in the shows, the course catalog data, and the keywords that would be matched.
This is flexible and we are open to all ideas!
Deutsche Bank is a truly global universal bank, operating for 150 years on multiple continents across all major business lines and serving as a cornerstone of the global and local economies. We are leaders and innovators who are driving the financial industry forward, increasingly with the help of technology, from cloud to AI.

The recent developments in generative AI, specifically large language models (LLMs), promise productivity and efficiency gains at scale. When trained on enterprise-specific content, these models can capture tacit organizational knowledge and help employees be more effective, from onboarding newcomers to accelerating task execution for experienced operators.

This project will explore ways to implement enterprise-contextualized knowledge discovery through LLMs trained on domain-specific data. The project scope includes several elements, including but not limited to:

We can be technology agnostic, at least to an extent, though our preference is Google Cloud Platform and the AI functionality it offers, such as Generative AI Studio.
The Human-Centric Software Engineering Lab in the CS department at NCSU is directed by Dr. Sandeep Kuttal. The lab focuses on the human aspects of software engineering by studying and modeling programmer behavior and then designing and developing mixed-initiative programmer-computer systems. The lab takes a multidisciplinary approach, combining the domains of Human-Computer Interaction, Software Engineering, and Artificial Intelligence. By synergizing these diverse fields of expertise, the lab pioneers novel strategies, innovative theories, immersive visualizations, and tangible prototypes, all tailored to the unique needs of programmers.
Pair programming, a well-established practice in software development, involves two programmers collaborating at a single workstation. This technique has gained traction due to its ability to enhance productivity, code quality, and self-efficacy among programmers. The shared responsibility and continuous feedback lead to more robust solutions and improved learning. However, recent research has illuminated significant differences in the dynamics and outcomes of pair programming when considering gender as a crucial factor.
In the realm of pair programming, the impact of gender dynamics has emerged as a notable concern. Research has unveiled intriguing variations in collaboration approaches, communication styles, leadership roles, interruptions, and partner preferences based on gender. As these distinctions can influence the effectiveness of pair programming and potentially lead to unequal participation, it is imperative to understand and address these gender-related nuances comprehensively.
To lay the groundwork, Dr. Kuttal and her team conducted a comprehensive study, including literature reviews, lab experiments, surveys, and interviews, to understand differences in same- and mixed-gender pairs. Their research illuminated potential challenges in remote pair programming and identified key aspects: (1) different communication cues for men and women, (2) diverse collaboration purposes, (3) distinct leadership styles in different gender pairings, (4) varying interruption patterns, and (5) gender preferences for partners. This effort resulted in a proof-of-concept implementation with some basic capabilities for tracking interactions in pair programming sessions, but it is not very user-friendly for programmers.
The "Fostering Inclusive Pair Programming with Awareness Tool" project aims to develop powerful software to address gender-based dynamics in pair programming. This tool will transform pair programming by analyzing communication styles, leadership roles, interruptions, and partner preferences. The goal is to build understanding and empathy between pairs, enhancing collaboration, code quality, and productivity, regardless of gender composition.
This project's main goal is to recreate this system from scratch using the current implementation at https://github.com/Farissoliman/PairProgrammingTool as a reference. The main focus is on improving the system's usability and functionality. Currently, the system monitors how individuals, in both same-gender and mixed-gender pairs, collaborate during pair programming, tracking individual roles, communication patterns, leadership dynamics, and interruptions. The system also captures data in real time without disrupting the programming process. Additionally, an analytics engine processes the gathered data to create visualizations and insights, which are presented in a user-friendly format to promote better self-awareness and understanding of collaboration patterns.
Hence, the system must:
The current system consists of two main components: a VS Code extension and a Python (Flask) backend. Students are invited to explore alternative implementations that better address this problem. Some familiarity with Web technologies (JavaScript/TypeScript, CSS, HTML) and with the VS Code Extensions API will be beneficial.
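For orientation, here is a minimal TypeScript sketch of how a VS Code extension can capture editing events in the background. The Flask endpoint URL and payload shape are hypothetical stand-ins, not the existing system's actual telemetry.

```typescript
import * as vscode from "vscode";

export function activate(context: vscode.ExtensionContext) {
  // Listen for document edits; similar listeners could capture other interaction signals.
  const listener = vscode.workspace.onDidChangeTextDocument((event) => {
    const payload = {
      file: event.document.fileName,
      changeCount: event.contentChanges.length,
      timestamp: Date.now(),
    };
    // Fire-and-forget POST to a hypothetical Flask endpoint so typing latency is unaffected.
    fetch("http://localhost:5000/events", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    }).catch(() => { /* ignore transient network errors */ });
  });
  context.subscriptions.push(listener);
}
```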
The North Carolina Department of Natural and Cultural Resources (DNCR) oversees the State’s resources for the arts, history, libraries, and nature. Our mission is to improve quality of life by creating opportunities to experience excellence in these areas throughout North Carolina.
The North Carolina Division of Parks and Recreation (“DPR” or the “Division”) administers a diverse system of State parks, natural areas, trails, lakes, natural and scenic rivers, and recreation areas. The Division also supports and assists other recreation providers by administering grant programs for park and trail projects, and by offering technical advice for park and trail planning and development.
DPR exists to inspire all our citizens and visitors through conservation, recreation, and education.
The Data and Application Management Program works to support the Division, sister agencies, and nonprofits in web-based applications for various needs: personnel activity, Divisional financial transactions, field staff operations, facilities/equipment/land assets, planning/development/construction project management, incidents, natural resources, etc. Using data from these web applications, we assist program managers with reporting and analytic needs.
We have sponsored previous SDC projects, so we understand the process and know how to help you complete this project in an efficient manner while learning about real-world software application development. Our team includes three NCSU CSC alumni, all of whom have completed projects with the SDC. These three will be overseeing the project and working directly with you to fulfill your needs and facilitate the development process.
Our existing LAMP stack system (the "legacy" system) was developed over the course of 25+ years with ad-hoc application development in a "production only" environment (mainly using PHP and MariaDB) to meet the immediate business operational needs of the field staff. Many of the legacy applications, including the Fuel/Vehicle/Equipment application, were written as single-file, undocumented, procedural applications. This makes them difficult to read, maintain, and upgrade. These applications need to be updated with modern design patterns and documentation.

DPR manages 43 state parks and many other natural areas across the state. For the state parks to function, we need Division-owned vehicles, along with the fuel, oil, and equipment to operate them. These assets must be accounted for to manage inventory, budget, and park needs. This is where the vehicle application comes in; it stores information for vehicles, their fuel use, and related equipment across the entire Division. Currently, this legacy application is unstructured, outdated, complicated, and has no ability to link to other applications.
We have recently begun migrating many of these legacy applications to new versions following modern design principles and technologies, such as single-page application clients written in React and backed by a REST API. The legacy system and upgraded web-applications have been containerized using Docker to run in parallel in the AWS cloud.
Last semester, a Senior Design student team began working on a new Inventory application, which aims to maintain the functionality of the legacy system’s Fuel/Vehicle/Equipment application while centralizing and simplifying the inventory-related workflow processes of both our park staff and the budget office.
The application has been partially completed, allowing for full management of park-owned "On-Road" vehicles, one of many equipment categories that fall within the scope of this Inventory application. Our team from last semester also provided us with a dynamic backend and database structure that is ready to be used when creating the remaining front-end pages. We are happy with what has been completed so far and are excited to continue working with Senior Design Center students to fulfill the remaining requirements.
This semester, in addition to the completion of the remaining front-end pages and any supporting API endpoints, Inventory will also need data reporting tools as well as preparation for its connections to our future Budget application. Park staff must be able to request equipment and motor fleet vehicles from the budget office, as well as record their vehicles’ mileage, monthly park fuel consumption, and motor fleet vehicle telemetrics.
Last semester, the new Inventory application was redesigned to fit a more modern, object-oriented framework that will allow for standardized control of user permissions, a more organized database structure, and more sophisticated connectivity between applications through our shared REST API backend container.
Parks and Recreation is in the process of implementing a new system that allows for continued use of the legacy applications as well as the establishment of a next generation system. The legacy system has been modified to work with the next generation system for continued use, until all applications can be reworked and migrated appropriately into the new system. Your completed Inventory application shall be seamlessly integrated into this multi-container system using Docker Compose.
Tools and assets are limited to what has been approved by the NC Division of Information Technology (NC-DIT). In practice, most constraints on usable tools come from what NC-DIT has approved for NC DPR's use.

Our new modernized apps currently run on Docker. Each modernized application is packaged into its own frontend container, which uses a Node.js base image and is written in React with Material UI as the UI framework. The backend consists of a MariaDB database container and a unified REST API container that is used by all modernized applications. The unified REST API container uses PHP 8 and is built with the Slim Framework. All applications in the legacy system will continue to function as they are, all within a single PHP 7.4 container that runs its own Apache server.

For this project, students will improve the Inventory application client, which runs in its own container and uses React with Material UI. To support the functionality of this new application, students will also extend the existing REST API and database to add all required functionality.
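As a rough sketch of the client-to-API boundary (the endpoint path, field names, and auth scheme here are illustrative assumptions, not the actual DPR contract), the Inventory client might fetch vehicles like this:

```typescript
interface Vehicle {
  id: number;
  category: string; // e.g., "On-Road"
  parkId: number;
  mileage: number;
}

// Fetch the vehicles for one park from the unified REST API (hypothetical endpoint).
async function fetchVehicles(parkId: number, token: string): Promise<Vehicle[]> {
  const response = await fetch(`/api/inventory/vehicles?parkId=${parkId}`, {
    headers: { Authorization: `Bearer ${token}` }, // assumed auth scheme
  });
  if (!response.ok) {
    throw new Error(`Inventory API error: ${response.status}`);
  }
  return (await response.json()) as Vehicle[];
}
```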
The Senior Design Center (SDC) of the Computer Science (CSC) Department at NC State oversees CSC492—the Senior Design capstone course of the CSC undergraduate program at NC State. Senior Design is offered every semester with current enrollment approaching 200 students across several sections. Each section hosts a series of industry-sponsored projects, which are supervised by a faculty Technical Advisor. All sections, and their teams, are also overseen by the Director of the Center.
Senior Design is a large course offered over multiple sections, with different teams working on unique projects. Some members of the SDC teaching team work across all sections while others are dedicated to individual sections. To keep grading fair despite these differences across teams and sections, the teaching team follows grading rubrics for the various graded assignments in the class.

The rubrics are currently maintained in Google Sheets templates that are manually adapted for each section every semester (adding the appropriate student teams, dedicated tabs for each grader, etc.) and shared with that semester's staff. When grading, these spreadsheets are filled in by hand. There are two main problems with this process: 1) it is tedious and error-prone to customize each grading sheet every semester, since the number of teams and the number of faculty members in a section affect several of the calculations in the spreadsheet, and 2) when grading, it is easy to edit cells that contain formulas, or to make changes that affect how automatic calculations are performed.
We also communicate these rubrics to students by posting them on a dedicated page on our website. When we update our rubrics, we have to update the website and separately the Google Sheets templates, creating unnecessary additional work.
For this project, your team will build a Web application that will facilitate the creation and use of rubrics. There will be 3 types of users: system administrators, instructors, and students. Administrators will be able to create and maintain rubrics. Administrators will also be able to manage semesters, semester sections, student rosters per section, instructors per section, and optionally, teams of students in a section. Administrators will then be able to create individual or team assignments, and assign a grading rubric to these assignments.
When instructors log into the system, they will be able to see all students and teams in their sections. Instructors can also see all assignments for their students and teams. When an assignment is ready to be graded, instructors will be able to open its rubric and enter grades for rubric items. Instructors sometimes also like to add notes to the rubric items they fill out. Note that some assignments need to be graded by multiple instructors. Instructors can only view the scores they gave, but administrators will be able to see and manage the aggregation of scores from multiple instructors on the same assignment.
Students will also be able to log into the system to see their assignments (individual and team) and their final scores once these have been released by administrators.
Rubrics are the most interesting challenge on this project since they should be as flexible as possible. The user should be able to specify, for each rubric item, a name for the item, a weight, valid values (like a ranked list, letter grades, numeric scores in a range, etc.), and an optional brief description of what is expected for that item.
Rubrics can also have nested categories of items, where each category will have a designated weight in the overall rubric. For example, the rubric in our Interim Progress Report has a section for design with several different items in it.
Some elements on a rubric can become optional depending on other values in the rubric. For example, in our written documentation we have a rubric category for requirements with different subcategories for different ways requirements can be expressed. Only one of these subcategories is expected to be used for requirements, while some of the other elements are common across all types of requirements.
We also want to have the ability to add extra-credit items to a rubric, both as part of a category and as part of the main rubric.
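As one way to picture the flexibility described above, here is a hypothetical TypeScript sketch of a rubric model supporting nested weighted categories, typed valid values, conditional items, and extra credit; the team's actual schema may differ.

```typescript
// Illustrative rubric model (not the sponsor's schema).
type ValidValues =
  | { kind: "range"; min: number; max: number }   // numeric scores in a range
  | { kind: "letters"; grades: string[] }         // e.g., ["A", "B", "C"]
  | { kind: "ranked"; labels: string[] };         // a ranked list of labels

interface RubricItem {
  name: string;
  weight: number;                 // relative weight within its category
  values: ValidValues;
  description?: string;           // optional expectations for this item
  extraCredit?: boolean;          // counts above the category/rubric maximum
  enabledIf?: { itemName: string; equals: string | number }; // conditional on another item
}

interface RubricCategory {
  name: string;
  weight: number;                 // weight of this category in the overall rubric
  items: (RubricItem | RubricCategory)[]; // categories can nest
}

interface Rubric {
  title: string;
  items: (RubricItem | RubricCategory)[]; // top-level items and categories
}
```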
This will be a Web application running on Docker containers. The backend will expose a REST API and will be written in PHP 8 with the Slim Framework. The frontend will be written in React. The database will be MySQL/MariaDB.
The Senior Design Center (SDC) of the Computer Science (CSC) Department at NC State oversees CSC492—the Senior Design capstone course of the CSC undergraduate program at NC State. Senior Design is offered every semester with current enrollment approaching 200 students across several sections. Each section hosts a series of industry-sponsored projects, which are supervised by a faculty Technical Advisor. All sections, and their teams, are also overseen by the Director of the Center.
Senior Design is a large course offered over multiple sections with different teams working on unique projects. Because this is a large operation, we have many policies and procedures that students should follow. Also, different sections and teams often have different deadlines for the same deliverables.
Information about policies, procedures, deadlines, and other class administrative details is available to students in one or more places: the syllabus, the course calendar, the course website, email, or our submission system. Given the number of places where information can be found, at this scale it is no surprise that the teaching team receives multiple similar questions from students throughout the semester.
This problem is not unique to Senior Design, and in fact is common across classes even when enrollment is moderate. To simplify and manage answering student questions, many courses, Senior Design included, often use communication tools such as Piazza, Slack, Discord, Ed, and others. However, especially with larger classes, it is common for students to ask the same question multiple times. It is just easier for students to ask than to comb through multiple resources for a precise answer.
This project involves the creation of a bot—SyllaBot—that can be installed into a course’s Slack Workspace or Discord Server to provide a “slash command” that students can use to ask questions about the administrative details of the class. The bot will leverage the OpenAI API and clever prompt engineering to provide an answer to a student’s question that takes into account who the student is and their role in the class. For example, the system should be able to determine who the student is, the section the student is enrolled in, the team the student belongs to, etc. in order to produce the most accurate answer.
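For illustration, a minimal sketch of the slash-command flow using the Slack Bolt and OpenAI Node SDKs is shown below; lookupStudent() and courseContext() are hypothetical helpers standing in for the instructor-configured data sources described next.

```typescript
import { App } from "@slack/bolt";
import OpenAI from "openai";

const app = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET,
});
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); // per-course key in practice

// Hypothetical helpers standing in for the instructor-provided data sources.
async function lookupStudent(userId: string) {
  return { userId, section: "001", team: "Team A" };
}
async function courseContext(student: { section: string }) {
  return `Syllabus and calendar excerpts for section ${student.section}...`;
}

app.command("/syllabot", async ({ command, ack, respond }) => {
  await ack(); // Slack requires acknowledgment within 3 seconds
  const student = await lookupStudent(command.user_id);
  const context = await courseContext(student);
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: `Answer class logistics questions using only:\n${context}` },
      { role: "user", content: command.text },
    ],
  });
  await respond(completion.choices[0].message.content ?? "Sorry, I could not find an answer.");
});

await app.start(Number(process.env.PORT) || 3000);
```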
On the administrative side, the system needs a way for an instructor to provide access to information needed to answer the questions. These sources should be configurable and flexible. Examples of data sources include:
The instructor should also be able to provide their own OpenAI API access key on a per-course basis. Rather than spinning up multiple instances of the app every semester or for every course, the system should support creating courses and installing the bot to different Slack and/or Discord spaces for different courses while ensuring that the course from which a question originates can be identified.
If time permits, the system should keep records of the questions asked, by whom, the responses provided, and other metrics. These can be displayed to the instructor on a queryable dashboard.
This will be a Web application running on Docker containers. You will have access to an OpenAI API key you can use for development. The preferred backend language is Node.js, and any frontend should be implemented in React.
Ankit Agarwal is the Founder & CEO of K2S and an NC State Computer Science alumnus. He envisions a platform that gives other alumni an easy way to give back to the student community by mentoring active Computer Science students.
Sara Seltzer is the Director of Philanthropy for the Department of Computer Science at NC State and works with alumni (individuals, companies, and foundations) who want to impact students and faculty through financial support.
Together, from industry and university perspectives, we are trying to create a virtual engagement program for the NC State community.

In the Department of Computer Science at NC State, we understand that navigating the program and curriculum can be challenging. Wouldn't it be great to have someone who has been there before to support and guide you around the pitfalls, helping you reach your full potential? Much of the skill and knowledge you will gain comes from classrooms and labs, but learning also happens in the individual connections made with peers and alumni. CSC's alumni network includes over 10,000 members, many of whom have been very successful in their careers and are eager to give back and support current students.

Successful alumni often revisit the path that got them there, and it invariably leads back to the roots of their alma mater. In recognition of the supporters and heroes along that path, they have the urge to become one themselves. A portal that allows alumni to easily provide mentorship and share their lessons learned is not only fulfilling for alumni as a way of giving back; it also provides real help and guidance to students stepping out from the shadow of campus.
We propose creating an online mentorship web portal that connects current CSC students with CSC alumni who share the goal of promoting academic success and professional advancement for all.

Primary portal end users include CSC alumni looking to give back to their alma mater by mentoring students, and current CSC undergraduate/graduate students looking for help on a specific topic or project. Secondary users could include alumni who are looking for speaking opportunities and current students searching for contacts for specific internships and co-ops.
Required Features (Minimum Viable Product)
Nice to Haves
Examples of similar solutions include George Mason University's "Mason Mentors" Program and UC Berkeley's Computer Science Mentor Program.
Similar solutions exist in the market and are offered by companies like PeopleGrove. The idea would be to draw inspiration from this platform and build something in-house for the NC State Computer Science students and alumni.
The backend must be implemented in PHP using a modern framework like Slim or Laravel. Students are free to choose between a REST architecture (API + frontend framework such as React) or using Twig on the backend for a server-side rendered app.
The Ergonomics Center is housed in the Edward P. Fitts Department of Industrial and Systems Engineering at North Carolina State University and provides consulting and training services to clients throughout the U.S. and globally. The Center was founded in 1994 as a partnership between NC State University and the NC Department of Labor. It was created to make workplaces safer, more productive, and more competitive by providing practical, cost-effective ways to reduce or eliminate risk factors associated with musculoskeletal disorders.
How much is too much?
When engineers design a task, how do they know if people will be physically able to complete it without getting injured?
Ergonomics and safety professionals have been asking those questions for decades. Several tools have been developed to help provide answers when it comes to defining acceptable lifting, lowering, pushing, pulling, gripping, pinching, and carrying weights and forces. The Ergonomics Center has created downloadable Excel-based calculators using the information in these tools and made them freely accessible to professionals on its website (ErgoDATA). Only one or two industrial ergonomics apps are currently available to the public free of charge – the CDC's NIOSH Lifting Equation App (NLE Calc) and Intergo's MMH Calculator Free. Both apps address only two-handed lifting, which leaves out many other types of tasks (e.g., pushing, pulling, carrying, one-handed manual material handling) that are addressed by the Center's calculators. There is a definite need to add additional apps to the ergonomics practitioner's toolbox.

Ergonomics professionals and teams often collect analysis data while observing a task on the production floor in real time. The ergonomics calculators are straightforward and can be used quickly during real-time observation on a laptop. Unfortunately, laptops can be unwieldy in crowded production floor workspaces, if they are allowed at all. Smartphones are much less intrusive and more manageable in these settings, but the Ergonomics Center's spreadsheets are cumbersome to use on a phone. For a few years now, knowing that the Center is part of a university, many clients have asked whether app versions of these ergonomics calculators are available. Unfortunately, the Center staff does not currently have the capability to "translate" the existing Excel spreadsheets into mobile-phone-friendly tools. Developing smartphone apps for these ergonomics analysis tools is appealing because they would be designed specifically for a mobile phone and would not require internet access at the time of use.

The Center envisions these ergonomics analysis apps mimicking the look and feel of its existing Excel-based spreadsheets. If possible, the apps should be usable without an internet connection, since cellular and wifi signals on a production floor can be limited or non-existent in some facilities. The apps should also be able to export input variables and output results to a report-ready printable format such as PDF, Word, or Excel. Because analyses will be presented to clients, management, and other high-level decision-makers, the appearance of the exported information should be professional and require little manipulation or adjustment by the user. Because many clients work with classified and/or proprietary information, app data should not be stored in the cloud; information could be stored locally or exported in one of the formats mentioned above.
The Center is flexible on the technology used and is willing to proceed with student recommendations. The development of a standard “native” app or apps is not mandated if another technology (such as a Progressive Web App) is deemed more suitable for use.
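To give a feel for the calculations these tools embed, here is a sketch of the revised NIOSH lifting equation in TypeScript, using metric units. This illustrates the published equation rather than the Center's spreadsheet logic; the frequency (FM) and coupling (CM) multipliers come from NIOSH lookup tables and are passed in directly.

```typescript
const LOAD_CONSTANT = 23; // kg, maximum recommended load under ideal conditions

// Revised NIOSH lifting equation (Waters et al., 1993): RWL = LC*HM*VM*DM*AM*FM*CM.
// Distances in cm, angle in degrees.
function recommendedWeightLimit(
  h: number,  // horizontal distance of the hands from the midpoint between the ankles
  v: number,  // vertical height of the hands at the origin of the lift
  d: number,  // vertical travel distance of the lift
  a: number,  // asymmetry angle
  fm: number, // frequency multiplier from the NIOSH table
  cm: number, // coupling multiplier from the NIOSH table
): number {
  const hm = h <= 25 ? 1 : h > 63 ? 0 : 25 / h;
  const vm = v > 175 ? 0 : 1 - 0.003 * Math.abs(v - 75);
  const dm = d <= 25 ? 1 : d > 175 ? 0 : 0.82 + 4.5 / d;
  const am = a > 135 ? 0 : 1 - 0.0032 * a;
  return LOAD_CONSTANT * hm * vm * dm * am * fm * cm;
}

// A lifting index above 1.0 indicates an elevated risk of lifting-related injury.
function liftingIndex(loadKg: number, rwl: number): number {
  return loadKg / rwl;
}
```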
Mobile-friendly tool(s) will be provided free of charge on The Ergonomics Center's website.
In Computer Science at NC State during Fall 2023, more than 40 instructors are teaching over 40 different undergraduate courses with more than 1600 undergraduate students enrolled across those courses. In addition to undergraduate courses, hundreds of graduate students are also enrolled in over 50 courses. Dr. King coordinates the CSC316 Data Structures & Algorithms course, which has over 225 students enrolled during Fall 2023. Dr. Schmidt coordinates the CSC226 Discrete Math course, which has over 490 students enrolled during Fall 2023. Many instructors offer in-person office hours, virtual office hours, or a combination of both each week. Our goal is to be as efficient and effective as possible when providing assistance to students outside of the classroom.
One of the challenges instructors face when dealing with large courses and large teaching teams is the need for an organized and efficient approach to managing office hours. Instructors, students, and teaching assistants often face difficulties in coordinating and facilitating office hours effectively. Traditional methods can lead to confusion, inefficiency, and missed opportunities for valuable student-teacher interaction. Although there are existing solutions to this problem, each lacks at least one of the qualities – robustness, ease of use, stability, or features – that would make it a solution instructors can rely on. There is an opportunity to develop a comprehensive digital solution that addresses these challenges, resulting in a more productive use of time and resources for both students and the teaching staff.
Our envisioned solution is a web application designed to simplify the way office hours queues are managed in university courses. The application will offer instructors, teaching assistants, and students a platform to coordinate, join, and facilitate office hour interactions.
The application will provide the following features:
We would like this application to be web-based running on Docker containers. It should be implemented as a Progressive Web Application (PWA) with React and MaterialUI on the frontend and Node.js on the backend. We prefer MariaDB/MySQL as the database engine, but PostgreSQL is also acceptable.
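As a starting point, the queue API might look like the following Node.js/Express sketch. The endpoints and in-memory model are illustrative assumptions; the real application would persist queues in the database and scope them per course and per office-hours session.

```typescript
import express from "express";

interface QueueEntry {
  studentId: string;
  question: string;
  joinedAt: Date;
}

const queue: QueueEntry[] = []; // in-memory stand-in for a database-backed queue

const app = express();
app.use(express.json());

// A student joins the queue with a short description of their question.
app.post("/queue", (req, res) => {
  const entry: QueueEntry = {
    studentId: req.body.studentId,
    question: req.body.question,
    joinedAt: new Date(),
  };
  queue.push(entry);
  res.status(201).json({ position: queue.length });
});

// A TA dequeues the next student.
app.post("/queue/next", (_req, res) => {
  const next = queue.shift();
  next ? res.json(next) : res.status(404).json({ error: "Queue is empty" });
});

app.listen(3000);
```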
Katabasis is a non-profit organization that specializes in developing educational software for children ages 8-15. Our mission is to facilitate learning, inspire curiosity, and catalyze growth in every member of our community by building a digital learning ecosystem that adapts to the individual, fosters collaboration, and cultivates a mindset of growth and reflection.
Computer Science (CS) education is of increasing importance to educators, parents, and school administrators as a greater number of CS jobs become available in the workforce. However, many children, particularly in rural high-need areas, have very little access to quality educational content on this subject matter. Further compounding the issue, many children in these areas have such little exposure to CS that they become too intimidated to engage with CS education even when given the opportunity. Katabasis is seeking to expose students to CS in a nontraditional way to combat these barriers and spark engagement through the medium of art. This will also open the door for future innovative teaching opportunities.
We are seeking a team of students to develop a block-based coding system for creating algorithmic art. The system will help teach students about artistic concepts such as fractals, evolutionary art, and other art generated from fixed patterns, while also introducing basic computer science concepts such as conditionals, looping, variables, etc. The major touchstones for this project are the visual block-based programming languages, Snap! and Scratch. There should be a portion of the UI for assembling block-based code and another portion of the UI for displaying the art generated when the code is run. The primary focus of the project is this block-based programming/art portion of the platform.
In addition to this portion of the project, we are asking students to also develop a web platform to contain the block-based programming component and to facilitate students sharing their art with others.
The core features the system must include are:
This project has two core tech platforms: the block-based coding system, which will be built in Unity, and the web platform, which is intended to be integrated within an existing Docker application (including a web portal with login functionality and access to a different art platform) and should align with the tech stack described as follows:

The team will be provided with a sample that demonstrates the interplay among all four of these components (frontend, backend, database, and Unity).
LexisNexis® InterAction® is a flexible and uniquely designed CRM platform that drives business development, marketing, and increased client satisfaction for legal and professional services firms. InterAction provides features and functionality that dramatically improve the tracking and mapping of the firm’s key relationships – who knows whom, areas of expertise, up-to-date case work and litigation – and makes this information actionable through marketing automation, opportunity management, client meeting and activity management, matter and engagement tracking, referral management, and relationship-based business development.
The key to successful business development is the strength of your engagements with prospective clients and how they change over time. LexisNexis InterAction® has an algorithm that, based on activities such as meetings, phone calls, and email exchanges, calculates the strength of the engagement between two individuals.

The behavior of the algorithm is under review, and a tool is needed to investigate the impact of parameter changes by visualizing how the engagement score would change over time.
The objective of this project is to produce a tool to allow the review and comparison of algorithm parameter sets by visualizing the resulting engagement scores and how they change over time.
An engagement score is the result of assigning value to the activities between two contacts, and of the changing impact of those activities over time.
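For intuition only (this is not InterAction's actual algorithm), a simple model assigns each activity type a value that decays exponentially with age, so the score reflects both the volume and the recency of interactions:

```typescript
type ActivityType = "meeting" | "call" | "email";

interface Activity {
  type: ActivityType;
  date: Date;
}

// Illustrative parameters; the tool's whole purpose is to let users vary these.
const BASE_VALUE: Record<ActivityType, number> = { meeting: 10, call: 5, email: 1 };
const HALF_LIFE_DAYS = 90;

function engagementScore(activities: Activity[], asOf: Date): number {
  const decayRate = Math.LN2 / HALF_LIFE_DAYS;
  return activities.reduce((score, a) => {
    const ageDays = (asOf.getTime() - a.date.getTime()) / 86_400_000;
    // Ignore activities after the evaluation date; older ones contribute less.
    return ageDays < 0 ? score : score + BASE_VALUE[a.type] * Math.exp(-decayRate * ageDays);
  }, 0);
}
```

Plotting this score over time for two different parameter sets side by side is exactly the kind of comparison the tool should support.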
Some examples of the tool's potential capabilities include:
(This list is considered neither exhaustive nor a statement of the agreed scope of the project.)
An agile development process will be utilized, agreeing on a sequence for functional implementation, incrementally delivering capabilities, and adjusting future deliveries on the basis of feedback.
The team may choose their technology stack with any mix of JavaScript, Python, and C#.
Angular 14 and D3 should be used for any front end and visualizations.
(As a stretch goal, the team could consider an Event Sourcing pattern for the implementation.)
A log of anonymized activity data will be provided.
Our company is an innovative, global healthcare leader committed to saving and improving lives around the world. We aspire to be the best healthcare company in the world and are dedicated to providing leading innovations and solutions for tomorrow.
Merck’s Security Analytics Team is a small team of Designers, Engineers and Data Scientists who develop innovative products and solutions for the IT Risk Management & Security organization and the broader business as a whole. Our team’s mission is to be at the forefront of cybersecurity analytics and engineering to deliver cutting-edge solutions that advance the detection and prevention of evolving cyber threats and reduce overall risk to the business.
The Security Analytics Team is the curator of Merck’s cyber data lake, an Amazon Web Services based, Merck proprietary data lake aimed at providing a single point of presence for cybersecurity relevant telemetry data from a variety of systems at Merck. The data lake ingests terabytes worth of data daily from over 20 different systems, which is then normalized and presented to consumers through a secure access layer for uniform consumption.
Due to the scale and variety of data being ingested, maintaining visibility into the three V's generally associated with big data operations – volume, velocity, and variety – is paramount to the Security Analytics team. The team is constantly working on new tools to better understand the baselines of the data being ingested, both to understand trends and to identify failures or exceptions in the ingestion process early. The team needs to be able to monitor that volume increases are in line with expectations (to ensure that asset counts are accurate and line up with configuration management efforts), that data points are received in a timely fashion, and that the ingested data does not deviate from what was expected.

We would like the students to develop a solution for tracking the volume, velocity, and variety of data being ingested, and to display baselines and trends in a web-based application. As part of the solution, the user should be able to specify what data sets can be ingested along with their configuration parameters (frequency, subsets versus entire data catalogs, etc.). For example, the student team will ingest some publicly available cybersecurity data (such as data from NIST or MITRE) into a data store of their choosing (either a relational database or an open-source big data platform such as Hadoop). The ingestion process will run on a recurring basis with a defined frequency of at least daily, as appropriate for the sample data selected. The students will then create an application that monitors the ongoing ingestion, tracking baselines for the volume, velocity, and variety of the data. The students will create a dashboard to display the results in graphical form, highlighting daily, weekly, and monthly trends. Additional development should be considered to identify potential anomalies in ETL (extract, transform, load) processes (such as data type changes, where data fields may change from strings to integers), which would provide early notification to the engineering team in the event of an outage or exception.
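As one example of a baseline check, the sketch below (TypeScript, with illustrative field names) compares a feed's daily record count against its rolling mean; velocity (arrival lag) and variety (schema drift) could be tracked with analogous checks.

```typescript
interface DailyVolume {
  feed: string;   // name of the ingested data source
  date: string;   // ISO date of the ingestion run
  records: number;
}

// Flag today's volume if it deviates from the historical mean by more than
// the tolerance (e.g., 0.3 = 30%). A production system would likely use a
// windowed baseline and per-feed tolerances.
function volumeAnomaly(history: DailyVolume[], today: DailyVolume, tolerance = 0.3): boolean {
  const counts = history.filter(d => d.feed === today.feed).map(d => d.records);
  if (counts.length === 0) return false; // no baseline established yet
  const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
  const deviation = Math.abs(today.records - mean) / mean;
  return deviation > tolerance;
}
```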
The students can select their choice of platforms and programming/scripting languages for data ingestion and frameworks for UI development/dashboarding. The students will be responsible for reviewing the problem statements and proposing a target architecture and tool set to the Security Analytics Team, along with justification for their selections and any caveats/assumptions that drove the decisions.
The CSC Undergraduate Curriculum Committee (UGCC) reviews courses (both new and modified), curriculum, and curricular policy for the Department of Computer Science.
North Carolina State University policies require specific content for course syllabi to help ensure consistent, clear communication of course information to students. However, creating a course syllabus or revising a course syllabus to meet updated university policies can be tedious, and instructors may miss small updates of mandatory text that the university may require in a course syllabus. In addition, the UGCC must review and approve course syllabi as part of the process for course actions and reviewing newly proposed special topics courses. Providing feedback or resources for instructors to guide syllabus updates can be time consuming and repetitive, especially if multiple syllabi require the same feedback and updates to meet university policies.
The UGCC would like a web application to facilitate the creation, revision, and feedback process for course syllabi for computer science courses at NCSU. Users will include UGCC members and course instructors (where UGCC members can also be instructors of courses). UGCC members should be able to add/update/reorder/remove required sections for a course syllabus, based on the university checklist for undergraduate course syllabi. UGCC members should be able to provide references to university policies for each syllabus section, as well as specific required text (which instructors cannot change) as outlined by university policy. UGCC members should be able to update/revise the specific required template text, as appropriate, so that these updates are pushed to all new syllabi created using the tool. Instructors should be able to use the application to create a new course syllabus, or to revise/create a new version of an existing course syllabus each semester. UGCC members can then review an instructor's syllabus in the application and provide comments/feedback on each section of the syllabus, including flagging specific sections for required revision by the instructor. A history of revisions should be maintained. Instructors and UGCC members should be able to download a properly formatted course syllabus in DOCX, PDF, HTML, and Markdown formats (several instructors use GitHub Pages to host their syllabi).
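One hypothetical way to model syllabus sections so that UGCC updates to required text propagate to newly created syllabi is sketched below in TypeScript; the actual schema is up to the team.

```typescript
// UGCC-managed template for one required syllabus section.
interface SectionTemplate {
  id: number;
  title: string;          // e.g., "Academic Integrity"
  order: number;          // UGCC-controlled ordering
  policyUrl?: string;     // reference to the relevant university policy
  requiredText?: string;  // mandatory text instructors cannot change
}

// An instructor's instance of a section, linked back to its template.
interface SyllabusSection {
  templateId: number;
  instructorText: string; // instructor-editable content shown alongside requiredText
  flaggedForRevision: boolean;
  comments: { author: string; body: string; date: Date }[];
}

interface Syllabus {
  course: string;         // e.g., "CSC 316"
  semester: string;
  version: number;        // revision history is retained per version
  sections: SyllabusSection[];
}
```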
Other languages and technologies may be used, but they must be approved by the sponsors.
Cisco is the worldwide leader in technology that powers the Internet.
Cisco’s Security and Trust Organization’s (STO) InfoSec team is sponsoring the “Managing Firewall Access Control List for Cloud Services” project.
The InfoSec team is responsible for securing Cisco via controls and policies. From time to time the InfoSec team runs into limitations of controls. This project is in response to a current limitation the InfoSec team is faced with.
Managing firewall access control lists for cloud services is becoming harder each day. The issue is that services in the cloud are very dynamic: they may scale up or down at any given point, as may the underlying cloud infrastructure that runs the service. In fact, the dynamic way these cloud services can be scaled up and down or moved between geographical regions is one of the main drivers for leveraging services in the cloud. However, because firewall access control lists are static by nature, they become difficult to manage.

Example: Company A runs its widget cloud service in the Amazon Web Services (AWS) West datacenter. Its customers demand geo-redundancy, so Company A also deploys the widget service in the AWS East datacenter. AWS runs into a network issue and West traffic is redirected to the East datacenter. The problem is that Company A never told its customers it added the East datacenter, and the customers' firewalls block access to the IP addresses in the East datacenter. Another example is when an infrastructure provider such as AWS changes the IP address assignment of Company A's public widget service. When the IP address of the widget service changes, there is usually no notification to the widget service owner, so consumers of the service are not informed of the change, and the service becomes inaccessible due to firewall rules.
This project could go one of two ways. The first step would be researching the industry to see if there is an existing framework that could be used to solve the problem statement. If so, then the project would consist of integrating that framework in a secure way to automate dynamic firewall Access Control Lists (ACLs).
If there were no existing solutions, then the project would involve validating the problem statement and taking those findings to design and build a prototype for an industry-wide solution.
Possible starting points:
The project would need to use open standards to solve the problem at an industry-wide level. The prototype could focus on a few vendors due to time constraints.

Programming language: any modern language
Networking skills: Understanding Network ACLs
Operating systems: Windows, Linux – an understanding of how DNS works is needed.
Since time is a constraint, the deliverable would be:
Required: design document
If time: a working example of a "DNS type" service for looking up cloud IP addresses (sketched below)
If time: a working example of modifying the ACLs of a router based on the data in the “DNS type” service.
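As a concrete starting point for the "DNS type" service, AWS already publishes its current IP ranges as JSON. The sketch below reads that feed; an industry-wide service would aggregate equivalent feeds from multiple providers behind an open standard.

```typescript
// Subset of the published schema at https://ip-ranges.amazonaws.com/ip-ranges.json.
interface AwsIpRanges {
  syncToken: string;
  prefixes: { ip_prefix: string; region: string; service: string }[];
}

// Look up the current CIDR blocks for a given AWS service and region.
async function lookupAwsPrefixes(service: string, region: string): Promise<string[]> {
  const res = await fetch("https://ip-ranges.amazonaws.com/ip-ranges.json");
  const data = (await res.json()) as AwsIpRanges;
  return data.prefixes
    .filter(p => p.service === service && p.region === region)
    .map(p => p.ip_prefix);
}

// Example: EC2 ranges in us-east-1. An ACL-automation layer could diff this
// result against the previous snapshot and push updates to firewall rules.
lookupAwsPrefixes("EC2", "us-east-1").then(prefixes => console.log(prefixes));
```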
The Christmas Tree Genetics Program (CTG) at NC State's Whitehill Lab is working on genomic tools to develop elite Fraser fir trees. Graduate students are working on elucidating the mechanisms involved in trees' abilities to handle disease pressure, pest problems, and challenges brought about by climate change. Understanding these mechanisms allows the researchers to develop Christmas trees that are more resilient to biotic and abiotic stressors.

Scientists in the CTG program handle a large amount of plant material, such as unique individual trees, cones, seeds, embryos, cultures, and clones. Currently, all the data is managed using Microsoft Excel, which will quickly become inadequate as the amount of plant material information needing to be stored grows. Plant material tracking is key for data integrity: we need to know what is what, when the material was last transferred, and its current location. A database will help manage our inventory and prevent data loss and mismanagement. Such a database is referred to as a Laboratory Inventory Management System, or LIMS.
This is the second round of development for the ROOTS database, which started as a CSC Senior Design in Spring 2023.
ROOTS is a repository of data related to CTG’s research activities both in the fields and in the laboratory.
The various steps of the protocols used by the research group are represented in the database. Individual plant materials at various stages (trees, cones, seeds, embryos…) are saved in the database along with metadata (origin, transfer date, quantity, location…).
The first round of development, by the Senior Design team in Spring 2023, resulted in a strong emphasis on lineage tracking and nomenclature. The ROOTS DB ensures that the seeds from a tree are connected to the parents and the progeny (“children”). The naming nomenclature contains specific information related to the tree breeding work done by the CTG. The system has three types of users: user, superuser and admin. The user has viewing privileges only. The superuser can add, modify, and discard data in the system, and generate reports of material data based on species, genealogy, and other criteria. The admin has additional permission to add new users, superusers, and admins to the system.
The second round of development for ROOTS 2.0 will focus on addressing feedback from the users after testing of ROOTS 1.0. The Christmas Tree Genetics program has two main outstanding requirements:
It will also focus on other features needed in ROOTS such as:
ROOTS is a web application using the following stack:
Frontend: React with the Material UI and NPM QR Reader packages.
Backend: NodeJS with the Express.js framework and Sequelize as the object-relational mapper (see the sketch after this list)
Database: MySQL
Authentication: Shibboleth
Containerized using Docker
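For illustration, a self-referential Sequelize model could capture the lineage tracking described above; the field names here are hypothetical, and the actual ROOTS schema may differ.

```typescript
import { Sequelize, DataTypes } from "sequelize";

// Connection parameters are placeholders for the real deployment configuration.
const sequelize = new Sequelize("roots", "user", "password", {
  host: "localhost",
  dialect: "mysql",
});

const Material = sequelize.define("Material", {
  name: { type: DataTypes.STRING, allowNull: false },  // nomenclature-encoded identifier
  stage: { type: DataTypes.STRING, allowNull: false }, // tree, cone, seed, embryo...
  location: DataTypes.STRING,
  quantity: DataTypes.INTEGER,
  transferredAt: DataTypes.DATE,                       // last transfer date
});

// Each material links back to its parents, so progeny can be traced in both directions.
Material.belongsTo(Material, { as: "mother", foreignKey: "motherId" });
Material.belongsTo(Material, { as: "father", foreignKey: "fatherId" });
Material.hasMany(Material, { as: "progeny", foreignKey: "motherId" });
```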
Dr. Tiffany Barnes and Dr. Veronica Cateté lead computer science education research in the Department of Computer Science at NC State University. Dr. Barnes uses data-driven insights to develop tools that assist learners' skill and knowledge acquisition. Dr. Cateté works closely with K-12 teachers and students, conducting field studies of technology use and computing in the classroom.

Together they have advocated for the use of the block-based Snap! programming environment and have worked closely to develop solutions for live classrooms, engaging over 800 students and teachers each school year in computing-infused lessons.

To help address the nation's critical need for a more computer science literate populace, researchers and educators have been developing interventions for increased exposure and equitable learning experiences in computer science for K-12 students. In addition to the creation of standalone courses like AP CS Principles, Exploring Computer Science, and CS Discoveries, we have also been working on developing integrated computer science experiences for common K-12 classes such as English, Math, Science, and Social Studies.

To support the new influx of teachers and educators from various backgrounds who will teach block-based programming lessons, we developed a wrapper for the Snap! language called SnapClass. This system supports assignment creation and student submission as well as project grading. The tool was developed using various programming paradigms, and after initial deployment and usability testing, we have new feedback to work with to address user needs and tool functionality.
SnapClass v4.0 will build off of the work done by three prior Senior Design teams (Spring 2022, Fall 2022 and Spring 2023). The prior teams have added useful functionality to SnapClass such as the integration of multiple block-based programming languages into the environment, a FAQ for students working in Snap, mechanisms for auto-saving code, differentiated assignment to students based on skill level, non-coding assignments, etc.
SnapClass, like other software projects, has a long lifespan, with new features and updates added over time. Regular bug fixes, updates, and optimizations are necessary to keep the software running smoothly. For the SnapClass system to reach a new level of users, the codebase needs to scale accordingly. This means new features, modules, and components should be easy to add without compromising the stability or performance of the system. With a well-structured and maintainable codebase, we can more easily adapt to changing user requirements and integrate more third-party libraries or frameworks, such as LMS support.

Prior developers and researchers working on SnapClass have put together a list of defects and functionality that falls short of what K-12 educators desire. This semester, for SnapClass v4.0, we would like to work with the team of students on the final 15% of the project: polishing usability and functionality, and improving overall system effectiveness. As the team catalogs the inventory of improvements, we encourage them to research software architecture and database best practices so that they may have the opportunity to refactor different modules of SnapClass.

Some of the changes that need to be made are listed below, and a further description of each can be found here:
Adam Gaweda is an Assistant Teaching Professor for the Computer Science department at NC State University. As a member of the NCSU CSC Faculty he, like many faculty, needs to provide quality instruction while addressing academic misconduct enabled by assignment materials and solutions that are easily searchable online.
With the rising prominence of platforms used for cheating, like Chegg and ChatGPT, faculty are required to periodically and proactively monitor these external platforms for potential academic integrity and copyright violations. Furthermore, instructors must regularly search for current and prior assignments, as students may post assignment materials while seeking assistance during an assignment's initial release, or as part of their portfolios after the course is over. However, the time and effort required continues to increase as student enrollments grow, and the process is becoming increasingly difficult to manage as new course assignments are created and released.

The purpose of this project is to assist instructors with monitoring known websites, such as Chegg, CourseHero, GitHub, ChatGPT, etc., for unauthorized postings of course materials or solutions for course assignments. The project would allow the installation and configuration of plugins that enable monitoring frequently used platforms and tools for course materials. Likewise, since the likelihood of a potential violation may decrease over time after an assignment is released, the tool should also gradually reduce the frequency with which the material is scanned for. For example, if CSC 116 releases the assignment requirements for Homework 1, the tool should scan for potential violations daily, but shift to a weekly or monthly scan after its due date. In addition, scans should be scheduled to avoid server or API rate limitations set by the platforms. For this project, the team will create the core software, as well as 1-2 plugins that will interact with external services.
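The decaying schedule could be as simple as the following sketch, shown in TypeScript for illustration; the preferred implementation stack listed below is Python-based (e.g., Celery or cron would drive the actual scheduling).

```typescript
interface Assignment {
  name: string;
  releaseDate: Date;
  dueDate: Date;
}

// Scans are frequent while an assignment is live, then back off after the due date.
// The specific intervals here are illustrative and should be configurable.
function nextScanIntervalDays(assignment: Assignment, now: Date): number {
  const msPerDay = 86_400_000;
  if (now <= assignment.dueDate) return 1; // daily scans while the assignment is active
  const daysPastDue = (now.getTime() - assignment.dueDate.getTime()) / msPerDay;
  if (daysPastDue < 30) return 7;          // weekly for the first month after the due date
  return 30;                               // monthly thereafter
}
```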
One potential solution involves a web-based platform that would enable several possible features, including:
Students involved in this project will need to use HTML/CSS/JavaScript, as well as backend web development frameworks, to build the web application. Students will also need to develop HTML parsing scripts that can extract information from these platforms, and server configurations to schedule regular evaluation of the course materials. In addition, students will need to interact with any available APIs to LLMs while working on this project. Finally, a form of communication to inform instructors about potential matches should be developed.
API and Web-scraping (such as BeautifulSoup, scrapy, lxml, and requests)
Web-based (Flask or Django preferred, but other web frameworks will be considered)
Database (Postgres or MySQL, SQLAlchemy)
Linux-based service scheduling and mailing tools (cron, Celery; flexible on specific tools, though they should be installable on servers running CentOS or RHEL)
NLP (text similarity, awareness of basic concepts preferred)
LexisNexis® InterAction® is a flexible and uniquely designed CRM platform that drives business development, marketing, and increased client satisfaction for legal and professional services firms. InterAction provides features and functionality that dramatically improve the tracking and mapping of the firm’s key relationships – who knows whom, areas of expertise, up-to-date case work and litigation – and makes this information actionable through marketing automation, opportunity management, client meeting and activity management, matter and engagement tracking, referral management, and relationship-based business development.
Effective tools fit with the way you work. With that in mind, LexisNexis InterAction® has a series of Microsoft Office Add-ins and integrations that allow users to access their customer data from Outlook, Excel & Word.
Rather than using Microsoft Office, however, many smaller legal firms are turning to Google Workspaces to manage their emails, contacts, and calendars. Currently, InterAction doesn’t have any support for Google applications.
The objective of this project is to produce LexisNexis InterAction tools that integrate with Google Workspaces. We would like to 1) create a process that allows users to synchronize contact data between InterAction and Google Workspaces, and then 2) provide a background service to automate this process.
The first of these should be an Add-On to the Google Contacts application, similar to the InterAction Microsoft Office Add-in, that would allow the user to:
The second would be a background service using the Google Workspace APIs to synchronize changes to contact data with InterAction.
As a stretch goal, the team should investigate how a similar approach could also be applied to Gmail or Google Calendar and InterAction.
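As a rough illustration of what the background service's synchronization loop might look like, the sketch below polls the Google People API for contact changes using a sync token. The push_to_interaction() helper is a hypothetical stand-in for whatever InterAction calls the team designs, and the requested fields are assumptions.

    # Sketch of incremental contact sync via the Google People API.
    # push_to_interaction() is a hypothetical stand-in for the InterAction side.
    from googleapiclient.discovery import build

    def push_to_interaction(person: dict) -> None:
        print("would sync:", person.get("names"))  # placeholder

    def sync_contacts(creds, sync_token=None):
        service = build("people", "v1", credentials=creds)
        request = service.people().connections().list(
            resourceName="people/me",
            personFields="names,emailAddresses,phoneNumbers",
            requestSyncToken=True,
            syncToken=sync_token,   # None on the first run forces a full sync
        )
        while request is not None:
            response = request.execute()
            for person in response.get("connections", []):
                push_to_interaction(person)
            request = service.people().connections().list_next(request, response)
        # Persist this token; passing it on the next run returns only the deltas.
        return response.get("nextSyncToken")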
The team may choose their technology stack, with any mix of JavaScript, Python, and C#.
Angular 14 and D3 should be used for the front end and visualizations.
An overview of the InterAction MS Office Add-ins will be given, together with a resource pack for styling.
Credentials and access to a test instance of InterAction will also be provided.
McAfee is a worldwide leader in online protection. We’re focused on protecting people, not devices. Our solutions adapt to our customers’ needs and empower them to confidently experience life online through integrated, easy-to-use solutions.
Scam and phishing websites are increasingly cheap and easy to stand up, run for a short period of time, and tear down, potentially before traditional cybersecurity tools can be run against them. Real-time analysis on the client machine, pulling apart a suspicious website's individual components and delegating them to cloud AI, will provide improved protection for consumers.
We would like students to build a browser extension that will dissect a web page into relevant items of interest for further analysis, such as video, audio, images, or blocks of text.
Proposed Solution:
Build a Cloud API that will combine custom video/audio/image/text classification (such as “AI Generated”, “Phishing”, “Scam”, etc) using existing Open Source models or new Models created from open source datasets (for example, data from this paper). This will be used in conjunction with McAfee Threat Intelligence, which will be provided as a REST API.
Build a web browser extension that extracts relevant items of interest and makes an API call to a cloud service to get an evaluation of each item, which can include a proprietary trust score, site categorization, and additional properties, and then overlays the results on top of the item in question in the browser in real time.
A complete solution will include a browser client to navigate to an arbitrary web page, identify the objects of interest, query the cloud service for the classification details and display the result to the user inline on the web page.
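To make the cloud side concrete, here is a minimal sketch of a classification endpoint in Python; the route, request fields, stub classifier, and response shape are illustrative assumptions rather than McAfee's actual interface.

    # Minimal sketch of a cloud classification endpoint (Flask). The route,
    # fields, and stub classifier are illustrative assumptions.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def classify_text(text: str) -> dict:
        """Stand-in for an open-source model; returns a label and confidence."""
        suspicious = any(w in text.lower() for w in ("verify your account", "wire transfer"))
        return {"label": "Phishing" if suspicious else "Benign",
                "confidence": 0.9 if suspicious else 0.6}

    @app.post("/v1/classify")
    def classify():
        item = request.get_json()   # e.g. {"type": "text", "content": "..."}
        if item.get("type") != "text":
            return jsonify(error="only text items are handled in this sketch"), 400
        return jsonify(classify_text(item.get("content", "")))

The browser extension would POST each extracted item to such an endpoint and overlay the returned label and score inline on the page.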
Examples:
The solution should be developed for the Google Chrome browser (all platforms), Microsoft Edge, Mozilla Firefox and the Apple Safari browser (macOS and iOS).
Sterling McLeod is an Assistant Teaching Professor in the Computer Science Department at NC State University. He teaches the CSC 116 course, in which multiple instructors teach hundreds of students across different sections each semester. Dr. McLeod would like to create a way for all CSC 116 instructors to have isomorphic tests so that the assessment of students is consistent across all sections.
The goal of this project is to develop a password-protected website that will act as a repository for isomorphic test questions. This will allow easier test creation that is consistent among many sections of a course, while still allowing some flexibility for each instructor to choose which questions they want to ask.
Consistency in assessment among many instructors of a course is critical in ensuring that each student is assessed fairly. It’s also important to ensure all course Student Learning Outcomes (SLOs) are being assessed properly. An example set of SLOs is below.
Upon successful completion of this course, a student will be able to...
Problems in assessment often arise when instructors have varying degrees of difficulty in their test questions, when certain SLOs are left out of assessments, and/or when questions are graded with different degrees of rigor. This can lead to inconsistencies in students’ preparedness for subsequent courses, bottlenecks in student degree progression (which cause space issues for the university), and inconsistencies in students’ Grade Point Averages (GPAs).
Test consistency is difficult to obtain for several reasons. When there are many instructors for a course (4+), agreeing on test rigor can be an endless discussion. Instructors can be partial to questions they create and unwilling to leave them out. Sometimes instructors simply may not want to go through the effort of creating new test questions, solutions, and rubrics, given the enormous amount of time that task can take.
This project aims to address these issues by providing a mechanism for instructors to create their tests to be consistent with other sections. The proposed platform for the tool will be a website allowing faculty to do the following:
All questions stored on the site will be approved by a team of faculty relevant to the course. This will enable faculty to continue using questions they are partial to but ensure that each question meets a baseline level of rigor.
The benefits of this work will be:
I think a website will be the best platform for this project. Each question should have various data associated with it, such as:
Storing this data can be done with a simple JSON file and/or a more formal database.
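As a purely illustrative sketch (the specific metadata fields are left open above), question records could be stored as JSON and filtered by SLO like this; every field name here is hypothetical.

    # Illustrative question records stored as JSON; all fields are hypothetical.
    import json

    questions = [
        {"id": 1, "slo": "loops", "difficulty": 3,
         "prompt": "Write a loop that sums the integers 1..n.",
         "rubric": "1 pt bounds, 1 pt accumulator, 1 pt return value"},
        {"id": 2, "slo": "arrays", "difficulty": 2,
         "prompt": "Return the largest value in an int array.",
         "rubric": "1 pt traversal, 1 pt comparison, 1 pt edge cases"},
    ]

    def by_slo(slo: str) -> list:
        """Filter approved questions so every section covers the same SLOs."""
        return [q for q in questions if q["slo"] == slo]

    with open("questions.json", "w") as f:
        json.dump(questions, f, indent=2)

    print(by_slo("loops")[0]["prompt"])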
Making this a website, as opposed to some other platform, will be nice because it can be easily accessed by many faculty. The website will need some kind of password protection. If a static-site generator (like Jekyll, Hugo, etc.) can be used, then that would be great, but I'm not familiar enough with web development to envision a solution using those.
There are none.
IBM is a leading cloud platform and cognitive solutions company. Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 350,000 employees serving clients in 170 countries. With Watsonx, the AI platform for business, powered by data, we are building industry-based solutions to real-world problems. For more than seven decades, IBM Research has defined the future of information technology with more than 3,000 researchers in 12 labs located across six continents.
The Department of Forest Biomaterials at NC State’s College of Natural Resources is home to one of the oldest and most respected paper science and engineering programs in the world focused on cutting edge sustainable materials innovation and smart manufacturing.
As of 2018, 300 million tons of municipal solid waste (MSW) was available in the US. Of that material, about 50% ends up in landfills, which is a growing concern for all communities due to its significant impact on the global environment. Therefore, the rational use or valorization of MSW is essential in a future economy based on sustainability.
We are working on AI-driven MSW characterization with the use of visual, multi-spectral, and hyperspectral sensors. The idea is to build and train models to identify the types of materials (paper, plastics, food, textiles, etc.) in real time. Specifically, we plan to build AR-assisted sorting for the workforce in a materials recycling facility (MRF). One step toward this is to augment reality by tracking multiple objects moving on a belt and putting labels (color and text) on each object.
Our objective is to augment reality by tracking multiple objects on a moving belt and putting labels (color and text) on each object. One example scenario is described below:
Every 10 seconds, a set of objects is placed on a belt. The AR engine puts labels (color codes and text) on each object and tracks its position until the object comes off the conveyor belt. The labels and the initial positions of the objects will be given to the team to scope out the project.
Last semester, one of the CSC Senior Design teams worked on the initial proof of concept with good success; however, the accuracy of the labeling/tracking needs to be further improved and implemented on a real conveyor system. Eventually, this work will be integrated into an AI system that identifies and labels the objects.
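As a minimal sketch of the tracking step, the following Python uses OpenCV's built-in trackers on a video of the belt; the initial boxes and labels are hardcoded stand-ins for the given inputs described above.

    # Sketch: track multiple labeled objects on a belt with OpenCV trackers.
    # Initial boxes/labels are hardcoded stand-ins for the provided inputs.
    import cv2

    cap = cv2.VideoCapture("belt.mp4")           # or a camera index
    ok, frame = cap.read()

    objects = [((50, 100, 80, 60), "paper"),     # (x, y, w, h), label
               ((200, 120, 70, 70), "plastic")]

    trackers = []
    for box, label in objects:
        t = cv2.TrackerCSRT_create()             # under cv2.legacy in some builds
        t.init(frame, box)
        trackers.append((t, label))

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        for t, label in trackers:
            found, (x, y, w, h) = t.update(frame)
            if found:
                p1, p2 = (int(x), int(y)), (int(x + w), int(y + h))
                cv2.rectangle(frame, p1, p2, (0, 255, 0), 2)
                cv2.putText(frame, label, (p1[0], p1[1] - 5),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("belt", frame)
        if cv2.waitKey(30) & 0xFF == 27:         # Esc quits
            break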
Skills: Computer Vision, AR toolkits, 3D Visualization
Source control: GitHub (preferred)
The Laboratory for Analytic Sciences (LAS) is a research organization in support of the U.S. Government, working to develop new analytic tradecraft, techniques, and technology that help intelligence analysts better perform complex tasks. Processing large volumes of data is a foundational capability in support of many analysis tools and workflows. Any improvements to existing processes and procedures, whether they are measured in time, efficiency, or stability, can have significant and broad reaching impact on the intelligence community’s ability to supply decision-makers and operational stakeholders with accurate and timely information.
Modern media sources produce immense quantities of speech audio recordings every day across the globe. Information producers and consumers both benefit from cross-lingual transcriptions. However, language analysts are of course overwhelmed in this environment, and in most cases employing their services is cost-prohibitive. Thankfully, machine-learning methods have generated moderately capable speech-to-text (STT) and machine translation (MT) algorithms, which are far faster and more economical to deploy. While these solutions are sufficient for some applications, regional dialects and accents are complicating factors, and the accuracy of the models is often lacking, even for common languages. These shortcomings limit, and even prohibit, the utility of STT and MT for many applications. We desire to improve the efficacy of STT and MT capabilities, with a present emphasis on the former.
To create STT algorithms, a machine learning model is provided with ground truth samples of both speech recordings and associated human-transcribed text. Through complicated processes beyond the scope of the present document (and this project), the model learns to correlate elements of speech utterances with phonemes and words. Loosely speaking, larger machine learning models (in terms of the number of trainable parameters they contain) outperform smaller models; however, the larger a model is, the more training data is typically required to train it. Thus, an ever-present issue in the machine learning world is the difficulty and cost associated with creating or acquiring a large corpus of ground truth data to use for training models. What follows is an approach to acquiring additional ground truth training data for STT algorithms, which will, in turn, enable data scientists to train more, and larger, STT models. It will also afford opportunities to “fine-tune” models for specific dialects and accents of particular interest.
One common workflow for language analysts is to transcribe a foreign language audio recording directly into the desired language of interest. For example, an audio recording of a Spanish speaker may be transcribed/translated directly into English by an analyst. The same recording may have STT and MT applied to achieve a similar, albeit typically far less accurate, result. However, since the language analyst has already decided to transcribe/translate the audio, there is opportunity to record ground truth training data that could later be used to improve the accuracy of STT and MT algorithms at a smaller cost than would otherwise be required. “All” that is needed is for the analyst to take the extra time to “correct” the STT output. In practice, this “correcting” process means that the analyst is presented the inaccurate STT output in a text editor, and is asked to edit the presented text into the accurate transcription. Analysts are already encouraged to perform this task. However, it is already time-consuming for analysts to perform transcription/translation, and in cases where the STT output is quite poor and requires many edits, the act of correcting may take a significant amount of time to perform. This time requirement is often simply too tall an order to add onto the analyst’s workflow, where moving on to transcribe/translate the next audio recording may reasonably take priority.
If we can develop an automated procedure that uses, e.g., the SpanishAudio-to-EnglishText transcription/translation the analyst performed to improve the SpanishAudio-to-SpanishText transcription that the STT algorithm generates, this would reduce the editing burden on the analyst required to correct the output. The following project description describes one such automated procedure, on which the LAS has performed minimal testing, and which appears promising enough that we wish to generate a prototype implementation enabling testing on a larger, and broader, scale.
The student team is to create a pipeline to enable testing of various ways in which an analyst’s gold-standard translation might be leveraged to facilitate correction and truth-marking of STT, thus ultimately improving the output of an STT algorithm. As a developmental use case, the team is to implement the method described below as a first attempt at leveraging the gold-standard translation.
First, off-the-shelf MT algorithms will be selected and used to convert the foreign-language STT output to the native language. Next, we attempt to automatically improve the STT output using a large language model (LLM, e.g., ChatGPT). We ask an LLM to compare the machine-translated STT and the analyst-derived translation and extract what differences may be present. We then ask the LLM to use these extracted differences to correct the original foreign-language STT. Finally, we ask the LLM which sections, or chunks of text, it is most confident are improved from the original STT output. These LLM-related tasks are to be scripted by the student team, using an LAS-provided API key for an LLM. The student team is also to develop a very basic interface by which those sections of the original STT output that the LLM is most confident are improved can be presented to the analyst for correction. The “correct” answer is known only to the analyst, but for testing purposes the LAS will provide a set of “correct” answers (ground truth data, akin to an analyst-derived transcription in the foreign language) and ask the students to implement a calculation of the Levenshtein distance between the LLM-improved STT and the ground truth, as well as the distance between the original STT and the ground truth. This will enable evaluation of the above method, and of others that the LAS may later incorporate into the pipeline the student team creates.
Note that if time permits, a simple variation of the above method could also be supported without much difficulty. In this variation, rather than using the MT to convert the foreign language STT to the native language, the MT would be used to convert the native language translation to the foreign language. The remaining steps would be very similar, i.e. we would then again use a LLM to compare, identify differences, and use those differences to correct the original STT, the only difference being that the comparison would happen in the foreign language rather than the native language. A flow diagram is presented below depicting these two approaches.
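A rough sketch of the first method in Python follows; llm() and the mt parameter are hypothetical stand-ins for the LAS-provided LLM API and an off-the-shelf MT model, and the prompts are only suggestive.

    # Sketch of the correction pipeline (native-language comparison). llm()
    # and mt are hypothetical stand-ins for the LLM API and an MT model.
    def llm(prompt: str) -> str:
        # Stand-in: wire this to the LAS-provided LLM API (e.g., ChatGPT).
        return "[LLM response to: " + prompt.splitlines()[0] + "]"

    def correct_stt(stt_foreign: str, analyst_native: str, mt) -> tuple:
        mt_native = mt(stt_foreign)   # MT: foreign-language STT -> native language
        diffs = llm("List the differences between these two texts.\n"
                    f"Machine translation of STT: {mt_native}\n"
                    f"Analyst translation: {analyst_native}")
        corrected = llm("Use these differences to correct the original STT.\n"
                        f"Differences: {diffs}\nOriginal STT: {stt_foreign}")
        chunks = llm("Which chunks of the corrected text are you most confident "
                     f"improve on the original?\nOriginal: {stt_foreign}\n"
                     f"Corrected: {corrected}")
        return corrected, chunks     # chunks feed the analyst-facing interface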
Below is an example where the foreign language of the audio was Spanish and the native language was English (examples from tatoeba.org). In this case, the STT underwent machine translation into the native language, and the STT “corrected” by the LLM is accurate to the ground truth data.
STT: la trufa es un hongo que vive en simbiosis con las raíces de algunas plantas tales como Robles avellanos as a House
MT of STT: The truffle is a fungus that lives in symbiosis with the roots of some plants such as Oaks, hazelnuts, as a House.
Analyst translation: A truffle is a fungus which lives in symbiosis with the roots of certain plants, such as oak, hazel, beech, poplar, and willow trees
ChatGPT-corrected STT: La trufa es un hongo que vive en simbiosis con las raíces de algunas plantas, tales como robles, avellanos, hayas, álamos y sauces.
Ground truth: La trufa es un hongo que vive en simbiosis con las raíces de algunas plantas, tales como robles, avellanos, hayas, álamos y sauces.
Levenshtein distance: STT to ground truth is 21; ChatGPT-corrected STT to ground truth is 0.
Below is an example where the foreign language of the audio was Spanish and the native language was English (examples from tatoeba.org). In this case, the analyst translation underwent machine translation into the foreign language, and the STT “corrected” by the LLM does not appear to be improved over the original (at least in terms of Levenshtein distance). This example illustrates why we want the student team to develop the pipeline to support multiple methods: a consistently successful method is still under investigation at the LAS.
STT: Un reportero se aprovecha de lo que obtiene de cualquier fuente y uso de las del tipo a dicho, un pajarito.
Analyst translation: A good newspaper reporter takes advantage of what he learns from any source, even the "little bird told him so" type of source.
MT of Analyst translation: Un buen reportero de un periódico aprovecha lo que aprende de cualquier fuente, incluso del tipo de fuente "el pajarito se lo dijo".
ChatGPT-corrected STT: Un buen reportero se aprovecha de lo que obtiene de cualquier fuente y hace uso de las fuentes de tipo "dicho por un pajarito".
Ground truth: Un buen reportero se aprovecha de lo que obtiene de cualquier fuente, incluso de las del tipo "me lo ha dicho un pajarito".
Levenshtein distance: STT to ground truth is 21; ChatGPT-corrected STT to ground truth is 25.
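The evaluation step calls for a Levenshtein distance; a minimal character-level implementation is sketched below (the LAS may specify a different tokenization).

    # Standard dynamic-programming Levenshtein (edit) distance between strings.
    def levenshtein(a: str, b: str) -> int:
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    print(levenshtein("kitten", "sitting"))  # 3

Computing this distance for both the original STT and the LLM-corrected STT against the ground truth yields comparisons like those in the examples above.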
The student team is asked to design, engineer, and develop a prototype incorporating as many of the below key features as possible:
The LAS will provide the student team with a dataset for testing (most likely from Tatoeba, an open-source collection of crowdsourced sentences and translations, or from the Linguistic Data Consortium’s “Call Home” data sets), an extensible software development environment utilizing AWS resources, access to an LLM (e.g., an API key for ChatGPT or similar), and expert consulting/mentoring. Open-source STT and MT algorithms are available for use.
The prototype should be stand-alone and should not have any restrictions (e.g., no enterprise licenses required), with the possible exception of the LLM utilized. In general, we will need this application to operate on commodity hardware and be accessible via a standard modern browser (e.g., Chrome, Microsoft Edge, etc.). Beyond those constraints, technology choices will generally be considered design decisions left to the student team. That said, the LAS sponsors for this team have experience with the following technologies and will be better able to assist if they are utilized:
ALSO NOTE: Public distributions of research performed in conjunction with USG persons or groups are subject to pre-publication review by the USG. In the case of the LAS, typically this review process is performed with great expediency, is transparent to research partners, and is of little to no consequence to the students.
The Laboratory for Analytic Sciences (LAS) is a research organization in support of the U.S. Government, working to develop new analytic tradecraft, techniques, and technology that help intelligence analysts better perform complex tasks. Processing large volumes of data is a foundational capability in support of many analysis tools and workflows. Any improvements to existing processes and procedures, whether they are measured in time, efficiency, or stability, can have significant and broad reaching impact on the intelligence community’s ability to supply decision-makers and operational stakeholders with accurate and timely information.
Our main goal this semester is to demonstrate the need for a machine-learning-based recommender system for data prioritization in an enterprise setting.
In a typical large-scale enterprise, data acquisition, processing, storage, and searching may be split across many different systems. With the introduction of machine learning and business intelligence, the process now also involves combining all of this data from multiple sources into a large, central repository (a data warehouse) for searching and indexing. The process whereby this occurs is called the Extract, Transform, and Load (ETL) process. As each system evolves and grows independently, some data may not be indexed for searching.
As data volumes continue to increase, there may be a time when the amount of data (and corresponding derivatives like machine learning feature embeddings) may exceed the total storage size of a data warehouse. We would like to continue creating a web application to demonstrate two different methods of managing this data in a user-prioritized manner. That is:
We seek the help of NCSU Senior Design to further design a web application that can enable demonstration of both of these methods.
Last semester, the NCSU Senior Design team created a basic application for specifying rules in the cybersecurity domain. A user was granted either edit or read-only permissions. Once within the application, they were able to add rules tailored to the use case (e.g., IP addresses with or without ports) into groups (named buckets). Within each bucket, a user could reorder the rules. In the overall system, this ordering would allow a specific rule to store more data than another.
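For orientation, an illustrative data model for that bucket/rule structure might look like the following; the class and field names are assumptions, not last semester's actual code.

    # Illustrative data model for bucketed, ordered prioritization rules.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Rule:
        pattern: str            # e.g. "10.0.0.0/8" or "192.0.2.1:443"
        description: str = ""

    @dataclass
    class Bucket:
        name: str
        rules: List[Rule] = field(default_factory=list)

        def move(self, index: int, new_index: int) -> None:
            """Reorder a rule; earlier rules receive higher storage priority."""
            self.rules.insert(new_index, self.rules.pop(index))

    bucket = Bucket("high-priority", [Rule("10.0.0.0/8"), Rule("192.0.2.1:443")])
    bucket.move(1, 0)
    print([r.pattern for r in bucket.rules])   # ['192.0.2.1:443', '10.0.0.0/8']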
For the current semester, we seek the help of a team to add functionality to enhance manual rule-based prioritization and demonstrate visualizing results of a machine-learning based prioritization method. Generally, the team will be asked to do the following:
Specifically, most of the UI work will be in augmenting the bucket page with additional features to showcase the differences between the defined rules and the following days’ data.
In addition to the new core capabilities, we may also want to explore a freely available open dataset that would pivot our demonstration closer to a news recommendation system (vs. cybersecurity). With a new domain, some of the code may need to be generalized to be easier to understand.
We anticipate sharing last semester's code and guides with the team. This application has the following technology stack:
For testing, we would like the team to look at Jest or Cypress for end-to-end frontend testing.
ALSO NOTE: Public distributions of research performed in conjunction with USG persons or groups are subject to pre-publication review by the USG. In the case of the LAS, typically this review process is performed with great expediency, is transparent to research partners, and is of little to no consequence to the students.
The Laboratory for Analytic Sciences (LAS) is a research organization in support of the U.S. Government, working to develop new analytic tradecraft, techniques, and technology that help intelligence analysts better perform complex tasks. Processing large volumes of data is a foundational capability in support of many analysis tools and workflows. Any improvements to existing processes and procedures, whether they are measured in time, efficiency, or stability, can have significant and broad reaching impact on the intelligence community’s ability to supply decision-makers and operational stakeholders with accurate and timely information.
LAS would like to implement, demonstrate, and evaluate the applicability of using the orchestration technology Argo Workflows to automate MLOps (machine learning operations) tasks in a Kubernetes cluster. The core intent of this project is to enable integration of open-source projects in a proper, systemic fashion. Argo Workflows is an open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Complex systems contain many background tasks that need automation, and orchestration tools facilitate that need. Students will gain experience executing this integration in a way that benefits both the data scientists who will use the system and themselves as engineers.
As the practice of MLOps matures, so does the need to automate the process of deploying machine learning (ML) models and allowing them to scale to accommodate business demands. Data scientists are training more operational ML models than ever before, and the challenge of effectively utilizing these immensely powerful algorithms is quickly becoming one of management and orchestration engineering rather than model development. Orchestration tools are often part of the glue that automates a multitude of tasks in complex systems, both integrating different technologies and reducing maintenance burdens on engineers. Many entry-level software engineers don’t have experience with system integration and design. It’s beneficial to understand not only how to build an application but also how to make that application work as part of a larger system.
LAS is prototyping a ML Model Deployment Service (MDS) to facilitate the deployment and scaling of ML models in Kubernetes. A critical component of this system will be an orchestration tool to abstract and automate regular manual tasks, which can be very complex. Alleviating this burden from the data scientists is the ultimate payoff.
LAS is seeking a team to demonstrate and evaluate the applicability of using the orchestration technology Argo Workflows to automate MLOps tasks in a Kubernetes cluster. After establishing initial workflows, the team will create a simple web interface and API to customize workflows and trigger execution. Minimal project deliverables may be demonstrated and tested using a small mini cluster like Kube or MiniKube. LAS will optionally provide ML model containers to test deployment.
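As one illustration, the API half of the deliverable could submit a workflow through the Argo server's REST interface; the manifest, namespace, and server address below are assumptions to verify against the deployed version.

    # Sketch: a fastAPI endpoint submitting an Argo Workflow via the Argo
    # server's REST API. Manifest, namespace, and URL are assumptions.
    import httpx
    from fastapi import FastAPI

    app = FastAPI()
    ARGO_SERVER = "http://localhost:2746"   # default Argo server port (assumed)

    @app.post("/workflows/{name}")
    async def trigger(name: str, image: str = "alpine:3.18"):
        manifest = {"workflow": {
            "metadata": {"generateName": f"{name}-"},
            "spec": {"entrypoint": "main",
                     "templates": [{"name": "main",
                                    "container": {"image": image,
                                                  "command": ["echo"],
                                                  "args": [name]}}]},
        }}
        async with httpx.AsyncClient() as client:
            r = await client.post(f"{ARGO_SERVER}/api/v1/workflows/argo",
                                  json=manifest)
        r.raise_for_status()
        return {"submitted": r.json()["metadata"]["name"]}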
Project Deliverables (Minimal):
If the project team has successfully demonstrated progress in meeting the project's minimal deliverables, they may pursue some of the project’s stretch goals.
Project Stretch Goals:
The core technology used in the project must be Argo Workflows, but additional accompanying open source technology choices will generally be considered. Students may develop and test solutions locally on a Kubernetes mini cluster like Kube or MiniKube with the goal of deploying into an AWS EKS Cluster.
Frontend: UI design decisions for the web interface will be left to the student team.
Backend APIs: fastAPI
Orchestration Framework: Argo Workflows
As noted above, the student team will have flexibility in the selection of specific accompanying open source technologies for some portions of the project. That said, the LAS sponsors for this team have experience with the following technologies and will be better able to assist if they are utilized:
ALSO NOTE: Public distributions of research performed in conjunction with USG persons or groups are subject to pre-publication review by the USG. In the case of the LAS, typically this review process is performed with great expediency, is transparent to research partners, and is of little to no consequence to the students.
REFERENCES: Argo Workflows: https://argoproj.github.io/argo-workflows/
NetApp is a cloud-led, data-centric software company dedicated to helping businesses run smoother, smarter and faster. To help our customers and partners achieve their business objectives, we ensure they get the most out of their cloud experiences – whether private, public, or hybrid. We provide the ability to discover, integrate, automate, optimize, protect and secure data and applications. At a technology level, NetApp provides cloud, hybrid and all-flash physical solutions; unique apps and technology platforms for cloud-native apps; and an application-driver infrastructure which allows customers to optimize workloads in the cloud for performance and cost.
NetApp products use cutting-edge hardware to provide customers with the latest features and performance. Using the most up-to-date hardware also ensures the availability of all components for the lifespan of a NetApp product.
NetApp’s storage focused operating system is built on a customized FreeBSD kernel. Since most hardware vendors develop drivers for the Linux kernel initially and then port the drivers to the FreeBSD kernel in a follow-on effort, NetApp often faces challenges getting FreeBSD drivers for the latest and greatest hardware until later in their product development cycle. This creates risks for products due to the limited time available for feature integration and testing.
FreeBSD has a LinuxKPI (https://wiki.freebsd.org/LinuxKPI), which is a small compatibility layer to allow Linux based drivers to port more easily to FreeBSD. While the LinuxKPI exists, the porting effort to implement Linux drivers in FreeBSD is still significant and time consuming. Improving the LinuxKPI to support Linux drivers in FreeBSD with minimal porting effort would allow FreeBSD based products to release sooner on the latest hardware and reduce the development cycles for vendors to port drivers to FreeBSD.
Nvidia ConnectX SmartNICs (https://www.nvidia.com/en-us/networking/ethernet-adapters/) are a great example of a network driver developed initially for Linux and then ported to FreeBSD. The driver for this family of Ethernet adapters leverages the LinuxKPI, but still requires a significant effort to port all the offloads and features to the FreeBSD kernel.
Project Goals
Bonus (Stretch) Goals
Student hardware:
SAS provides technology that is used around the world to transform data into intelligence. A key component of SAS technology is providing access to good, clean, curated data. The SAS Data Management business unit is responsible for helping users create standard, repeatable methods for integrating, improving, and enriching data. This project is being sponsored by the SAS Data Management business unit to help users better leverage their data assets.
Finding relevant data is important for many use cases. When users sit down to build a report or a data model, they need to find useful, relevant data for the task; or, having chosen some dataset, they may want to find a few similar ones to select from. The goal of this project is to create an application that users can use to get recommendations on datasets. The application can be done in React, or can be another web page-based application. The project should explore using an open-source recommendation engine technology (https://github.com/microsoft/recommenders), working from a collection of metadata about tables available to the application, each with a defined topic. Users should be able to request a table based on a topic and get recommendations of similar tables; if they start with a table, they should get a list of other similar tables. Finally, whenever the user selects data, they should get recommendations for other, similar data. SAS can provide an initial set of this metadata.
To start, the user could be presented with a set of popular tables, tables organized by topics (part of the metadata), or other ways to navigate the available datasets. The primary way to initially find tables could be to select a topic. When the user selects a dataset, the user can get recommendations for other, similar data, using the recommendation engine.
SAS can provide an initial set of metadata about the tables. Students can collect more metadata using the Python profiler (https://greatexpectations.io/packages/great_expectations_semantic_types_expectations). For the recommendation engine, we would like students to leverage the Microsoft recommenders library (https://github.com/microsoft/recommenders), which includes many examples in its examples folder. If the students choose to write a web application, there are several starter applications from previous semesters that they could use to get started, including the Data Gist project. There are no legal or IP issues here.
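Before wiring in the recommenders library, a simple content-based baseline over the table metadata can validate the workflow; the sketch below uses TF-IDF cosine similarity (plain scikit-learn, not the recommenders library) over hypothetical metadata.

    # Simple content-based baseline (scikit-learn, not microsoft/recommenders):
    # rank tables by TF-IDF cosine similarity over hypothetical metadata text.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    tables = {
        "sales_2022": "retail sales revenue by region and quarter",
        "sales_2023": "retail sales revenue by region month and product",
        "hr_roster": "employee names departments hire dates",
    }

    names = list(tables)
    tfidf = TfidfVectorizer().fit_transform(tables.values())

    def similar_tables(name: str, k: int = 2) -> list:
        scores = cosine_similarity(tfidf[names.index(name)], tfidf).ravel()
        ranked = sorted(zip(scores, names), reverse=True)
        return [n for s, n in ranked if n != name][:k]

    print(similar_tables("sales_2022"))   # ['sales_2023', 'hr_roster']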
Dr. Stolee is a faculty member with a strong research program in software engineering. This project contributes to their research on program comprehension, and specifically, on a new area of comparative code comprehension.
Comparative code comprehension is a cognitive process that describes how a person understands the differences between two (generally) similar pieces of code. In preliminary studies on comparative comprehension, professional programmers and students alike have trouble spotting the behavioral differences between similar algorithms. This suggests that support is needed to communicate similarities and differences between arbitrary algorithms.
In this project, we will create tools and techniques that help developers understand the differences between similar pieces of code. These techniques will use state-of-the-art software engineering tools for static analysis and/or dynamic analysis, such as AST generation, fuzz testing, and more.
Dr. Stolee envisions a web application that allows users to interact with the system.
There are frontend and backend components to this project. The backend needs to figure out how the two code algorithms are similar and how they are different. For example, using fuzz testing, it might reveal that they behaved the same on inputs X, Y, and Z, but behaved differently on input W. On the frontend, this information needs to be communicated back to the user in an intuitive and understandable way. As a stretch goal, the interface could be dynamic, where the user could make tweaks to the algorithms and then re-assess their similarity.
For example, let’s say the system is provided with two pieces of code:
public static int[] bubbleSort(int[] a) {
    boolean swapped;
    int n = a.length - 2;
    do {
        swapped = false;
        for (int i = 0; i <= n; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
                swapped = true;
            }
        }
        n = n - 1;
    } while (swapped);
    return a;
}
And
public static int[] bubbleSort(int[] array)
{
    boolean swapped = true;
    for (int i = array.length - 1; i > 0 && swapped; i--)
    {
        swapped = false;
        for (int j = 0; j < i; j++)
        {
            if (array[j + 1] > array[j]) {
                int tempInt = array[j];
                array[j] = array[j + 1];
                array[j + 1] = tempInt;
                swapped = true;
            }
        }
    }
    return array;
}
One would expect the system, for example, to automatically detect that these behave the same on input arrays of size 0 or 1, or on input arrays with all the same values, such as [1,1,1,1], and that they behave differently on inputs such as [1,2,3,4] (i.e., one sorts arrays in ascending order and the other in descending order). This information would then need to be communicated to the user.
Some differences will be detected using static analysis, others using dynamic analysis. Each analysis should be toggleable (on or off), so the user can see just the differencing information they desire.
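A toy version of the dynamic-analysis side, differential fuzzing, is sketched below; the two stand-in functions play the role of the bubbleSort variants above.

    # Sketch of differential fuzzing: find inputs where two implementations
    # agree and where they diverge. sort_asc/sort_desc stand in for the two
    # bubbleSort variants above.
    import random

    def sort_asc(a):  return sorted(a)
    def sort_desc(a): return sorted(a, reverse=True)

    def fuzz_compare(f, g, trials=1000, max_len=6):
        same, diff = [], []
        for _ in range(trials):
            x = [random.randint(0, 9) for _ in range(random.randint(0, max_len))]
            (same if f(list(x)) == g(list(x)) else diff).append(x)
        return same, diff

    same, diff = fuzz_compare(sort_asc, sort_desc)
    print(diff[:3])                                # inputs exposing the difference
    print([x for x in same if len(x) <= 1][:3])    # they agree on size-0/1 inputs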
At the end, the infrastructure developed by the team will be used in empirical studies to see how assistance helps people understand behavioral differences between code and may lead to publication in Software Engineering or Computer Science Education conferences or journals. Due to the creative nature of this project, students involved with the research will earn authorship on any subsequent publications.
The students will be participating in an original research project. Dr. Stolee’s prior work in this area has compared algorithms written in Python and Java as the software engineering tool support for these languages tends to be more sophisticated.
The students may borrow or build off existing research projects that measure code similarity using fuzzing.
Required reading:
George Mathew, Chris Parnin, Kathryn T. Stolee: SLACC: simion-based language agnostic code clones. ICSE 2020: 210-221
George Mathew, Kathryn T. Stolee: Cross-language code search using static and dynamic analyses. ESEC/SIGSOFT FSE 2021: 205-217
Justin Middleton, Kathryn T. Stolee: Understanding Similar Code through Comparative Comprehension. VL/HCC 2022.
Gilbarco Veeder-Root is backed by a powerhouse of brands that run the spectrum of fueling solutions, from fleets to cloud technology. This project will allow students to learn more about our family of brands and how they help make Gilbarco Veeder-Root the leader of the retail and commercial fuel industry.
Our Companies include Gilbarco Fuel Dispensers, Veeder Root - Automated tank gauge solutions, GasBoy - Commercial fleet and fuel management, Catlow – Hanging Hardware solutions, Insite 360 – On demand fuel management solutions, ANGI – Compressed Natural Gas (CNG) fueling solutions, and many others. The focus of the project will be geared towards our Gilbarco Fueling Dispensers.
With global manufacturing of several different products, the organization needs to stay on top of inventory levels for every component incorporated into each product. This helps ensure a clear understanding of which products are potentially at risk when manufacturing customer orders. On a regular interval, manufacturing facilities perform a quality check and validate inventory levels of the parts currently in stock, including a validation of products currently moving through the manufacturing process.
In order to maintain as accurate an inventory as possible, the team would like to be able to properly capture work in progress (WIP) activity to better maintain the appropriate counts.
The primary focus of this project will be to develop an Augmented Reality (A/R) solution that allows a user to evaluate parts being consumed by a product in real time on the assembly line. This will make it possible to capture the associated parts already consumed in the Work In Progress (WIP) product.
As part of the project, the team will have access to a physical unit that will be used for modeling the A/R solution and validating the subcomponents of the final product. The team will assist in the process of creating product-centric CAD models with the help of sponsor resources. Once developed, the team will incorporate these models for usage in the A/R platform.
As part of their application, the students will build a catalog that can be used to pull the appropriate model from a serial number on the product. The correlating serial number will identify a Bill of Materials for each product, allowing visual selection and representation of all parts consumed. When the team is selected, we will provide access accounts to the students as soon as they are on board, giving them the appropriate access to the systems containing the CAD models and associated bills of materials.
Once the product has been evaluated with all selected options, the team will be responsible for generating a tabulated output of component parts that can be used to capture uncounted product sub-components not included in today's physical inventory counts. As a future step for this project, it would be possible for the existing inventory systems to offer an interface for receiving updates from the A/R system, providing a mechanism for inventory updates in real time.
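As a small illustration of the serial-number catalog, a lookup might resolve a scanned serial to its CAD model and bill of materials; every identifier and part number below is hypothetical.

    # Illustrative serial-number -> CAD model / BOM lookup; all identifiers
    # and part numbers are hypothetical.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class BomEntry:
        part_number: str
        description: str
        quantity: int

    catalog = {
        "GVR-0001": ("dispenser_base.cad",
                     [BomEntry("P-100", "nozzle boot", 2),
                      BomEntry("P-200", "display bezel", 1)]),
    }

    def lookup(serial: str) -> Tuple[str, List[BomEntry]]:
        """Resolve a scanned serial number to its CAD model and BOM."""
        return catalog[serial]

    model, bom = lookup("GVR-0001")
    print(model, [(e.part_number, e.quantity) for e in bom])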
Solution platform: the team will build out an interface to capture and push inventory items to a platform (Microsoft AX) that can be used to manage quantities. The application will also need to consume updated revision details from the PTC Windchill environment (a Product Lifecycle Management platform), as drawings will be updated on a continual basis.
More information on the PTC Windchill environment can be found at https://www.ptc.com/en/products/windchill.
This project will utilize applications and tools from the PTC ecosystem; a link to the starting material is provided below. We currently have several pieces of the data available for consumption, and the broader platform supports new features such as Vuforia that can be used for A/R development. This platform provides a development library that can be used to build out the application that will run on the target hardware platform.
SW Application Details : PTC Vuforia - https://developer.vuforia.com
Target Hardware Platform: Apple iPad (Three will be provided for the student team)
Software Application Development – primarily iOS software application development; it needs to be developed in a cross-platform development environment.
We anticipate the usage of the following tools in that environment
Atomic Arcade is a AAA game development studio led by industry veterans with experience on some of gaming’s biggest IP. We are currently working on a AAA GI Joe game centered around the legendary ninja / commando character, Snake Eyes. We’re proud to be part of Wizards of the Coast & Hasbro.
Our game design team is working on an idea for Snake Eyes that involves dropping dynamic and randomized perks for our player to pick up during combat gameplay. These perks would augment the gameplay in interesting ways to keep combat fresh and exciting over the course of the game. However, because of the complexity of the system, it is difficult to find design answers (i.e. find the fun) without a working version of the system.
In order to determine if this mechanic would be fun and engaging for players, the team would like to have a focused prototype of the system working and playable. This way, they can see the system in action, experiment and try things, and find answers to some of the design questions that they need.
This means that we would need a minimally functional melee combat system that allows the player to engage in close-quarters combat with multiple AI opponents. Once that base system is functional, our designers would like to be able to introduce various perks into the gameplay to experiment with. The implementation should allow these perks to augment the system in interesting and unique ways, and it will require iteration to support the new mechanics that design comes up with.
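One engine-agnostic way to structure such perks is as data-driven modifiers, sketched below in Python; the perk names and stat numbers are invented for illustration.

    # Engine-agnostic sketch of data-driven combat perks; names and numbers
    # are invented for illustration.
    import random
    from dataclasses import dataclass

    @dataclass
    class Stats:
        damage: float = 10.0
        attack_speed: float = 1.0

    @dataclass
    class Perk:
        name: str
        apply: object   # callable that mutates Stats when the perk is picked up

    perks = [
        Perk("Frenzy", lambda s: setattr(s, "attack_speed", s.attack_speed * 1.5)),
        Perk("Heavy Blade", lambda s: setattr(s, "damage", s.damage + 5)),
    ]

    player = Stats()
    picked = random.choice(perks)     # randomized drop during combat
    picked.apply(player)
    print(picked.name, player)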
System requirements:
Initial perks:
Bandwidth is a software company focused on communications. Bandwidth’s platform is behind many of the communications you interact with every day. Calling mom on the way into work? Hopping on a conference call with your team from the beach? Booking a hair appointment via text? Our APIs, built on top of our nationwide network make it easy for our innovative customers to serve up the technology that powers your life.
Bandwidth just moved into a beautiful new campus, not far from PNC Arena and Carter-Finley Stadium. On our new campus, we have an expansive parking deck, equipped with four EV charging stations on each of its four floors.
Bandmates are very excited about EV charging, but it comes with complications: we have a four-hour maximum, and there are other charging “manners” that come into play to be a good charging citizen. It’s very easy to forget and leave your car at a charging station even after it’s charged, keeping someone else from being able to use it.
We would like you, dear students, to come up with a way for us to better manage, track, and account for our charging stations.
We envision that a mix of a web application and AI/image recognition would solve this problem. A web app would help us sign up for charging times/stations, and let everyone know who’s using which charger, or if a charger is not in use. Including AI/image recognition would allow us to place cameras (on Raspberry PIs) near each charging spot and develop software to detect if a car is in place or not, identifying the car for display on the web app.
The web application portion can be in any language you prefer. We will host this in Amazon Web Services, so you’d be able to take advantage of AWS technologies (RDS, Lambdas, etc.) if you wish.
The AI/image recognition should probably be done in Python, as there are many good open-source software (OSS) libraries available to help. We will provide a Raspberry Pi and camera that can be used to develop and test with.
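A toy version of the car-presence check, using OpenCV background subtraction on the Pi camera, might look like this; the spot coordinates and threshold are guesses to tune against real deck footage.

    # Sketch: car-presence detection for one charging spot via OpenCV
    # background subtraction; spot coordinates and threshold are guesses.
    import cv2

    cap = cv2.VideoCapture(0)                      # Pi camera or USB camera
    backsub = cv2.createBackgroundSubtractorMOG2(history=500)
    SPOT = (100, 200, 300, 180)                    # x, y, w, h of the spot

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        x, y, w, h = SPOT
        mask = backsub.apply(frame)[y:y + h, x:x + w]
        occupied = (mask > 0).mean() > 0.25        # >25% of spot pixels changed
        # The real app would POST this state to the web app instead of printing.
        print("occupied" if occupied else "free")
        if cv2.waitKey(100) & 0xFF == 27:          # Esc quits
            break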
Overall, we want this to be a fun exercise in developing a combination of a web application and back-end AI/image recognition!
You are free to think up more features and functionality that would make this work well.
And you should also come up with a cool name!
Katabasis is a non-profit organization that specializes in developing educational software for children ages 8-15. Our mission is to facilitate learning, inspire curiosity, and catalyze growth in every member of our community by building a digital learning ecosystem that adapts to the individual, fosters collaboration, and cultivates a mindset of growth and reflection.
There are many communities of students that have limited access to educational materials on computer science, particularly at young ages. Even more problematic, this subject area is often seen as completely unapproachable and indecipherable, and this perception is often ingrained into children at a young age. Katabasis wants to design an intervention for children to break down complex, high-level computer science topics (in this case, machine learning) in a way that makes computer science education more accessible and promotes students’ self-confidence in their capability for computer science.
In addition to teaching computer science, there is a secondary focus of this project to teach students basic taxonomy by having them identify different animal species. This will introduce students to basic biological concepts to supplement their education.
Katabasis is seeking to develop a casual, single-player video game with the intent of teaching young children (10-15) about basic machine learning principles such as classification, decision trees, and supervised/unsupervised learning.
NOTE: There will be NO actual machine learning implemented in this project; the focus is on teaching the basic concepts behind machine learning to a younger audience through a game/minigames representative of machine learning principles.
The story of the game will be analogous to a typical classification problem and will have the player assume the role of a zoologist fighting the evil scientist Dr. Zorb. Dr. Zorb wants to destroy the world’s ecosystem by introducing genetically modified creatures (“Zorblings”) onto the planet. To combat Dr. Zorb and prevent ecological damage, the player has an army of drones that can grab the Zorblings and fly them out into space, but the player will need a quick way to identify Zorblings around the world. To accomplish this, the player needs to develop a machine learning model that classifies creatures as real animals or Zorblings. The player will train their ML model by moving quickly around the world and earning as many points as possible through various minigames. These minigames will represent the process of classification and the ML concepts of decision trees and supervised learning. Failure to train the model efficiently will result in a poorly trained machine learning model that cannot tell a Zorbling from a zebra!
The basic structure of the game will consist of a large game “board” in which players can visit different ecosystems on the globe and encounter minigames of varying difficulty to help “train” the player’s machine learning model. These minigames will be designed around reinforcing and teaching basic machine learning principles through classification tasks. The goal will be for students to understand the logic and reasoning that goes into a machine learning system, without needing to know the technical minutia or comprehensive inner workings. To accomplish better thematic connection with a younger audience, the game will be themed around animals; taxonomical classification has many tasks and activities that can be framed around machine learning concepts. Touchstones for the game include the board game “Pandemic”, Mario Party, Wii Party, and WarioWare. The general structure of a board supported by intervening minigames is a strong one, and appeals to our target audience, so we want to leverage it into educational content.
Because this game overlaps thematically with elementary-level science topics like taxonomy and food webs, it will primarily be used in the classroom as a supplemental learning tool. As such, there will be time constraints on total gameplay time, as described in the constraints section below.
The team will be asked to implement the “board” portion of the game rather than focus on the individual minigames. The board will consist of five ecosystems, each with its own progress bar to fill. When players choose to launch minigames in one of the ecosystems, they will be directed to a minigame fitting of the current difficulty level. Then, the player will be awarded a number of points, from 1 to 10, that will go towards filling the ecosystem’s progress bar and adjusting the current difficulty level. More detail on these features can be found below, but these are the general systems the team will be responsible for implementing.
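An engine-agnostic sketch of that board state follows (the real implementation will live in Unity); the difficulty thresholds are illustrative assumptions.

    # Engine-agnostic sketch of the board state: five ecosystems with
    # progress bars and an adjusting difficulty. Thresholds are assumptions.
    from dataclasses import dataclass

    @dataclass
    class Ecosystem:
        name: str
        progress: int = 0     # points accumulated toward a full bar
        difficulty: int = 1   # 1 = easiest minigames

        def award(self, points: int) -> None:
            assert 1 <= points <= 10
            self.progress += points
            if points >= 7 and self.difficulty < 3:
                self.difficulty += 1    # strong play unlocks harder minigames
            elif points <= 3 and self.difficulty > 1:
                self.difficulty -= 1

    board = [Ecosystem(n) for n in ("tundra", "desert", "rainforest", "ocean", "savanna")]
    board[2].award(8)
    print(board[2])   # Ecosystem(name='rainforest', progress=8, difficulty=2)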
Here is a brief summary of the core features we are looking for:
For this project, students will develop their game in Unity for consistency with minigames supplied by the Katabasis team. This will make it easier for students to integrate the existing minigames into their own project.
We are aiming for a low computational load with this project and ask that students keep this in mind when making design decisions and during implementation.
The game should be ~45 minutes in length to fit within the average classroom period. Most of this time will be spent within the minigames themselves, but this time constraint is still important for the team to consider when making design decisions for the board implementation.
The team will receive all of the art assets that they will need for the completion of the project.
Katabasis is a non-profit organization that specializes in developing educational software for children ages 8-15. Our mission is to facilitate learning, inspire curiosity, and catalyze growth in every member of our community by building a digital learning ecosystem that adapts to the individual, fosters collaboration, and cultivates a mindset of growth and reflection.
Educators are constantly looking for new ways to engage their students. More and more, this is presenting itself in the form of educational games.
Some of the more pressing subjects nowadays, especially environmentally speaking, are hydrology and river flows, in particular how they relate to and interact with an ecosystem and natural landscape. Modern events show us that a poor understanding of the way water flows can lead to catastrophic results (as seen in the many dam failures and canal blockages of recent years). Therefore, it is important to foster awareness of these concepts. We are proposing a novel educational tool that connects the engagement of games with the subject of water dynamics in an educational tower-defense-style game. Tower defense games have been shown to be very engaging for children, especially in the late elementary to early middle school demographic, where water dynamics are often taught.
We are seeking a team of students to develop a tower defense game based around controlling and altering watersheds to combat pollutants and dangerous flow. The water acts as the path that the pollutants and debris (the enemies) flow along, and the player places plants, which act as the towers that combat these foes. The defining mechanic of the game will be the ability to alter the direction the water (and therefore the enemies) will travel, allowing the player to change the dynamics of the level on the fly. A critical touchstone for this game is the mobile game Fieldrunners, which implements a similar mechanic.
The path the water takes will be dictated by pre-set factors in any given level, in addition to the plants the player places down, which cause the flow to be dynamically recalculated and adjusted. Nailing both the algorithm for water flow and the player’s interaction with this core game mechanic will be a critical part of development. The flow of the water will also affect the plants themselves: too fast a flow may cause erosion, damaging the plants and putting the system at risk of collapse and a sudden altering of flow; too little flow, however, may leave plants malnourished and reduce their effectiveness.
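To illustrate the recalculation idea, the sketch below re-routes the water on a grid whenever a plant blocks a cell, using a simple BFS as a stand-in for whatever flow model the team designs.

    # Sketch of dynamic flow recalculation on a grid; BFS stands in for the
    # team's eventual flow model. 0 = open cell, 1 = blocked by a plant.
    from collections import deque

    def recompute_flow(grid, src, dst):
        """Return the water's path from src to dst, or None if fully dammed."""
        q, prev = deque([src]), {src: None}
        while q:
            cell = q.popleft()
            if cell == dst:
                path = []
                while cell is not None:
                    path.append(cell)
                    cell = prev[cell]
                return path[::-1]
            x, y = cell
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                nx, ny = nxt
                if (0 <= nx < len(grid) and 0 <= ny < len(grid[0])
                        and grid[nx][ny] == 0 and nxt not in prev):
                    prev[nxt] = cell
                    q.append(nxt)
        return None

    grid = [[0] * 5 for _ in range(5)]
    print(recompute_flow(grid, (0, 0), (4, 4)))   # open path exists
    grid[0][1] = grid[1][0] = 1                   # plants seal off the source
    print(recompute_flow(grid, (0, 0), (4, 4)))   # None: flow fully dammed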
In summary, the core feature set we are seeking this semester will be:
Students will be required to make a choice of game engine within the first week of the project. They will present their rationale, and we encourage them to consider a variety of factors such as portability, previous experience, and support for game mechanics.
We are aiming for a low computational load with this project and ask that students keep this in mind when making design decisions and during implementation.
Dr. Stallmann is a professor (NCSU-CSC) whose primary research interests include graph algorithms, graph drawing, and algorithm animation. His main contribution to graph algorithm animation has been to make the development of compelling animations accessible to students and researchers.
Galant (Graph algorithm animation tool) is a general-purpose tool for writing animations of graph algorithms. More than 50 algorithms have been implemented using Galant, both for classroom use and for research.
The primary advantage of Galant is the ease of developing new animations using a language that resembles algorithm pseudocode and includes simple function calls to create animation effects.
The most common workflow is
Deployment of the current implementation of Galant requires that a user have git, Apache Ant, and runtime access to a Java compiler; the implementation is also complex and includes many unnecessary features. While it is technically platform-independent, behavior differs across platforms; any modifications must be tested on Mac, Windows, and Linux.
A prototype web-based Galant, galant-js (https://github.com/mfms-ncsu/galant-js), was developed by a Spring 2023 Senior Design Team. The goal is to incorporate the most important features of Java-based Galant.
Many enhancements are required to put the usability of galant-js on par with the original Java version, which has been used heavily in the classroom and in Dr. Stallmann’s research. The JavaScript version already has clear advantages.
Key enhancements are
A detailed list is in feature-requests.md at the root of the repository.
Students would be required to learn and use JavaScript effectively to reimplement Galant functionality. The current JavaScript implementation uses React and Cytoscape for user interaction and graph drawing, respectively. An understanding of these will be required for some of the added features.