Computer Science

Senior Design Center

Projects – Spring 2019

Web App for FHIR: Fast Healthcare Interoperability Resources

Opportunity

Blue Cross NC has significant savings opportunities around Hierarchical Condition Categories (HCC) coding, a payment model mandated by the Balanced Budget Act of 1997 (BBA) and implemented by the Centers for Medicare and Medicaid Services. HCC coding is used in Risk Adjustment modeling, which is based on two main factors: demographics and health status. Blue Cross NC currently relies on a vendor to assist in Risk Adjustment modeling and pays substantial fees for a process that has some shortcomings. The higher operating costs and the potential for transactional errors are burdensome. Any application that improves the member/patient-level demographics and diagnoses used to determine a Risk Adjustment Factor (RAF) score will enhance our revenue and reduce operational cost. Risk assessment data is based on diagnosis information pulled from claims and medical records, which are collected from physician offices, hospital inpatient visits, and outpatient settings.

Solution

Build a web application that demonstrates the use of FHIR data. Set up a connection to an open Electronic Health Record implementation (there are several industry sandboxes, such as SMART Health IT and Epic) and demonstrate how access to specific data could enable insights that drive improved health outcomes for populations suffering from specific conditions or illnesses.

Requirements/deliverables

  • Leverage free and open sources to read patient level demographics and diagnoses.
  • Learn about Risk Adjustment Modeling within ACA and Medicare populations.
  • Develop workflows based on realistic business use cases.
  • Learn key workflow automation and optimization concepts and patterns.
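
The first requirement above (reading patient-level demographics and diagnoses) can be sketched roughly as follows. This is a minimal illustration rather than a prescribed design: the bundle is hand-written sample data, the function names are hypothetical, and it assumes full YYYY-MM-DD birth dates (FHIR also allows partial dates). Mapping diagnosis codes to HCCs and computing a RAF score is deliberately left out.

```python
from datetime import date

# Hypothetical, hand-written bundle for illustration only. A real bundle
# would be fetched as JSON from a sandbox FHIR server.
SAMPLE_BUNDLE = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1",
                      "gender": "female", "birthDate": "1950-04-12"}},
        {"resource": {"resourceType": "Condition",
                      "subject": {"reference": "Patient/p1"},
                      "code": {"coding": [{
                          "system": "http://hl7.org/fhir/sid/icd-10-cm",
                          "code": "E11.9"}]}}},
    ],
}


def age_on(birth_date: str, today: date) -> int:
    """Age in whole years; assumes a full YYYY-MM-DD birth date."""
    born = date.fromisoformat(birth_date)
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - before_birthday


def summarize(bundle: dict, today: date) -> dict:
    """Collect gender, age, and diagnosis codes per patient ID."""
    patients = {}
    for entry in bundle.get("entry", []):
        res = entry["resource"]
        if res["resourceType"] == "Patient":
            patients[res["id"]] = {
                "gender": res.get("gender"),
                "age": age_on(res["birthDate"], today),
                "diagnoses": [],
            }
    for entry in bundle.get("entry", []):
        res = entry["resource"]
        if res["resourceType"] != "Condition":
            continue
        # Condition.subject.reference looks like "Patient/<id>".
        patient_id = res["subject"]["reference"].split("/")[-1]
        if patient_id in patients:
            for coding in res.get("code", {}).get("coding", []):
                patients[patient_id]["diagnoses"].append(coding["code"])
    return patients
```

Calling `summarize(SAMPLE_BUNDLE, date(2019, 1, 15))` returns one record per patient that a later step could feed into a risk adjustment model.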

Takeaways/Questions to Answer

  • What are the challenges with seeking to design optimized processes?
  • Which set of diagnoses or chronic conditions are major influencers of the RAF score?
  • What recommended methods offer the best chance for success?
  • Which tools or approaches provide the easiest/most reliable path to a solution?

Recommended Student Skills

Familiarity with JSON, XML, and RDF will be great; clinical and modeling (AI/machine learning – stochastic processes) knowledge is a plus but not required.

Students will be required to sign over IP to Sponsor when team is formed.

Monitoring GitHub History

Abstract

Software development organizations of all kinds want to understand how their software is being worked on, because that allows them to make better decisions.  Information about development activity can identify problems in code or organization before they become schedule problems.

In Senior Design, we want to know if teams are collaborating well so we can help out that team. In industry, we want to know if certain files are becoming problematic to work on and to understand our development behaviors over time.  For open source projects, we want to know who the best outside contributors are and whether certain areas of the code are contributed to more than others.

While there have been some tools to analyze repository data (such as "gitstats"), usually they operate on just one repository at a time.  We want to build an easy-to-use tool that allows easy comparison of multiple repositories and offers still more statistics.

Requirements

  • Create a web application that can be used to monitor contributions to GitHub history.
  • The application will be multi-tenant, to be used by many types of users at once; for instance, an entire NCSU computer science department should be able to use the app. (To avoid making things educational-specific, the idea of "classes" will be called "organizations", and in industry a "department" might be an organization).
  • Multiple organizations can be added to the system, for instance "SeniorDesignSpring2019" could be an organization.
  • Site admins (example: central IT) can create organizations. Organization admins (in our example, professors for a given class) can add repos, initiate rescans, and view the data.
  • The system periodically (on a configurable schedule) scans all added repos; the results are tracked in a database and presented in a web interface.

Per-user, over time, track and graph:

  • Commits made
  • Average number of files changed
  • Average number of lines added or removed

In general:

  • Number of files in a given time period
  • Number of contributors to each file, on average

Per file extension:

  • Number of commits over time

The system should be able to generate both graphical and tabular reports, allowing as many types of comparisons both inside of repositories and outside as possible.
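
As a rough illustration of the per-user metrics listed above, the sketch below parses text in the shape produced by `git log --numstat --format=commit%x09%an`. The sample log and the function name are hypothetical, and a real scanner would also need dates, file extensions, and identity matching across usernames.

```python
from collections import defaultdict

# Hypothetical sample in the shape of
# `git log --numstat --format=commit%x09%an` output.
SAMPLE_LOG = """\
commit\tAlice

3\t1\tsrc/app.py
10\t0\tREADME.md

commit\tBob

5\t2\tsrc/app.py

commit\tAlice

-\t-\tdocs/logo.png
"""


def per_user_stats(log_text: str) -> dict:
    """Return commit counts and per-commit averages for each author."""
    totals = defaultdict(
        lambda: {"commits": 0, "files": 0, "added": 0, "removed": 0})
    author = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        if parts[0] == "commit" and len(parts) == 2:
            # Header line written by the --format string.
            author = parts[1]
            totals[author]["commits"] += 1
        elif len(parts) == 3 and author is not None:
            added, removed, _path = parts
            totals[author]["files"] += 1
            # Binary files are reported as "-" and carry no line counts.
            totals[author]["added"] += int(added) if added.isdigit() else 0
            totals[author]["removed"] += int(removed) if removed.isdigit() else 0
    return {
        name: {
            "commits": t["commits"],
            "avg_files_changed": t["files"] / t["commits"],
            "avg_lines_added": t["added"] / t["commits"],
            "avg_lines_removed": t["removed"] / t["commits"],
        }
        for name, t in totals.items()
    }
```

Keeping the parser independent of how the log text is obtained makes it easy to unit test and to call from a background task.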

GitHub username/passwords should be securely stored on the database server for each user.  Supporting only https:// access to GitHub (versus SSH) is acceptable.

Constraints/Technology Notes

  • The system will be implemented in Python 3.6 using the Django web framework, and celery for background processing of repository scans.
  • Data should be stored in PostgreSQL
  • Since there is a lot of backend development for the project, the web UI can be kept simple and implemented using django-crispy-forms
  • Basic graphs can be done in D3.js or another JavaScript graphing framework, or explore Python-based graphing libraries such as Plotly.  Finding the best way to do graphs is definitely part of this project.
  • Usage of python data science libraries and exploration is encouraged (scipy, numpy, pandas, etc)
  • The system should not require any cloud service dependencies and should be easy for new developers to learn to work on

Stretch Goals

Go to town on the analytics and explore everything that is interesting, including source code scanning.  There are really no limits once the basic parts of the project are implemented.

We intend to use this project for the SDC and will therefore emphasize exceptionally clean and extensible code throughout the development process, sometimes at the expense of development velocity.

MyMET Calculation Engine

Overview

Duke Energy’s meteorology group uses meteorological and utility data to provide air dispersion and weather-related guidance to the enterprise.  Meteorological data is used for predicting energy usage, wholesale energy trading, and environmental annual reporting. Inputs are gathered from multiple sources, both internal and external to Duke, for analyses and to provide model inputs.  Once gathered, the data undergoes a thorough review to flag anomalies and ensure quality. The NCSU team will build an engine to assist the user in aggregating multiple meteorological data files, performing data validations, calculations, and data reformatting, and exporting user-selected variables.

Requirements

MyMET Calculation Engine should be implemented as a web application.  The application will capture data from multiple sources into a central database.  Sample data files will be provided for input to MyMET. The application should allow for automated upload of the data.  The basic flow is as follows:

  1. Input/access data: The application should upload the data automatically and make it available to users.  Basic filtering and sorting functionality should be provided.
  2. Users would then verify input data using MyMET flagging and graphing capabilities.
  3. Set up and run calculations/verifications:   Users will have the ability to calculate new variables based on the input data and run logic checks against all or a subset of the data.
    1. An example set of equations for some calculated variables is listed in the appendix, for testing MyMET functionality.
    2. Additionally, users should have access to basic arithmetic operations to create their own formulas.  
    3. Any user-defined formulas should be archivable as templates for repeated future use.
    4. Equality and logic operators (GREATER THAN, LESS THAN, IF, ELSE, etc.) should be available for users to create multiple and concurrent logic checks for data validation on an ad hoc basis.  They should be archivable as templates for repeated use.
    5. Logic checks and validations will be run to flag anomalies in the data.  This can be done on either the input or user-derived data. The tools provided by MyMET should assist users in performing Quality Assurance on the results.
  4. Maintain a record of formulas/checks used in validating the data.
  5. Graph multiple variables in the same plot, using the same or different scales on the Y axis.
  6. Export user-selected (or all) data points to different file formats that can easily be input into other statistical programs (e.g. R, SAS), as well as ESRI’s ArcGIS software for spatial analysis and mapping.  Acceptable file formats include Excel, CSV, text, and NetCDF.
  7. As an additional challenge, the application may utilize low-level machine learning to automate data QA/flagging based on historical values and data continuity.
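
One possible way to approach step 3 (user-defined formulas and logic checks stored as reusable templates) is sketched below. All variable names, the sample formula, and the thresholds are hypothetical stand-ins, not the equations from the appendix. The evaluator walks Python's `ast` tree so that user input is never passed to `eval()`, and it only supports the four basic arithmetic operators.

```python
import ast
import operator

ARITHMETIC = {ast.Add: operator.add, ast.Sub: operator.sub,
              ast.Mult: operator.mul, ast.Div: operator.truediv}
COMPARISONS = {"GREATER THAN": operator.gt, "LESS THAN": operator.lt}


def evaluate(formula: str, row: dict) -> float:
    """Evaluate an arithmetic formula over a row's variables."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in ARITHMETIC:
            return ARITHMETIC[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return row[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(formula, mode="eval").body)


# Templates are plain data, so they could be archived in the database.
# Names and limits here are hypothetical examples.
FORMULA_TEMPLATES = {"temp_f": "temp_c * 9 / 5 + 32"}
CHECK_TEMPLATES = [
    {"name": "wind too high", "variable": "wind_ms",
     "op": "GREATER THAN", "limit": 75},
    {"name": "temp too low", "variable": "temp_f",
     "op": "LESS THAN", "limit": -40},
]


def process(rows):
    """Add derived variables to each row and list the checks it fails."""
    results = []
    for row in rows:
        derived = dict(row)
        for name, formula in FORMULA_TEMPLATES.items():
            derived[name] = evaluate(formula, derived)
        flags = [check["name"] for check in CHECK_TEMPLATES
                 if COMPARISONS[check["op"]](derived[check["variable"]],
                                             check["limit"])]
        results.append((derived, flags))
    return results
```

Because checks run after the formulas, they can target either input or user-derived variables, matching the requirement that validation work on both.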

Data

Four basic types of data will be provided, each with multiple files.  Descriptions are as follows:

  1. METower1: Duke Energy Carolinas Meteorological Tower Data (hourly)
  2. METower2: Duke Energy Progress Meteorological Tower Data (hourly)
  3. NWS_ASOS: External meteorological data from the NC State Climate Office web application (“CRONOS”), maintained by the NCSU MEAS department (Marine, Earth, and Atmospheric Sciences).  Data can be pulled for different averaging periods, such as Daily, Hourly, or Monthly.
  4. ENV Data:  This is station-specific, tabulated data in Word, Excel, or PDF formats, including such data as gallons of water discharged, water temperature, and solar radiation measurements, etc.

Technology

The application should be web-based with a relational database.

Documentation

Full documentation is expected.  This includes:

  • Source code documentation
  • A User’s Guide, in the form of a Word Document, fully describing the features of the system.  This is to be a “how to” guide for the users.
  • A Technical Guide, in the form of a Word Document, that describes the architecture and major components of the system from a technical standpoint.  The intended audience will be software developers.

Cloud Hosted User Profile

Problem Background

The Fidelity Plan Sponsor Webstation (PSW) is a web-based application used by Fidelity clients to manage and administer their workplace benefits. Whereas employees of Fidelity clients view their individual benefits through NetBenefits, PSW is the site used by benefits administrators to manage the benefits that are offered to their employees. PSW supports the following workplace benefits: Defined Contribution, Defined Benefit (pension), equity compensation, Health Savings Accounts, health & insurance, and Student Debt Employer Contribution.

PSW was originally developed as a Java web application in 2002-2003. It uses an on-premise relational database for storing user profile data. Rather than leverage web services, our legacy applications connect directly to the database to run stored procedures to create and update a user profile. We view the current database (Sybase) and the architectural design (direct database connections) as legacy technology and we are seeking to modernize. As we move our capabilities to the cloud, we are looking to implement micro-services, which would involve creating a new web service that acts as the interface to the underlying data store. After creation of a web service and underlying data store, our existing applications would be modernized to interface with the profile data through the web service.

Why You Should Be Interested

This project offers the opportunity to work on a cloud hosted solution to a real-world problem. As part of a full stack team working in an Agile fashion, you will gain valuable experience as well as exposure to multiple technologies, which will make you more attractive to potential employers after you graduate.

Requirements

Your project must do the following.

  • Create a cloud-hosted relational data store to store user profile data.
  • Each profile should contain the following: system-generated unique ID, system-generated unique user id, first name, last name, email address, phone number.
    • The system-generated unique ID is a numeric value created by the database when a new row is inserted into the table.
    • The user ID is a 7 character alphanumeric string. It should be randomly generated by the service when the create method (POST) is called and must be unique.
    • First name, last name, and email address can be alphanumeric strings, each with a length of 50 characters.
    • The phone number consists of a country code and a 10 digit number.
    • Each profile can have one or more phone numbers and one or more email addresses.
    • Phone numbers can have types of ‘office’ or ‘mobile’
  • Create a REST-ful web service to perform CRUD (create, retrieve, update, delete) operations on the profile.
    • The GET method should retrieve a profile, given the unique ID, and provided that no search criteria are passed. The profile is returned in the body of the response as a JSON structure.
    • When the GET method is called with no unique ID and a query param of userId, the service will search for a profile with the specified user ID.  The profile is returned in the body of the response as a JSON structure.
    • The POST method should create a profile. The profile data will be passed in the body of the request as a JSON structure.
    • The PUT method should update a profile. The profile data will be passed in the body of the request as a JSON structure.
    • The DELETE method should delete a profile, given the unique ID.
  • The service should cache responses using a distributed cache for improved performance.
  • You must be able to demonstrate that create and update operations prime and update the cache, respectively. The delete operation should remove data from the cache.
  • Use SoapUI, Restlet or something similar for testing the web service.
  • Sample use cases: 1) As a web service consumer, I want to retrieve the profile for a given user ID so that I can display it to the end user. 2) As a web service consumer, I want to update the profile for a specific user so that the user's changes can be persisted.
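
The user ID and caching rules above can be illustrated with a small in-memory sketch. The project itself specifies Java, so this Python version is only meant to show the behavior: two dictionaries stand in for the relational table and the distributed cache, the class and method names are hypothetical, and restricting user IDs to uppercase letters and digits is an assumption.

```python
import secrets
import string

# Assumption: user IDs use uppercase letters and digits only.
ALPHANUMERIC = string.ascii_uppercase + string.digits


class ProfileService:
    def __init__(self):
        self.rows = {}    # stands in for the relational table
        self.cache = {}   # stands in for the distributed cache
        self.next_id = 1  # stands in for the database-generated ID

    def _new_user_id(self) -> str:
        """Generate a random 7-character ID not already in use."""
        existing = {row["userId"] for row in self.rows.values()}
        while True:
            candidate = "".join(
                secrets.choice(ALPHANUMERIC) for _ in range(7))
            if candidate not in existing:
                return candidate

    def create(self, data: dict) -> dict:            # POST
        profile = dict(data, id=self.next_id, userId=self._new_user_id())
        self.next_id += 1
        self.rows[profile["id"]] = profile
        self.cache[profile["id"]] = dict(profile)    # prime the cache
        return profile

    def get(self, profile_id: int):                  # GET by unique ID
        if profile_id in self.cache:
            return self.cache[profile_id]
        profile = self.rows.get(profile_id)
        if profile is not None:
            self.cache[profile_id] = dict(profile)   # fill on a miss
        return profile

    def find_by_user_id(self, user_id: str):         # GET ?userId=...
        for profile in self.rows.values():
            if profile["userId"] == user_id:
                return profile
        return None

    def update(self, profile_id: int, data: dict) -> dict:   # PUT
        profile = self.rows[profile_id]  # KeyError if the ID is unknown
        profile.update({key: value for key, value in data.items()
                        if key not in ("id", "userId")})
        self.cache[profile_id] = dict(profile)       # update the cache
        return profile

    def delete(self, profile_id: int) -> None:       # DELETE
        self.rows.pop(profile_id, None)
        self.cache.pop(profile_id, None)             # evict from the cache
```

In a real deployment, the uniqueness of the user ID would more safely be enforced by a unique constraint in the database, with the service retrying on a collision.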

Stretch Goals

While not required, you may choose to augment your project by doing any/all of the following.

  • Scale your service across 2 or more instances with a load balancer spanning across them. Show that the number of instances increases dynamically with an increase in traffic.
  • Create a web page using Angular that will allow a user profile to be created or edited.
  • Connect the service to an OpenID Connect authenticator so that the consumer must pass a credential to the service, which the service must then validate.
  • Show that your solution is scalable and capable of supporting 50 transactions/second (45 read, 5 write).
  • After creating a performance baseline, use a different distributed cache and compare the results to draw conclusions as to which caching technology performs better. For example, if you initially used ElastiCache with Redis, try ElastiCache with Memcached to see how it compares.
  • After creating a performance baseline, experiment with the number of instances of the service and/or database to see how the performance is affected. Is there a point after which adding more instances doesn’t improve performance?

List of technologies to use

The solution should use the following technologies

Environment

  • Amazon Web Services (preferred) or Microsoft Azure

Web Service

  • Java 8 or higher
  • Tomcat 8.5 or higher
  • The response must be JSON
  • Spring Boot is recommended but not required
  • Caching: you choose the technology but it should be a distributed cache. Let us know what you chose and why.

Database

  • Must be a relational database (i.e. no NoSQL)
  • You choose the vendor (e.g. MySQL, Postgres, etc.) but be prepared to tell us why you chose it

User Experience

  • Angular

How will this help end users?

This solution will help us move one of our core capabilities to the cloud. Having an insulated micro-service architecture will make the profile more adaptable in the future and will speed adoption by future applications.

Project Collaboration

  • Use Agile for software development
  • Form a scrum team and have 2-week sprints
  • Weekly meeting with Sponsors to review the progress and help answer questions related to requirements

Code Crowd Sourcing

Background

A great amount of innovation happens outside of day-to-day projects by people who are passionate about identifying problems and coming up with innovative solutions. However, a lot of great ideas go untapped due to factors such as: the idea is outside the scope of the project, there is no direct funding, or there is simply no team to work on it.

Potential Solution

Build a web application that helps an idea move from a simple brain spark to reality. Connect problem solvers who have the passion and the right skills and who will make the time to help. Create dynamic matching and notification based on persona, domain, skill set, availability, etc.  Create opportunities for associates to showcase and expand their skill sets without affecting their regular work velocity, while increasing employee engagement. This application will provide a great opportunity not just to submit cool ideas but also to follow an idea through its lifecycle from inception to completion. A typical lifecycle for an idea would be:

  1. Idea/Brain Spark
  2. Skill set requirements
  3. Any Other Prerequisites (Approvals/Funding/Timeline)
  4. Project – State of Active work

For an idea to become a project, you will need:

  • Actual Idea and description
  • A general design of your idea, about how much work it would be, how much time it would take.
  • Knowledge of team composition in terms of skillset and number
  • Any approval you would need. This would include having a second opinion on your idea by a manager or someone that has expertise in the area. This is to make sure the project is actually feasible.
  • Funding if needed

Once an idea becomes a project, it can have these statuses associated with it:  “Looking For Core Team”, “Actively Being Worked On”, “Looking For Volunteer Help”, “Successfully Finished”

Once an idea becomes a project, it will show up on the projects page and will have a status of “Looking For Core Team”. This is when people will be able to view your project and apply to be a part of the Core Team. If people are interested but don’t want to be part of the core team, they can “watch” the project and will be notified of any status changes.
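
The status and "watch" behavior described above could be modeled along these lines. This is only a sketch: the class name, method names, and the in-memory notification list are hypothetical stand-ins for whatever persistence and messaging the team chooses.

```python
STATUSES = [
    "Looking For Core Team",
    "Actively Being Worked On",
    "Looking For Volunteer Help",
    "Successfully Finished",
]


class Project:
    def __init__(self, title: str):
        self.title = title
        self.status = STATUSES[0]  # every new project starts here
        self.watchers = []
        # (watcher, message) pairs; a stand-in for email or push delivery.
        self.notifications = []

    def watch(self, person: str) -> None:
        if person not in self.watchers:
            self.watchers.append(person)

    def set_status(self, new_status: str) -> None:
        if new_status not in STATUSES:
            raise ValueError(f"unknown status: {new_status}")
        if new_status == self.status:
            return  # nothing changed, so nobody is notified
        self.status = new_status
        for person in self.watchers:
            self.notifications.append(
                (person, f"{self.title} is now: {new_status}"))
```

Whether transitions should be restricted to a fixed order is not stated above, so this sketch allows a move between any two statuses.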

The following personas can be used for multiple use cases.

  • Intrapreneurs: Submit an idea
    • I want to create efficient solutions to complicated problems
  • Contributors: Comment or vote on an idea
    • I am excited about the idea and would like to contribute
  • Seekers: Volunteer for a project; can be technologists or business folks with domain expertise
    • I would like to be part of the team and work on the idea

Additional Considerations

Fidelity already has a Proof-of-Concept UI available, and the team can use it to start their UI Design work

Technologies

Web Technologies:  Use Angular, Java 8 or higher, Tomcat 8.5 or higher

Database: You can choose any database, but be able to explain why the team picked that solution

Project Collaboration

  • Use Agile for software development
  • Form a scrum team and have 2-week sprints
  • Weekly meeting with Sponsors to review the progress and help answer questions related to requirements

A Training Game for Capacity Management during New Product Introductions in High-Tech Industries

Background

Our research team, consisting of researchers at NCSU and Intel Corporation, is developing decision support tools to help management understand the issues arising in capacity management during new product introductions. This project seeks to develop a prototype of a role-playing game where managers of different organizations involved in new product introductions can assess the impact of their own decisions on the performance of their organization, that of other organizations, and the firm as a whole.

Problem Statement

The two principal organizational units involved in new product introductions in high tech firms, such as semiconductor manufacturers, are the Manufacturing (MFG) unit and a number of Product Engineering (ENG) units.  Each Product Engineering unit is charged with developing new products for a different market segment, such as microprocessors, memory, mobile etc. The Manufacturing unit receives demand forecasts from the Sales organization and is charged with producing devices to meet demand in a timely manner. The primary constraint on the Manufacturing unit is limited production capacity; no more than a specified number of devices of all sorts can be manufactured in a given month. The Product Engineering units have limited development resources in the form of computing capability (for circuit simulation) and number of skilled engineers to carry out design work. Each of these constraints can, to a first approximation, be expressed as a limited number of hours of each resource available in a given month.

The Product Engineering groups design new products based on requests from their Sales group. The first phase of this process takes place in design space, beginning with transistor layout and culminating in full product simulation. The second phase, post-silicon validation, is initiated by a request to Manufacturing to build a number of hardware prototypes. Once Manufacturing delivers these prototypes, the Engineering group can begin testing. This usually results in bug detection and design repair, followed by a second request to Manufacturing for prototypes of the improved design. Two cycles of prototype testing, bug detection and design repair are usually enough to initiate high-volume production of the new product. Especially complex products or those containing new technology may require more than two cycles.

The Manufacturing and Product Engineering groups are thus mutually dependent. Capacity allocated by Manufacturing to prototypes for the Product Engineering groups consumes capacity that could be used for revenue-generating products, reducing short-term revenue. On the other hand, if the development of new products is delayed by lack of access to capacity for prototype fabrication, new products will not complete development on time, leaving the firm without saleable products and vulnerable to competition.

The Product

We seek the development of an educational computer game where students assume the roles of MFG or ENG managers to make resource allocation decisions. The initial module of the game would focus on a single MFG unit and a single ENG unit. Resource allocation decisions will be made manually, giving the players of the game a feel for the unanticipated effects of seemingly obvious decisions.

The game will have one MFG player and can have multiple ENG players, with each player trying to maximize their own objective function.  We shall assume for sake of exposition one player of each type, and a given number of time periods T in which each player must make its resource allocation decisions.

Data common to all players:

T: number of time periods for which decisions must be made, t = 1,...,T

N: number of products to be considered for both production and development.

The Problem for the MFG Player:

  • The MFG player is given the amount of available factory capacity (total number of units that can be produced in a period) and the demand for each of its available products in each period, the unit sale price of each product, and a set of cost parameters.
  • In each period, MFG must decide how much capacity to allocate to each of its available products, and how much the ENG player can use for prototype fabrication. MFG is charged an inventory holding cost for each unit remaining unsold at the end of each period, and a shortage cost for every unit of demand remaining unmet at the end of each period.
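
To make the MFG player's per-period bookkeeping concrete, here is a rough sketch of one period's evaluation. It rests on several assumptions that are not stated above: unmet demand carries over as backlog, units produced in a period can be sold in that same period, and there is no production cost. The function and parameter names are hypothetical.

```python
def mfg_period(capacity, allocation, prototype_units, demand,
               inventory, backlog, price, holding_cost, shortage_cost):
    """Evaluate one period for the MFG player.

    Returns (profit, new_inventory, new_backlog). Assumes unmet demand
    is backlogged and production is available for sale immediately.
    """
    # Capacity constraint: products plus ENG prototypes share the factory.
    used = sum(allocation.values()) + prototype_units
    if used > capacity:
        raise ValueError("allocation exceeds factory capacity")

    profit = 0.0
    new_inventory, new_backlog = {}, {}
    for product in demand:
        available = inventory.get(product, 0) + allocation.get(product, 0)
        wanted = backlog.get(product, 0) + demand[product]
        sold = min(available, wanted)
        new_inventory[product] = available - sold  # unsold units
        new_backlog[product] = wanted - sold       # unmet demand
        profit += (price[product] * sold
                   - holding_cost[product] * new_inventory[product]
                   - shortage_cost[product] * new_backlog[product])
    return profit, new_inventory, new_backlog
```

The same function can back the "what if" screen: call it with a tentative allocation, show the resulting profit and state, and only commit once the player confirms.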

The Problem for the ENG Player

  • The ENG unit must complete the development activities of new products in time for MFG to bring them into production and meet the market demand for that product.
  • An ENG unit may have multiple products in development at any given time.
  • The development of a product consists of a number of stages, each of which involves a number of subtasks
  • The objective of ENG is to maximize the number of products it successfully makes available to MFG over the course of the T periods in the planning horizon.

Thoughts on Game Structure:

Two players (ENG and MFG) take turns, with each turn resulting in decisions for a given period. Each player would have access to the state of their world at the start of the current period. For MFG this would be current inventories of each of their products, current backlogs of each product, and demands for the products for the next few periods. For ENG, the state of the world would include their current resource levels and the degree of completion of products currently in development (i.e., which subtasks of which stage have been completed or are partially completed).

Lots of variations are possible, and we would like to leave the maximum flexibility to enhance the game by adding more sophisticated procedures. Our NSF research is looking at auction procedures that use resource prices obtained from approximate solutions to optimization models by each player - a price-based coordination solution. So, we would ideally like to be able to use this game engine to simulate the solutions we get from our auction procedures and compare them to the solutions obtained by players making decisions manually.

In terms of screens, the most important requirement would be for a screen where each player can see the current state of their world, and evaluate the potential future impacts of different decisions before committing to them. This would require constraint checking and cost evaluation for both players. There should also be some means for the two players to communicate their offers and counteroffers.

A final requirement would be to include a corporate scorecard whose purpose is to assess the impact of the decisions on the firm overall, as opposed to for the individual players. The idea here is to examine situations where a given player’s objective leads it to make decisions that are good for their own short-term objectives but leave the other player in a very difficult position causing the firm to lose money in the longer term.

We expect the project team to educate us about what is possible and what alternative approaches may be taken from the gaming side, and we look forward to working together to come up with an interesting prototype that will allow us to explore this complex but very important problem.

Microsoft Quantum Computing

About Microsoft

For more than 40 years Microsoft has been a world leader in software solutions, driven by the goal of empowering every person and organization to achieve more. They are a world leader in open-source contributions and, despite being one of the most valuable companies in the world, have made philanthropy a cornerstone of their corporate culture.

While primarily known for their software products, Microsoft has delved more and more into hardware development over the years with the release of the Xbox game consoles, HoloLens, Surface Books and laptops, and Azure cloud platform. They are currently undertaking the development of the world’s only scalable quantum computing solution. This revolutionary technology will allow the computation of problems that would take a lifetime to solve on today's most advanced computers, allowing people to find answers to scientific questions previously thought unanswerable.

Description

One of the necessary components of any quantum computing system is a hardware control plane to actually communicate with the quantum bits (qubits). Development of hardware requires the generation and analysis of significant amounts of data, and this control plane is no different. After the logic has been developed, the next step is the synthesis of logic gates in order to analyze timing results. After gate analysis is done, timing reports are generated, and validation occurs to make sure that the values are within specifications. Iterations are performed to improve the results, failures are assigned to users, fixes are put in place to resolve issues, and reports are re-generated. The goal of the project is to improve that flow by simplifying the interaction needed to perform these actions.

Goals

  • Python script to parse and store generated timing reports. Several sample reports will be provided to the team.
  • A web-based dashboard, backed by the Python Flask framework, will be created to show the stored reports and allow the user to select a report to view
  • Each individual timing report will then be viewable, with all appropriate columns extracted from the report. Columns should be sortable and searchable
  • Ability to highlight objects (instance / net names) that have values (skews / delays) outside of boundaries given in the web dashboard
  • Filter (both include and exclude) data based on object-types, as well as names
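
The parsing, highlighting, and filtering goals above might be prototyped as follows. The actual report format is not described here, so the whitespace-separated layout, the column names, and the sample values are all invented for illustration; the real parser will depend on the sample reports provided by the sponsor.

```python
# Hypothetical report layout, invented for illustration only.
SAMPLE_REPORT = """\
# type     name         skew_ns  delay_ns
net        clk_main     0.12     1.80
instance   u_ctrl/ff3   0.45     2.95
net        data_bus[3]  0.05     3.40
"""


def parse_report(text: str) -> list:
    """Turn each non-comment line into a dict of typed columns."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        obj_type, name, skew, delay = line.split()
        rows.append({"type": obj_type, "name": name,
                     "skew_ns": float(skew), "delay_ns": float(delay)})
    return rows


def violations(rows, max_skew_ns: float, max_delay_ns: float) -> list:
    """Rows whose skew or delay is outside the user-supplied bounds."""
    return [row for row in rows
            if row["skew_ns"] > max_skew_ns
            or row["delay_ns"] > max_delay_ns]


def filter_rows(rows, include_types=None, exclude_types=(),
                name_contains=None) -> list:
    """Include/exclude by object type, optionally matching on name."""
    kept = []
    for row in rows:
        if include_types is not None and row["type"] not in include_types:
            continue
        if row["type"] in exclude_types:
            continue
        if name_contains is not None and name_contains not in row["name"]:
            continue
        kept.append(row)
    return kept
```

A Flask view could call these helpers and pass the rows plus the set of violating names to a template, which would then apply the highlight styling.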

Stretch Goals

  • Add the capability to diff two reports and highlight differences in column values
  • Allow the creation of work items in Azure DevOps for timing violations directly from the web page. These should be associated with a path beginpoint and endpoint.

The storage technology that is used (ex. MySQL, NoSQL, filesystem) is to be decided by the development team. Flask must be used as the web framework.

The development team will be provided with multiple sample reports, as well as guidance on what data exists in the reports and what files to extract the data from.

Recommended Student Skills

  • Python required
  • Web design experience would be helpful

Students will be required to sign over IPR to Sponsor when team is formed.

Active IQ Application Awareness

Description

NetApp Storage Systems collect performance and configuration meta-data about system use.  These data describe performance solutions and the workloads that drive those solutions.  This performance meta-data is called AutoSupport and is the information which drives NetApp’s cloud based analytics solution called NetApp Active IQ.  Active IQ system telemetry data is anonymized and aggregated at a customer level. These data are made available in a structured format and indicate multiple features and applications in use on the system.  Structured data in this use case means data that are included in a relational database or data that is well formatted (for example, XML).

Project Goals

The primary goal of this project is to write a program that uses Active IQ data to find groups of applications that normally run together at a given site or locality.  If rules can be identified with high confidence, then the goal is to identify a prioritized list of customers who are not using applications and NetApp solutions in a beneficial way.   A sales strategy could then be developed to reach out to those customers and promote NetApp storage solutions.

The high level steps and deliverables for this project are:

  1. Understand the Active IQ data sets
  2. Plan an approach using the data and analytical capabilities to solve the problem, present findings
  3. Create models with the data to identify application groupings
  4. Apply models to the data set to identify high confidence customers that do not follow the norms seen by most other customers regarding application groupings
  5. Deliver a list of customers that are probably using applications that are not yet on NetApp storage, so that sales strategies can later be defined and acted on to work with these customers.

Requirements

  1. The program should find patterns in the available Active IQ data that indicate which applications are most often used together (running on NetApp Storage) at various customer locations/sites. Each such set is called a “Group of Applications”. Example: App1, App2, and App3 could be a “Group of Applications” that is often found in use in the same customer manufacturing plants. Likewise, App4, App9, and App11 are found in use in the same customer manufacturing plants and could form another “Group of Applications”.

    To determine this, one approach could be to perform Association Rule Learning (Rule Based Machine Learning) to find groups of applications that typically are run together on NetApp products by customers.  This is a preliminary assessment of the approach and the project could include thinking through other ways to accomplish the task.
  2. The program should represent these findings visually (maps, table, charts) such that business teams can understand the findings.
  3. The program should then identify customer locations/sites that are using only some of the applications in a “Group of Applications”, but not all of them. For example: App1 and App3 are in use at a site, but App2 is not. This may help our sales and marketing teams have conversations with these customers about the presence (or absence) of App2. App2 may not be in use by the customer but could provide value if adopted, or App2 may be in use on non-NetApp (competitor) storage, in which case the customer might want to run it on NetApp storage for the added value that NetApp can provide.
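
The association rule approach suggested in requirement 1, together with the gap detection in requirement 3, can be sketched in a few lines of Python. This is a minimal illustration only, not the sponsor's method: the `SITES` data, the function names, and the support and confidence thresholds are all assumptions made for the example, and a real data set would likely call for a proper Apriori or FP-Growth implementation rather than brute-force enumeration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of applications observed at each customer site.
SITES = {
    "site_a": {"App1", "App2", "App3"},
    "site_b": {"App1", "App2", "App3"},
    "site_c": {"App1", "App3"},
    "site_d": {"App4", "App9", "App11"},
    "site_e": {"App4", "App9", "App11"},
    "site_f": {"App1", "App2", "App3", "App4"},
}


def frequent_itemsets(sites, min_support, max_size=3):
    """Map each application set seen at >= min_support of sites to its support."""
    counts = Counter()
    for apps in sites.values():
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(apps), size):
                counts[frozenset(combo)] += 1
    total = len(sites)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


def association_rules(itemsets, min_confidence):
    """Return (antecedent, app, confidence) rules of the form antecedent -> app."""
    rules = []
    for itemset, support in itemsets.items():
        for app in itemset:
            antecedent = itemset - {app}
            if not antecedent:
                continue
            # Every subset of a frequent itemset is itself frequent,
            # so the antecedent is guaranteed to be present in `itemsets`.
            confidence = support / itemsets[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, app, confidence))
    return rules


def find_gaps(sites, rules):
    """For each site, list apps that a rule predicts but the site does not run."""
    gaps = {}
    for site, apps in sites.items():
        for antecedent, app, confidence in rules:
            if antecedent <= apps and app not in apps:
                best = gaps.setdefault(site, {})
                best[app] = max(best.get(app, 0.0), confidence)
    return gaps


itemsets = frequent_itemsets(SITES, min_support=0.3)
rules = association_rules(itemsets, min_confidence=0.7)
gaps = find_gaps(SITES, rules)
print(gaps)
```

With this toy data, only `site_c` is flagged: it runs App1 and App3 but not App2, and the rules predicting App2 have a confidence of about 0.75. Sorting the flagged sites by confidence would give one possible prioritized customer list.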

Development Environment

  • Open Source Data Science platform / scripting language
  • CSV data files will be shared with the team to work with via DropBox or other online storage

Required Student Skills

  • Proficiency with 1 or more of the following Programming or Scripting languages: R, Perl, Python
  • Knowledge of modeling techniques with structured data
  • Desire to apply data analytics capabilities and present findings in a business environment

SAS Live Stream Speech Recognition System

The goal of this project is to create an automatic speech recognition system that takes a continuous speech stream as input and transcribes the speech in real time. Automatic speech recognition (ASR) is a key technique in human-computer interaction. An ASR system converts an input speech signal into text, which is further analyzed by downstream natural language processing modules. This technique enables a natural way of communication between humans and computers. In this project, students will participate in the implementation of a Continuous Speech Recognition (CSR) system based on deep neural network models.

Objectives

This project has the following two objectives with some subtasks as listed below:

  1. Implement a CSR system framework which utilizes SAS speech-to-text and natural language processing technology to transcribe an input speech stream in real time (the audio is “played” from a file/link or spoken and recorded directly) with the following features:
  • User can select a data source (wav format audio file, news podcast link or recorded on-the-fly) from the interface
  • User can click a “Play” button to play the audio data
  • While the audio plays, the speech stream is transcribed with SAS speech-to-text technology and the transcript is displayed simultaneously.

 

  2. Improve the performance of the system
  • Record and label speech training data to improve accuracy of the acoustic model.

    The size of the training data set has a significant impact on the performance of a speech recognition system. In this project, each student will be asked to contribute about 10 hours of their voice data as part of the training data. We will provide recording tools and transcripts for the students to read. Some helpful API links are:
  • Explore various acoustic model training techniques. Techniques that will be explored in this project include different deep neural network architectures (CNN vs RNN), feature engineering (MFCC vs Spectrogram), system design (phoneme vs character-based architecture)
  • Investigate continuous speech decoding algorithms
  • Research language model improvements. This includes evaluating the performance of n-gram language models vs. deep neural network-based language models.
  • Stretch goal: Investigate ways to improve model compression and efficiency.
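
As a point of reference for the language model comparison above, the n-gram side can be sketched with a bigram model using add-one smoothing, scored by perplexity. This is only an illustrative baseline under stated assumptions: the training sentences, the `<unk>` handling, and all function names are hypothetical and are not part of the SAS tooling.

```python
import math
from collections import Counter

UNK = "<unk>"  # placeholder for words never seen in training


def tokens_of(sentence, vocab=None):
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    words = sentence.lower().split()
    if vocab is not None:
        words = [w if w in vocab else UNK for w in words]
    return ["<s>"] + words + ["</s>"]


def train_bigram(sentences):
    """Count contexts and bigrams; return (contexts, bigrams, vocab)."""
    vocab = {w for s in sentences for w in s.lower().split()} | {"</s>", UNK}
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        toks = tokens_of(sentence)
        contexts.update(toks[:-1])           # every token that precedes another
        bigrams.update(zip(toks, toks[1:]))
    return contexts, bigrams, vocab


def bigram_prob(prev, word, model):
    """P(word | prev) with add-one (Laplace) smoothing."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def perplexity(sentence, model):
    """Per-token perplexity of a sentence under the bigram model."""
    toks = tokens_of(sentence, model[2])
    log_prob = sum(math.log(bigram_prob(p, w, model)) for p, w in zip(toks, toks[1:]))
    return math.exp(-log_prob / (len(toks) - 1))


# Hypothetical training transcripts.
model = train_bigram(["the cat sat", "the cat ran", "the dog sat"])
print(perplexity("the cat sat", model), perplexity("sat cat the", model))
```

A lower perplexity on held-out transcripts indicates a better fit, so the same measurement could be applied to a neural language model to compare the two families on equal terms.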

Related Knowledge

To ensure the success of the project, experience with the Python programming language is required; knowledge of C/C++/Java is optional. Knowledge of and experience with some other techniques are preferred, including automatic speech recognition, deep neural networks, multi-threaded programming, and cloud programming.

Dataset

There are some available datasets to train speech recognition models, including WSJ0, WSJ1, LibriSpeech, and SAS Radio data.

TradeTec Offer Sheet Portal

Company Intro

TradeTec is a leading provider of supply chain, inventory and equipment management software for forestry businesses throughout the United States and Canada. The company’s proprietary suite of on-premise and SaaS-based solutions tracks both hardwood and softwood timber from harvest to production, while monitoring associated costs throughout each stage of the production cycle. With over 30 years of experience in the Forest Products Software Business, the company prides itself on deep industry knowledge leading to continuous innovation and outstanding customer support. TradeTec was founded in 1985 and is headquartered in Winston-Salem, NC.

Background on Perceived Problem

Offer Sheet is a tool that allows salespeople to communicate pricing and/or available log or lumber stock to their prospective customers. Our current implementation is not as user friendly as we would like, since it involves emailing actual PDF documents. We want a simpler, lighter-weight, web-based solution that offers easier access for all customers.

Roles

Salesperson

The salesperson running the application (TW Logs) at the sawmill. This person manages the current inventory and presents offers to the mill’s customers.

End user:  (1 role using the end product)

The end user is a person responsible for purchasing wood. This user receives an email from the salesperson containing a link to the offer sheet. The offer sheet is an object identified by a URI. When the user clicks on the URI, they are taken to the offer sheet, which contains the offer.

[Figure: offer sheet usage process flowchart]

Solution

A lightweight server will serve up the available offer sheets, and a browser-based frontend will display the selected offer sheet to the customer. The frontend will be a single-page app, and the solution will be developed following twelve-factor app design principles and hosted alongside an existing customer implementation.
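
The idea of an offer sheet as an object identified by a URI can be sketched as follows. This is a language-neutral illustration written in Python for brevity (the preferred stack for the actual project is node.js/express); the class names, the `/offers/<token>` path, the base URL, and the seven-day default validity are all assumptions, with expiration borrowed from the stretch goals.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class OfferSheet:
    customer: str
    items: list
    expires_at: datetime
    # An unguessable token keeps one customer from enumerating other offers.
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))


class OfferSheetStore:
    """In-memory stand-in for the persistence layer (hypothetical)."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._sheets = {}

    def publish(self, customer, items, valid_for=timedelta(days=7), now=None):
        """Store an offer sheet and return the URI to email to the customer."""
        now = now or datetime.now(timezone.utc)
        sheet = OfferSheet(customer, list(items), now + valid_for)
        self._sheets[sheet.token] = sheet
        return f"{self.base_url}/offers/{sheet.token}"

    def resolve(self, uri, now=None):
        """Return the offer sheet behind a URI, or None if unknown or expired."""
        now = now or datetime.now(timezone.utc)
        token = uri.rsplit("/", 1)[-1]
        sheet = self._sheets.get(token)
        if sheet is None or now >= sheet.expires_at:
            return None
        return sheet


store = OfferSheetStore("https://example.com")
link = store.publish("Example Mill Customer", ["4/4 red oak", "8/4 poplar"])
print(link, store.resolve(link).items)
```

Keeping the store behind a small interface like this means the in-memory dictionary could later be replaced by a database table without changing the code that builds or resolves links.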

Preferred Technology Stack

Backend: node.js / express app / MS SQL server for persistence

Frontend: vue / bootstrap with “flexible design” to support the browser based lightweight delivery.

How will the project help users

The result of this project will streamline the delivery and communication of the offer sheets to their intended users and be accessible to the users with only a web browser from anywhere in the world.

Stretch goals:

  • Filtering of products listed in the offersheet (on multiple dimensions)
  • Expiration of the offersheet
  • Update of an offersheet
  • Salesperson notification or additional actions on the offersheet

 

Students will be required to sign over IP to Sponsor when team is formed.

 

Sample offer sheet generated as PDF:

[Figure: screenshot of sample offer sheet generated as PDF]

Maintenance Prediction Mobile App Framework

Founded in Roanoke, VA in 1932, Advance Auto Parts (AAP) is a leading automotive aftermarket parts provider that serves both professional installer and do-it-yourself customers. Our family of companies operates more than 5,100 stores across the United States, Canada, Puerto Rico and the Virgin Islands under four brands: Advance Auto Parts, Carquest Auto Parts, WORLDPAC and Autopart International. Adopting the latest in data and computer science technology, Advance Auto Parts' Center for Insight and Analytics aims to bring machine learning and artificial intelligence to the automotive aftermarket parts market to turbo drive it forward into the digital era.

Project Overview

AAP knows a lot about cars – we have the best parts people in the industry! In addition to providing parts to consumers to address their vehicle maintenance and repair needs, we have numerous resources that can be used to help customers diagnose problems. Bringing the ability to detect and address vehicle problems  

Project Scope

The crux of this project revolves around the design of the major system components and the interfaces between these components. A solid design is critical to the long-term utility of this application even as, for example, initially human-provided inputs are eventually replaced by automated or sensor-driven inputs. The desire is that such future updates would not require a refactoring of the system. A user interface will be required but will be considered secondary to the system design.

Key Software Components

The core functionality of this app will be facilitating the input of updates to vehicle state, the utilization of those state changes to diagnose problems with the vehicle, the prescription of parts and labor needs to address those problems, and a user-facing list of those prescribed fixes.

  • Vehicle State:
    • An accounting of the current and historical state of a specific vehicle.
    • Takes inputs in specific formats to update the state
      • Input formats include
        • Discrete state updates (e.g. mileage, recording repair/maintenance events, etc.)
        • Flow chart (e.g. “I hear” -> “squeaking” -> “when I brake”; see example here)
        • OBD Error Codes from the vehicle’s computer
  • Diagnosis Engine:
    • Draws on the vehicle state to make a diagnosis of vehicle problem
      • Diagnosis Examples
        • Bad O2 Sensor
        • Worn brake pads
        • Engine oil is old/over-used
  • Prescription Engine
    • Given the vehicle state and a diagnosis, prescribes a specific fix
      • Prescription Examples
        • Replace O2 sensor
        • Brake job
        • Change oil
  • Repair/Maintenance Needs List
    • Given the vehicle state and a specific fix prescription, provides user with view of needed repair jobs
    • For each repair job output
      • a list of needed parts
      • instructions for how to complete the job
      • a list of nearby mechanics that can do the job
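
One way to picture the component boundaries listed above is as a set of small interfaces with stub implementations behind them. The sketch below is an assumption-laden illustration, not a prescribed design: the class and method names are hypothetical, and the OBD code, symptom path, and 5,000-mile oil interval are placeholder rules standing in for the sample data the sponsor will provide.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class VehicleState:
    """Current and historical state of one vehicle."""
    mileage: int = 0
    last_oil_change_mileage: int = 0
    obd_codes: set[str] = field(default_factory=set)
    symptom_paths: list[tuple[str, ...]] = field(default_factory=list)
    history: list[str] = field(default_factory=list)

    def record_mileage(self, mileage: int) -> None:      # discrete state update
        self.mileage = mileage
        self.history.append(f"mileage={mileage}")

    def record_obd_code(self, code: str) -> None:        # OBD error code input
        self.obd_codes.add(code)
        self.history.append(f"obd={code}")

    def record_symptom(self, *path: str) -> None:        # flow chart input
        self.symptom_paths.append(path)
        self.history.append("symptom=" + " -> ".join(path))


class DiagnosisEngine(Protocol):
    def diagnose(self, state: VehicleState) -> list[str]: ...


class PrescriptionEngine(Protocol):
    def prescribe(self, state: VehicleState, diagnosis: str) -> str: ...


@dataclass
class RepairJob:
    fix: str
    parts: list[str]
    instructions: str
    mechanics: list[str]


class StubDiagnosisEngine:
    """Placeholder rules; the code, symptom path, and interval are illustrative."""
    OIL_INTERVAL_MILES = 5000

    def diagnose(self, state):
        found = []
        if "P0135" in state.obd_codes:
            found.append("Bad O2 Sensor")
        if ("I hear", "squeaking", "when I brake") in state.symptom_paths:
            found.append("Worn brake pads")
        if state.mileage - state.last_oil_change_mileage > self.OIL_INTERVAL_MILES:
            found.append("Engine oil is old/over-used")
        return found


class StubPrescriptionEngine:
    FIXES = {
        "Bad O2 Sensor": "Replace O2 sensor",
        "Worn brake pads": "Brake job",
        "Engine oil is old/over-used": "Change oil",
    }

    def prescribe(self, state, diagnosis):
        return self.FIXES[diagnosis]


def needs_list(state, diagnoser: DiagnosisEngine, prescriber: PrescriptionEngine, catalog):
    """Combine the engines into the user-facing repair/maintenance needs list."""
    jobs = []
    for diagnosis in diagnoser.diagnose(state):
        fix = prescriber.prescribe(state, diagnosis)
        entry = catalog.get(fix, {})
        jobs.append(RepairJob(fix, entry.get("parts", []),
                              entry.get("instructions", ""), entry.get("mechanics", [])))
    return jobs
```

Because `needs_list` depends only on the two protocols, a sensor-driven or model-driven engine could later replace either stub without changes to the vehicle state or the user interface.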

 

Prototype

We would propose that the bulk of the students’ time and attention be focused on the design of the key components and the interfaces between them. Sample data will be provided to underpin the Diagnosis and Prescription Engines as well as the Repair Job output but implementation beyond a well-designed API stub for each is not necessary.

The key UI elements will give users the ability to enter vehicle information and view the list of prescribed repair/maintenance needs. We encourage the students to think creatively about how to further integrate these prescriptions into other mobile user elements such as calendar and mapping apps.

Pollinator garden planner

Background and problem

Over the past decade, scientists and the public have become increasingly aware of declines in the health of managed honey bees, as well as populations of wild bees such as bumble bees. To help counteract these declines, many programs encourage individuals to install “pollinator gardens” on their properties. These habitats typically draw on lists of recommended, perennial plants known to provide pollen and nectar for bees. Although these habitats can be effective, in our local outreach and extension work, we have noted that garden design remains a barrier for many homeowners who are unsure how to create an aesthetically appealing garden using recommended pollinator plants.

To overcome this barrier, we would like to offer a web-based garden visualization app that lets homeowners map recommended plants into a garden space and view the virtual planting in three dimensions prior to installation. While similar apps do exist (for example, here), they include few if any suitable plants and lack other desirable features.

Solution description

Based on work with the Senior Design Center in Fall 2018, a foundation for the envisioned app now exists. It allows users to:

  • Create an account and sign in
  • Enter dimensions of garden space to be populated
  • Filter available plants by height, bloom color, bloom season, soil type, shade tolerance, and plant type (vine, shrub, etc.)
  • Drag and drop plants into garden space
  • View resulting garden in layout view, as well as realistic side view, potentially from multiple angles
  • Generate a plant shopping list based on the design
  • Save designs and access them by signing in

Working from this foundation, we would like to add the following functions:

  • Refine the realistic garden side view with perspective (scaling plant images to each other in side view, and to canvas size)
  • Update the view for different seasons, for example, to visualize which flowers will bloom together or which season lacks flowers
  • Refine filtering (e.g. breaking seasons into early spring/late spring and so on)
  • Provide a way to pull up an information box about a plant by clicking on it, to show other information about that plant in the database (what species use the plant, what soil requirements it has, etc.)
  • Implement “check garden” function, which alerts a user when their design does not meet all biological or design recommendations (e.g., your design lacks flowers in July, or all your plants are the same height.)
  • CALS IT has requested that there be regression tests for use in applying security upgrades to the packages used by the application
  • Implement creation and storage of usernames and passwords (this still needs to be done)
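
The “check garden” function in the list above could start from something as simple as the sketch below, which covers the two example alerts (a month without flowers, and plants that are all the same height). Everything here is an assumption for illustration: the `Plant` fields, the placeholder plant names and values, and the March to October default season are hypothetical and would be replaced by the app's plant database and the sponsors' actual recommendations.

```python
import calendar
from dataclasses import dataclass


@dataclass(frozen=True)
class Plant:
    name: str
    height_cm: int
    bloom_months: frozenset  # month numbers, 1 = January ... 12 = December


def check_garden(plants, season=range(3, 11)):
    """Return a list of warnings for a garden design (empty list = all checks pass)."""
    if not plants:
        return ["Your design has no plants."]
    warnings = []
    blooming = set().union(*(p.bloom_months for p in plants))
    for month in season:
        if month not in blooming:
            warnings.append(f"Your design lacks flowers in {calendar.month_name[month]}.")
    if len(plants) > 1 and len({p.height_cm for p in plants}) == 1:
        warnings.append("All your plants are the same height.")
    return warnings


# Placeholder plants with made-up attributes, not real horticultural data.
design = [
    Plant("Plant A", 60, frozenset({5, 6})),
    Plant("Plant B", 60, frozenset({8, 9})),
]
for message in check_garden(design, season=range(5, 10)):
    print(message)
```

Returning plain messages keeps the checks independent of the user interface, and each additional biological or design recommendation can be added as another rule in the same function or split into separate check functions.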

Additional considerations

  • The University requires that the app meet WCAG AA accessibility standards
  • We would like the app to work in any browser and be mobile responsive. (Many existing garden design apps don’t run in Chrome, which is frustrating.)
  • We envision an NC-centric app whose source code could be modified by/for Extension educators in other regions to operate based on their own available plant lists. Under what license could the source code be released?
  • When ready for public use, the app will be hosted at NCSU with file storage on the CALS server and a MariaDB database hosted by OIT and managed by Neil McCoy using a cPanel; related technical specifications are available here.

Future development

In time, we would like to add additional features to the app/website. In case it is useful to have these on the table from the start, they include the following functions for users:

  • Optionally (for the truly design-timid user), select a garden style (e.g., English cottage vs. modern) and preferred bloom colors, to produce an automatically populated garden, which they can then modify by dragging/replacing plants
  • Input their own data and observations about rates of insect visits to different plants
  • Upload a background image (e.g., their house) against which the garden could be viewed
  • Add or request their own custom plants
  • Summarize garden attributes, e.g. provide charts/visualizations that summarize bloom color, timing, shape, and other attributes associated with the check garden feature

Weekly meetings

During weekly sponsor meetings, we will review new developments in the app and provide feedback and clarification related to the goal of making it possible for anyone to design an attractive, bee-friendly garden.

 

Students will be required to sign over IP to Sponsor when team is formed.

Central Test Management Web App

Problem Statement

BlackBerry QNX technology includes QNX Neutrino OS and many middleware components. BlackBerry has decades of experience in powering mission-critical embedded systems in automotive and other industries. As the leader in safety-certified, secure, and reliable software for the automotive industry, BlackBerry currently provides OEMs around the world with state-of-the-art technology to protect hardware, software, applications and end-to-end systems from cyberattacks.

For Self-Driving Cars, functional safety is part of the overall safety of a system or piece of equipment and generally focuses on electronics and related software.  As vehicles become increasingly connected and reliant on software, new threats emerge. Therefore, it is imperative that this software operates safely, even when things go wrong. A self-driving car is an extremely complex system with state-of-the-art technologies. Proving that the system does what it is designed to do is a great challenge. And it must do so in a wide range of situations and weather conditions. This requires a stable, secure and efficient operating system.

To ensure mission-critical reliability, BlackBerry QNX continually performs extensive automated testing of its software components by executing hundreds of thousands of tests daily. The current process of identifying trends and issues and generating charts and reports from these tests consumes a tremendous amount of human effort.

Automating this process and generating visual representations of trends and issues would help to identify software issues as early as possible and enable employees to focus on other tasks. 
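
As a rough illustration of what automating trend identification might involve, the sketch below aggregates raw test results into daily pass rates per component and flags sharp drops. The record fields (`date`, `component`, `passed`), the sample records, and the 10-point drop threshold are assumptions made for the example and do not reflect the actual QNX test data schema.

```python
from collections import defaultdict


def daily_pass_rates(results):
    """Map (component, date) to the fraction of tests that passed that day."""
    totals = defaultdict(lambda: [0, 0])  # key -> [passed, total]
    for record in results:
        key = (record["component"], record["date"])
        totals[key][0] += 1 if record["passed"] else 0
        totals[key][1] += 1
    return {key: passed / total for key, (passed, total) in totals.items()}


def flag_regressions(rates, drop_threshold=0.10):
    """Return (component, date, drop) wherever the pass rate fell day over day."""
    by_component = defaultdict(list)
    for (component, date), rate in rates.items():
        by_component[component].append((date, rate))
    flagged = []
    for component, series in by_component.items():
        series.sort()  # ISO dates sort chronologically as strings
        for (_, previous), (date, current) in zip(series, series[1:]):
            if previous - current >= drop_threshold:
                flagged.append((component, date, round(previous - current, 3)))
    return flagged


# Hypothetical records: four tests per component per day.
results = (
    [{"date": "2019-01-07", "component": "kernel", "passed": True}] * 4
    + [{"date": "2019-01-08", "component": "kernel", "passed": True}] * 3
    + [{"date": "2019-01-08", "component": "kernel", "passed": False}]
    + [{"date": "2019-01-07", "component": "net", "passed": True}] * 4
    + [{"date": "2019-01-08", "component": "net", "passed": True}] * 4
)
print(flag_regressions(daily_pass_rates(results)))
```

The same per-day series could feed a chart on the dashboard, with the flagged entries highlighted so that testers see regressions without reading raw logs.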

Project Description

The goal of this project is to create a Central Test Management Web Application, incorporating work from previous Senior Design teams. Developing a Test Management dashboard will be beneficial as a cross-platform solution where developers and testers have a central location where they can access various web based features, such as monitoring software quality trends, creating test lists based on project test requirements and generating test reports that can be easily customized for each test area and user.

The 2019 Spring project is responsible for creating the core Test Management Web Application using code from the 2018 Fall Senior Design project as a base. The team is not responsible for creating the various tools being called by the Test Management Web Application.

The team will be expected to:

  • Create a Central Test Management Web Application that hosts the dashboard developed by the 2018 Fall Senior Design team
  • Implement User Login and Preferences
  • Create pages to enable editing of document fields based on user roles 
  • Update the existing NodeJS server developed by the 2018 Fall senior design team to:
    • Support the Test Management Web Application features
    • Store and Retrieve data from a MongoDB database

* The client-side application should be capable of running in any modern web browser without requiring additional extensions.   

Skills, Technologies & Constraints

Some prior experience developing web (server-client) applications is strongly recommended for this project.

Members of the team would be expected to have or learn some of the following skills:

  • JavaScript/Node.js
  • Vue, Vuetify, Express.js, etc.
  • JSON
  • HTML
  • REST API
  • MongoDB (NoSQL database) - Basic database experience suggested
  • Version Control System (Git)

The client-side dashboard must be written in a language supported by modern web browsers; a modern JavaScript framework is strongly suggested. Any additions or updates to current code base must run on Linux and Windows OS(es).

Support

BlackBerry QNX mentors will provide support at the beginning of the project for a project overview, initial setup, and assistance in the software tool selection.

The students will also be provided with:

  • The NC State 2018 Fall senior design team code base
  • Sample data in a MongoDB database containing data that needs to be charted
  • Demonstration of the current proofs of concept
  • Wireframes of dashboard interface
  • Guidance on setting up development environment and supporting tools

Mentors will be available to provide support, planning sessions, Q & A, technical discussions and general guidance throughout the project.  Mentors will also be available for meetings on campus as necessary.

About BlackBerry

BlackBerry is an enterprise software and services company focused on securing and managing IoT endpoints. The company does this with BlackBerry Secure, an end-to-end Enterprise of Things platform, comprised of its enterprise communication and collaboration software and safety-certified embedded solutions.

Based in Waterloo, Ontario, BlackBerry was founded in 1984 and operates in North America, Europe, Asia, Australia, Middle East, Latin America and Africa. For more information visit BlackBerry.com 

About QNX

Customers rely on QNX to help build products that enhance their brand characteristics – innovative, high-quality, dependable. Global leaders like Cisco, Delphi, General Electric, Siemens, and Thales have discovered QNX Software Systems gives them the only software platform upon which to build reliable, scalable, and high-performance applications for markets such as telecommunications, automotive, medical instrumentation, automation, security, and more.

QNX software is now embedded in 120 million cars that are on the road today. Automotive OEMs and tier ones use BlackBerry QNX technology in the advanced driver assistance systems, digital instrument clusters, connectivity modules, handsfree systems, and infotainment systems that appear in car brands, including Audi, BMW, Ford, GM, Honda, Hyundai, Jaguar Land Rover, KIA, Maserati, Mercedes-Benz, Porsche, Toyota, and Volkswagen.

Students will be required to sign Non-Disclosure Agreements and sign over IP to BlackBerry when team is formed.

Interactive Tracker - A Service Project

What is Bugle: Bugle is an application and website that enables volunteer event organizers to easily manage volunteer events. Bugle provides a robust suite of project management tools to simplify the unique challenge of organizing a volunteer event. Bugle also helps volunteers find service opportunities within their community. Volunteers can search for events by category, location, and time. Bugle’s services are free for organizations hosting volunteer events as well as volunteers looking for them. Bugle is a non-profit organization committed to making volunteering easier.

Users:

  • Event organizer (event host)
  • Event team leaders (volunteers who are selected to assist with organizing an event)
  • Volunteers

Concept: Interactive tracker is a feature of the Bugle app. During a volunteer event, the interactive tracker allows the event organizer and event team leaders to track the status of an event by annotating each task that has been completed, tasks that are in progress, and tasks that have yet to be started. The interactive tracker enables the event organizer to build an event timeline, input the details associated with completing a task, and assign tasks to event team leaders. This feature will be particularly helpful for larger events like Relay for Life, where an event coordinator is managing hundreds of volunteers and a wide range of resources.

Purpose: The interactive tracker assists in organizing event sequencing, while providing oversight to event leadership.

Functionality:

  • Event organizers can build an event timeline
  • Event organizers can assign tasks to be overseen by event team leaders
  • Event team leaders can add tasks for themselves to complete
  • Event organizers and event team leaders can view all event tasks and their statuses
  • Event team leaders will receive push notifications to start respective tasks that are contingent upon other tasks being completed
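
The last item above, notifying team leaders when the tasks they depend on are complete, amounts to tracking prerequisites between tasks. The following sketch shows one possible shape for that logic; it is written in Python purely for illustration (the app itself is React Native), and the class, method, and task names are hypothetical.

```python
class EventTracker:
    """Tracks task status and reports which tasks become ready to start."""

    def __init__(self):
        self.status = {}    # task -> "not started" | "in progress" | "completed"
        self.leader = {}    # task -> event team leader overseeing it
        self.requires = {}  # task -> set of prerequisite tasks

    def add_task(self, task, leader, requires=()):
        self.status[task] = "not started"
        self.leader[task] = leader
        self.requires[task] = set(requires)

    def start(self, task):
        self.status[task] = "in progress"

    def complete(self, task):
        """Mark a task completed and return (leader, task) pairs to notify."""
        self.status[task] = "completed"
        notifications = []
        for other, prerequisites in self.requires.items():
            if (task in prerequisites
                    and self.status[other] == "not started"
                    and all(self.status[p] == "completed" for p in prerequisites)):
                notifications.append((self.leader[other], other))
        return notifications


tracker = EventTracker()
tracker.add_task("set up tables", leader="Alex")
tracker.add_task("hang banners", leader="Sam")
tracker.add_task("open registration", leader="Jordan",
                 requires=["set up tables", "hang banners"])

print(tracker.complete("set up tables"))  # banners still pending, nobody notified
print(tracker.complete("hang banners"))   # registration can now start
```

In the mobile app, the pairs returned by `complete` would be handed to whatever push notification service the team selects; the dependency check itself stays independent of that choice.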

Technology: Mobile app developed for Android and iOS, written in React Native.

Students will be required to sign over IPR to Sponsor when team is formed.

Machine Learning Market Predictions

The Assignment

Banks are actively looking for opportunities to leverage emerging technologies in the AI/ML space. The ability to understand how various factors influence the evolution of specific markets and model such outcomes in a predictive fashion is particularly intriguing, both academically and practically. While this problem can be generalized across various asset classes (e.g. forex, commodities, energy, etc.), students should identify a particular area of focus.

Please review the following high-level specifications and requirements. The below will be refined in greater detail during scoping with the team.

  1. Analysis:
    • Select asset class and specific assets for the project (we recommend choosing assets from multiple market cap segments, e.g. 2-3 high cap, 2-3 mid cap, and 2-3 low cap). Asset classes and specific assets should be chosen based on data availability in the public
    • Identify data sources (e.g. historic spot prices, trading blogs, Facebook posts, etc.)
  2. Prepare Data Sources:
    • Extract data
    • Aggregate data into a data store
  3. Develop Predictive Model:
    • Determine approaches to analyze how different data influences spot price evolution
    • Understand how data feeds into predictive model
  4. Prepare and Present Results:
    • Quantify how strong a signal can be extracted, and over what lead time periods we can start to make statistically useful predictions about spot price changes
    • Display predictions on an interactive thin client dashboard
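
One simple way to approach the question in step 4, how strong a signal is and over what lead time it is useful, is to correlate a candidate signal with price changes at several lags. The sketch below does this with a plain Pearson correlation. It is a starting point under stated assumptions rather than a recommended model: the function names and the synthetic series are invented for the example, and real market data would need far more careful statistical treatment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def price_changes(prices):
    """Period-over-period differences of a spot price series."""
    return [later - earlier for earlier, later in zip(prices, prices[1:])]


def lead_time_correlations(signal, changes, max_lead):
    """Correlate signal[t] with changes[t + lead] for each lead in 0..max_lead."""
    correlations = {}
    for lead in range(max_lead + 1):
        future = changes[lead:]
        length = min(len(signal), len(future))
        correlations[lead] = pearson(signal[:length], future[:length])
    return correlations


# Synthetic data: the price change follows the signal with a delay of two periods.
signal = [1, -1, 2, 0, -2, 1, 3, -1, 0, 2]
changes = [0, 0] + signal[:8]
correlations = lead_time_correlations(signal, changes, max_lead=4)
best_lead = max(correlations, key=lambda lead: abs(correlations[lead]))
print(best_lead, correlations)
```

On the synthetic series the correlation peaks at a lead of two periods, as constructed. With real data, the peak is usually much weaker, so out-of-sample testing is needed before treating any lead time as statistically useful.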

The Spring 2019 Semester

Students will use their course knowledge, creativity, and technical expertise to delve deep into the exciting world of AI/ML. DBGT will provide thought leadership, direction, guidelines, technical expertise (as needed) and support in refining the project scope. The DBGT team is here to aid and guide the student workflow, processes, and design ideas.  We support and encourage the students’ input on this project.

The NCSU Student Experience

Senior Design students in the College of Engineering Department of Computer Science will have a unique opportunity to partner together over the course of the semester to explore the exciting and developing field of AI/ML with direct application to a real business problem. Additionally, students will have access to industry professionals to assist in the software design, agile practices, and the overall code development and testing. Students will be allowed to share the final product as part of their own portfolio while job seeking.

Predicting Root Cause System Failures with Machine Learning and NLP

Siemens Healthineers

Siemens Healthineers develops innovations that support better patient outcomes with greater efficiencies, giving providers the confidence they need to meet the clinical, operational and financial challenges of a changing healthcare landscape. As a global leader in medical imaging, laboratory diagnostics, and healthcare information technology, we have a keen understanding of the entire patient care continuum—from prevention and early detection to diagnosis and treatment.

At Siemens Healthineers, our purpose is to enable healthcare providers to increase value by empowering them on their journey towards expanding precision medicine, transforming care delivery, and improving patient experience, all enabled by digitalizing healthcare. An estimated 5 million patients globally benefit every day from our innovative technologies and services in the areas of diagnostic and therapeutic imaging, laboratory diagnostics and molecular medicine, as well as digital health and enterprise services. We are a leading medical technology company with over 170 years of experience and 18,000 patents globally. Through the dedication of more than 48,000 colleagues in over 70 countries, we will continue to innovate and shape the future of healthcare.

Issue Diagnosis and Root Cause

Preclarification is a remote troubleshooting service Siemens offers its customers to diagnose and resolve reported equipment issues without the need for on-site support. When an issue is reported, Siemens Healthineers creates a ticket known as a Notification. The Remote Services Center uses remote capabilities (established through a broadband internet-based connection to either a customer-owned or Siemens-provided secure end-point) to log directly into the customer’s system and investigate the issue. During Preclarification we will either resolve the Notification or develop a detailed action plan for our Customer Service Engineers to follow once they arrive on site. This action plan will provide direct and concise technical repair recommendations (including potential parts replacement requirements) to reduce on-site repair time.

Root Cause is determined and documented at the conclusion of each Notification by either the Remote Services Center or the Customer Service Engineer.  This is critical since the issue reported and the actual root cause of the problem can differ greatly. Root Cause is captured in a database in the following three fields: Cause Code Group, Cause Code, and Cause Code Text.  Collectively, these fields provide visibility to the product module affected and the work performed to correct it.

Project Overview

Accurately diagnosing customer reported issues has become increasingly challenging due to a diverse product portfolio and overall system complexity.  Our objective is to improve Preclarification performance, drive Remote resolution rates, reduce onsite repair time, and most importantly increase customer workforce productivity and efficiency.  In doing so, we will optimize instrument performance and reduce system downtime, which will greatly improve patient care. In order to achieve this we are seeking support in the area of predictive analytics.  Specifically, our goal is to trend customer reported issues and predetermine root cause using documented data compiled from our customers, our Remote Services Center, and our Customer Service Engineers. With over 1,000 products and 500,000 potential areas of root cause, we are seeking your support in transforming how we predict and respond to our customers’ needs.

Project Scope

This project will be broken out into three distinct phases:

Phase I – Database Creation and Language Processing

The first step is to join together the notification data with the root cause data in a database.  This may require some preprocessing (e.g. tokenization, normalization) to prepare the data for exploration and experimentation in phase II.  The choice of which tools and methodology you employ is up to you.
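
As a minimal sketch of Phase I, the example below loads notification and root cause records into an in-memory SQLite database, joins them, and applies basic normalization and tokenization to the issue text. The table names, column names, sample rows, and placeholder cause codes are assumptions made for illustration; the real schema will come from the data samples Siemens provides, and any database or tooling may be used instead.

```python
import re
import sqlite3


def build_database(notifications, root_causes):
    """Load (id, description) and (notification_id, cause_code) rows into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notification (id TEXT PRIMARY KEY, description TEXT)")
    conn.execute("CREATE TABLE root_cause (notification_id TEXT, cause_code TEXT)")
    conn.executemany("INSERT INTO notification VALUES (?, ?)", notifications)
    conn.executemany("INSERT INTO root_cause VALUES (?, ?)", root_causes)
    return conn


def tokenize(text):
    """Normalize to lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def labeled_examples(conn):
    """Join notifications to root causes; unresolved notifications are dropped."""
    rows = conn.execute(
        "SELECT n.id, n.description, r.cause_code "
        "FROM notification AS n "
        "JOIN root_cause AS r ON r.notification_id = n.id "
        "ORDER BY n.id"
    )
    return [(nid, tokenize(description), code) for nid, description, code in rows]


# Hypothetical sample rows with placeholder cause codes.
conn = build_database(
    notifications=[("N1", "Display FLICKERS after boot-up!"), ("N2", "No image")],
    root_causes=[("N1", "CODE_A")],
)
examples = labeled_examples(conn)
print(examples)
```

The inner join keeps only notifications that already have a documented root cause, which is the labeled subset needed for experimentation in phase II.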

Phase II – Implement Machine Learning

After the database of notification data and root cause data is created, the next step will be to use machine learning to predict the root cause of a given notification. We are flexible with how you solve this problem as long as all methods are clearly documented and explained.

We realize that as computer science undergraduates you might not have had a lot of exposure to natural language processing and machine learning. But if you are a student with some interest or experience in these areas, this could be an excellent opportunity to combine your skills and talent to accomplish something very impactful. Because of the scope of this phase, Siemens is committed to pointing you in the right direction with your research and working closely with you to ensure this project is solvable and successful.
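
For readers new to the area, one common baseline for predicting a label from text is a multinomial naive Bayes classifier. The sketch below implements it with add-one smoothing over tokenized notification text. It is offered only as a possible starting point, since the choice of method is left to the team; the class name, training examples, and cause codes are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRootCause:
    """Multinomial naive Bayes over tokens, with add-one smoothing."""

    def fit(self, examples):
        """examples: iterable of (tokens, cause_code) pairs."""
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, code in examples:
            self.class_counts[code] += 1
            self.token_counts[code].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        """Return the cause code with the highest posterior log-probability."""
        total = sum(self.class_counts.values())
        best_code, best_score = None, -math.inf
        for code, count in self.class_counts.items():
            denominator = sum(self.token_counts[code].values()) + len(self.vocab)
            score = math.log(count / total)  # class prior
            for token in tokens:
                if token in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.token_counts[code][token] + 1) / denominator)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical training examples with placeholder cause codes.
classifier = NaiveBayesRootCause().fit([
    (["display", "flickers"], "CODE_A"),
    (["display", "blank"], "CODE_A"),
    (["pump", "leaking"], "CODE_B"),
    (["pump", "noise"], "CODE_B"),
])
print(classifier.predict(["display", "dark"]), classifier.predict(["pump", "error"]))
```

A baseline like this gives a reference accuracy against which more sophisticated models can be judged, and its per-token probabilities are easy to explain when documenting the method.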

Phase III – User Interface

Once you have created a satisfactory machine learning model, the final step is to build an intuitive user interface. This interface will let non-technical users classify the root cause of new notification data using the model you developed in phase II.

 

NOTE: Students will be required to sign Non-Disclosure Agreements and sign over IP to Siemens Healthineers when team is formed.

 

Notification Data Sample

This dataset will contain text descriptions of the issues for each notification.  The additional fields in the dataset will be used to classify and group similar machines for machine learning.

[Image: list of display service notifications]

Root Cause Data Sample

Each notification will have a cause code (with a corresponding text value) that represents the root cause.  In this example, the root cause is HW67100 A051.

[Image: root cause data sample]

 

“SIEM City” - Gamifying Threat Detection for Security Information and Event Management

Overview

Highly skilled cybersecurity experts are in high demand, and we are struggling to hire analysts who are experienced and knowledgeable in finding security issues. Visualizing data to identify security-related anomalies could make threat detection both more efficient and more fun.

Game Concept

Using a game interface along with algorithmic artificial intelligence, developers would create an engine capable of generating an abstracted interface (world) that allows analysts to identify cyber threats.

Leveraging artificial intelligence processing techniques, assets and security events should be visualized in a logical, engaging way. A user should be able to see an asset and events relevant to the asset. Assets should represent a computer (physical or virtual), and events should be pulled from data sources that will be provided.

Acceptance Criteria

  • Build the Game interface/Visualization
  • Interface with an enterprise data lake to manually and dynamically build and associate assets, asset details (such as owner, relevant applications, etc…), and events (such as phishing attack, denial of service, etc…)
  • Have fun!

Stretch Goals

  • Successfully flag one or two simulated cyber attacks
  • Build and apply additional simulations (threat models) to “SIEM City”
  • Explore Quantum Computing implications to this model

BB&T, through a partnership with Securonix, will provide a cloud data lake and can simulate various threats.

Preferred Technologies

  • If the game is a web-based application, the application must work with Chrome 67; any language; any engine (though open-source is preferred)
  • If the game is a standalone application, the application must run on Windows 7 (without requiring admin); any language (Java or Python preferred); any engine (though open-source is preferred)
  • Android or iOS applications are not acceptable for this project

 

Students may be asked to sign over IP to the sponsor when the team is formed.

Performance Study of Quantum-Secure Signature Algorithms in TLS

Background

In this project, the team will have a chance to work on a leading-edge research area of cryptography: quantum-resistant cryptography. A quantum computer (QC) could break essentially all of the public key cryptography standards in use today: ECC, RSA, DSA and DH. Thus, if practical QCs became a reality, they would pose a threat to PKI-based signatures. For example, someone with a QC at some time in the future could impersonate a bank’s website by deriving the bank’s private key from the public key in its certificate. For those reasons, the National Institute of Standards and Technology (NIST) has started a PQ Crypto project in order to pick the quantum-secure algorithms for the future.

Quantum-secure algorithms do not come without a cost. They are more CPU intensive and introduce more communication overhead. Thus, today’s protocols cannot use them off the shelf. Cisco Systems has been focusing on quantum-secure signatures and their impact on common communication protocols like TLS, IKEv2 and SSH. Other big vendors like Google and Cloudflare have been looking into key exchange algorithms in TLS. Microsoft has also focused on key exchange and VPNs. Quantum-resistant crypto is a topic that has gotten great attention in academia, standards bodies (IETF, ETSI), government organizations and the industry.

Description

In this senior design project, we would like the team to build a test application that enables us to benchmark the performance of TLS connections. Metrics we are mostly interested in are:

  • time to complete the TLS handshake,
  • hop count between client and server,
  • time spent signing and verifying, and
  • time spent transmitting.

We would like to be able to run these benchmarks regardless of the signature algorithm that is used behind the scenes in the TLS handshake. For example, we would run it in plain TLS 1.3 with traditional signature algorithms. We would then run it by using picnic signatures and finally by using hybrid p256+picnic. That way, as more postquantum signature algorithms are implemented, we would be able to compare TLS performance in the future.
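To make the handshake-time metric concrete, here is a minimal sketch that times only the TLS handshake, separately from the TCP connect, and summarizes repeated samples. It uses Python's standard `ssl` module, which links against the system OpenSSL rather than oqs_openssl, so it would only exercise traditional signature algorithms; the project itself calls for C and the OQS fork. The function names are made up for the example.

```python
import socket
import ssl
import statistics
import time

def time_handshake(host, port=443, timeout=5.0):
    """Return seconds spent in the TLS handshake alone (TCP connect excluded)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # Defer the handshake so it can be timed separately from the TCP connect.
        with context.wrap_socket(raw, server_hostname=host,
                                 do_handshake_on_connect=False) as tls:
            start = time.perf_counter()
            tls.do_handshake()
            return time.perf_counter() - start

def summarize(samples):
    """Summarize handshake durations (seconds): count, mean, median, sample stdev."""
    return {
        "n": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }
```

A run against a test server (hostname assumed) might look like `summarize([time_handshake("tls-test.example") for _ in range(20)])`. Keeping the timing harness independent of the signature algorithm is what allows the same benchmark to be repeated as new algorithms are added.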

The open-source libraries we will use are liboqs and oqs_openssl:

  • liboqs is an open-source library maintained by the University of Waterloo that incorporates implementations for some of the PQ algorithms in the NIST competition. liboqs currently only supports picnic and qTesla signature algorithms.
  • oqs_openssl is a fork of OpenSSL that includes quantum-resistant algorithms and ciphersuites based on liboqs. It is also maintained by the University of Waterloo. oqs_openssl has implemented postquantum signatures in certificates and TLS 1.3 for these algorithms (picnic, qTesla). They have also enabled hybrid signatures that incorporate both traditional and postquantum algorithms (p256_picnicL1FS, p384_qteslaIIIspeed) in certificates and TLS 1.3.

Stretch Goal

As a stretch goal, the team would integrate Falcon, one of the best PQ signature candidates, in oqs_openssl. An open-source implementation of Falcon is already available at https://falcon-sign.info.

Required Student Skills

Students must have

  • experience in C programming and Linux.
  • taken at least one course in security.
  • some familiarity with cryptography and the TLS protocol (as shown in https://tls.ulfheim.net/ and https://tls13.ulfheim.net/).
  • some familiarity with OpenSSL.
  • motivation to work on an interesting topic and an opportunity to make an impact.

Deliverables

  1. A test implementation that benchmarks the performance of a TLS handshake regardless of the signature algorithm used.
  2. A report that includes analysis and graphs of the benchmarked performance metrics for traditional algorithms, picnic, and qTesla.
  3. As a stretch goal: integrate Falcon, one of the best PQ signature candidates, in oqs_openssl. That would require adding Falcon to the liboqs sigs and adding it to oqs_openssl, as this PR does for picnic.

 

Enhancing the Browser-Based Mobile Experience in Engineering Service Management

Background

Dell EMC follows a centralized lab approach for many of our Product Engineering teams.  We have large shared labs in strategic locations across the globe where most of the equipment necessary to design and develop new products is housed. Since our Engineering teams are not co-located with their equipment, we have Operations teams in place at each of our strategic lab locations to provide assistance with asset management, maintenance, upgrades, and support. These Operations teams are part of the larger IEO (Infrastructure, Engineering, and Operations) organization within Dell EMC. The IEO team utilizes an application called ESM (Engineering Service Management) to manage assets, requests, and incidents for all of the labs we support.  This application is built on top of the ServiceNow platform.

The focus of this project is to enhance and expand the browser-based mobile support we currently have in ESM for both our End User Portal as well as the Process User View where our Operations teams manage their day-to-day activities. This will require direct ServiceNow development utilizing JavaScript, AngularJS, jQuery, JSON, and many other technologies where necessary.

Project Description

The project consists of the following components:

  1. Enhance the browser-based mobile experience for our End User Portal.  The team will review our existing mobile browser-based End User Portal experience and work through the creation of a proposal for enhancements. The team will create an enhanced user experience in navigating the site, creating new incidents/requests, and monitoring status of existing incidents/requests.
  2. Enhance the browser-based mobile experience for our Process User View. The team will review our existing mobile browser-based Process User View and work through the creation of a proposal for enhancements. The team will create an enhanced user experience in navigating the site, taking ownership and working new incidents/requests, communicating with customers, and maintaining asset information along with DCIM based equipment location information.
  3. (Stretch Goal) Enhance usage of a mobile device (camera, location info, etc.). The team will have an opportunity to be creative in the usage of a mobile device, perhaps including the usage of the camera for images or barcode scanning, location information, etc.  

Before the project begins, the team will be granted access to one of our ESM lower level environments for full usage throughout the semester.  This lower environment will be a clone of our Production environment.

Company Background

Part of Dell Technologies, Dell EMC is the world's leading developer and provider of information infrastructure technology and solutions. We help organizations of every size around the world keep their most essential digital information protected, secure, and continuously available.  

We help enterprises of all sizes manage their growing volumes of information—from creation to disposal—according to its changing value to the business through big data analysis tools, information lifecycle management (ILM) strategies, and data protection solutions. We combine our best-of-breed platforms, software, and services into high-value, low-risk information infrastructure solutions that help organizations maximize the value of their information assets, improve service levels, lower costs, react quickly to change, achieve compliance with regulations, protect information from loss and unauthorized access, and manage, analyze, and automate more of their overall infrastructure. These solutions integrate networked storage technologies, storage systems, analytics engines, software, and services.

Dell EMC's mission is to help organizations of all sizes get the most value from their information and their relationships with our company.

Hyperledger Business Blockchain Framework Creation

Background

With the development of distributed ledger technology now ten years in the making, enterprises are slowly exploring the possibilities of blockchain. While most of the blockchain work is happening on innovative solutions focused on public and/or permissionless blockchains, enterprises will predominantly use private and/or permissioned blockchains.

Public vs Private Blockchain? The most important difference between a public and a private blockchain is that within a public blockchain, the actors involved in the network are not known, while in a private blockchain they are.

  • Within a public blockchain, trust within the system is created through game-theory incentives and cryptography. Anyone can join a particular public blockchain, simply by connecting their computer to the decentralized network, downloading the application, and starting to process transactions. It is not required to have a previous relationship with the ledger, and you do not need to be approved to join.
  • Private blockchains, however, do not require such artificial incentives since all actors in the network are known to each other. New actors that want to join the network have to be approved by existing participants in the network. This enables more flexibility and efficiency of validating transactions.

Private Blockchains are generally used by organizations that like to keep a shared ledger for settlement of transactions. They are owned and operated by a group of organizations, and transactions are visible only to members of the network. With the technology advancing and more organizations seeing the benefits of a shared ledger among industry partners, more enterprise blockchain solutions are being developed.

Hyperledger - Blockchain Technologies for Business. Hyperledger is an open source collaborative effort created to advance cross-industry blockchain technologies. It is a global collaboration, hosted by The Linux Foundation, including leaders in finance, banking, IoT, supply chain, manufacturing and technology. https://www.hyperledger.org/

Project Description

For this project, the team will:

  1. Create a Hyperledger Fabric setup (Hyperledger Fabric is a blockchain framework implementation) that KPIT can further develop and create industry-specific use cases (in addition to the use cases in the second bullet below).  The project should be able to demonstrate core components used from Hyperledger toolkits and a successful business flow involving at least 3 parties. All code developed should be modular and easily updatable to meet the needs of multiple use cases.
  2. Use the Hyperledger Fabric setup to demonstrate the following business use cases
    1. Asset management (parts tracking & manufacturing history).
    2. Product authenticity (pharmaceuticals).

 

 

Key focus areas for a BirlaSoft Hyperledger Business Blockchain include:

  • Permissioned Network - Establish decentralized trust in a network of known participants rather than a public network with no identity
  • Confidential transaction - Expose only the data you want to share to the parties you want to share it with
  • Pluggable Architecture - Tailor the blockchain to industry needs with a pluggable architecture rather than a one size fits all approach
  • Easy to get started - Provide reusable components and starter applications

 

Students will be asked to sign over IP to the sponsor when the team is formed.

Knowledge Is Power

Knowledge is power. News is not.

Sally is a powerful partner of a nationally relevant law firm. She heads a practice group prominent for its work in high-profile issues in food, product and chemical safety. Her day begins very early with a workout before she heads into the office. She stops at 6 pm for dinner and family, and tries to get in a half-hour of reading before truly ending the day.

She does not have much time for general news channels, where less than 0.1% of articles are relevant. One negative about general news reporting is its time-backward perspective: they are reporting on the start of a court case when the attorneys have already been selected.

Time-forward knowledge is relevant. Trends, innovations, and risks in her and her clients’ spaces are very relevant. Her reading focuses on weekly professional journals and news summaries where 50% of the articles are seemingly relevant (think about equivalents to ACM’s XRDS).

To efficiently consume professional and business news, she has her approach. She discards obviously irrelevant sections and scans the remaining articles’ first paragraphs to select the ones she will read in full. Still, she finds herself skipping out of 80% of the articles.

To respect her valuable time, we introduce PowerKnowledge. PowerKnowledge takes a large feed of multiple sources of business and industry news, and adaptively selects the 3 ranked articles that Sally should read that day. It uses knowledge of her current interests, the diversity of articles presented to her within the past week, and unobtrusively captured feedback. Her current interests could be based on industry, product category, manufacturer or specific matter.
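One way to picture the selection step is a greedy ranking that rewards overlap with the reader's interests and penalizes topics she has already seen. The sketch below is not the product's actual algorithm; the tag-based article format, the scoring rule, and the `diversity_penalty` parameter are all assumptions for illustration.

```python
def select_articles(articles, interests, recent_topics, k=3, diversity_penalty=1.0):
    """Greedily pick k articles: reward interest overlap, penalize repeated topics."""
    seen = set(recent_topics)
    remaining = list(articles)
    chosen = []
    while remaining and len(chosen) < k:
        def score(article):
            tags = set(article["tags"])
            relevance = len(tags & interests)   # tags matching current interests
            repeats = len(tags & seen)          # tags already shown recently
            return relevance - diversity_penalty * repeats
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
        seen |= set(best["tags"])               # later picks should differ from this one
    return chosen

# Invented feed items and interests.
interests = {"food safety", "recall", "chemicals"}
articles = [
    {"id": "A", "tags": ["food safety", "recall"]},
    {"id": "B", "tags": ["food safety", "recall", "labeling"]},
    {"id": "C", "tags": ["chemicals", "regulation"]},
    {"id": "D", "tags": ["sports"]},
]
picks = select_articles(articles, interests, recent_topics=[])
print([a["id"] for a in picks])
```

Here article B closely duplicates A, so the penalty pushes C ahead of it for the second slot. A learned model would replace the hand-written score once labeled data exists.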

When PowerKnowledge sells well, in the follow-up version, we will consider incorporating knowledge of behaviors from similar people. Be mindful that this problem is different from the online shopping recommendation engine. It is the dissimilarity that gives Sally’s law firm the competitive edge.

For this project

This product is currently troubled by the cold-start problem. The cold-start problem occurs when a machine learning product has zero users and hence insufficient labeled training/testing data.

This project is to overcome the cold-start problem. To overcome the cold-start problem, the initial labeled training/testing data will be collected from roughly 20 to 50 test users in a more obtrusive manner over a quarter-year.

Our jump-off point is an existing simple web-application. This web-app prototyped the product concept. The required modifications are:

  • user authentication,
  • news feed connection,
  • user interface to explicitly grade up to 30 articles, and
  • other enhancements to build an initial labeled training/testing dataset.

Data Sources

The data source is a LexisNexis business news feed.

Technology

The technology is Python, JavaScript and MySQL. The web-app framework is Django.

 

Mobile Health App for Wearable Sensors

An asthmatic high school student, diagnosed with asthma in early childhood by her pediatrician, uses a control inhaler to help her manage her symptoms and a rescue inhaler when necessary. She takes the bus to school every morning and engages in extracurricular activities, including outdoor sports, each afternoon. Her symptoms may correlate with increased physical activity but the timing and severity of these symptoms are difficult for her and her family to track.

The student wears a health monitoring device that detects heart rate (R-R peaks) and environmental exposure to O3 and volatile organic compounds (VOC).  The student’s mobile health application should provide feedback to the student to influence behavioral changes if there is a potential health risk.

Background

Smart and self-powered sensing nodes can have disruptive impact in healthcare and the Internet of Things (IoT). Autonomous self-powering leads to “always on” operation that can enable vigilant and long term monitoring of multiple health and environmental parameters. When packaged in wearable, comfortable, and hassle-free platforms, these systems increase adoption by users and can be worn to gather information over long periods of time and reveal possible correlation or even causality between different sensor streams. This information can be powerful in chronic disease management such as heart disease, asthma, and diabetes.

Similarly, in IoT applications, always-on, battery-free operation of smart sensing nodes can lead to low maintenance structural monitoring of buildings, cities, and infrastructure along with large scale smart agricultural or industrial monitoring applications. The NSF Center on Advanced Self-Powered Systems of Integrated Sensors and Technologies (ASSIST) is building disruptive self-powered smart sensing nodes with state-of-the-art energy harvesting technologies, high-power/high-energy density supercapacitors, ultra low-power electronics, and low power health and environmental sensors all integrated into comfortable wearable platforms that work together to achieve “always on” capability.

Description

The ASSIST use cases focus on measuring heart rate (HR)/heart rate variability (HRV) and environmental air quality to alert users of potential health risks. An app-based user interface should support the ASSIST use cases to provide user feedback and influence behavioral changes based on the data.  

ASSIST wearables use Bluetooth low energy (BLE) for data serialization and export to the app. In particular, these tasks require the development of mobile health apps, which can synthesize data from multiple sensors, process the sensor data based on known correlations from offline analysis, and provide an engaging user interface. The app should connect via BLE and allow users to dynamically visualize the HR, HRV, and environmental exposure (e.g. gas concentrations). In addition to data aggregation, the device should have on-device alerts to high HR, HRV, and exposure.

Some specifications for the application include:

  • two-way communication with embedded device via Bluetooth;
  • receive inertial (200 Hz), EKG (200 Hz), environmental (20 Hz), and device status (1 Hz) data streams at the appropriate rates;
  • store all data in local storage;
  • extract features and summaries from data streams (e.g., HR and HRV, hourly summaries);
  • display incoming streams;
  • transfer files to a cloud-based database and stream summaries of the data to the cloud via Wi-Fi.
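The feature-extraction item above can be illustrated with a short sketch that derives mean heart rate and two common HRV measures (SDNN and RMSSD) from a list of R-R intervals. It assumes R-peak detection has already produced intervals in milliseconds; the sample values are made up, and the app itself is expected to be written in Java for Android.

```python
import math

def heart_rate_bpm(rr_ms):
    """Mean heart rate in beats per minute from R-R intervals in milliseconds."""
    return 60000.0 / (sum(rr_ms) / len(rr_ms))

def sdnn(rr_ms):
    """Sample standard deviation of the R-R intervals (ms)."""
    mean = sum(rr_ms) / len(rr_ms)
    return math.sqrt(sum((x - mean) ** 2 for x in rr_ms) / (len(rr_ms) - 1))

def rmssd(rr_ms):
    """Root mean square of successive R-R interval differences (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Made-up R-R intervals (ms) for illustration.
rr = [800, 810, 790, 805, 795]
print(heart_rate_bpm(rr), sdnn(rr), rmssd(rr))
```

Hourly summaries could then be built by applying these functions to each hour's worth of intervals before streaming the results to the cloud.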

Required Technologies

  • Android, Java
  • IBM Cloud

The development team will use a cloud service to host the data. This decision is affected by expected access permissions, security/encryption methods, and regulatory requirements. As an option, the team will also research and outline the requirements for being HIPAA compliant in both privacy and security. The data storage implementation to be managed by the app development team should take into account data security while also making it accessible to ASSIST members and corporate affiliates with the proper access permissions.

Supplier Inventory Management System for Governance and Risk Management

When a business need is identified, Merck sends a request for information (RFI) to potential suppliers who may be able to provide a product or service to fulfill the business need. The potential suppliers send responses, which are reviewed by Merck. Once a supplier is selected, a contract is negotiated and signed by Merck procurement. During this time, the supplier is subjected to multiple risk assessments to continuously gauge the business relationship between the supplier and Merck. For example, a supplier could be the source of a security or privacy breach, or a supplier may otherwise not fulfill contractual expectations required for Merck’s success. Merck needs a system to capture information used to assess risk, including data such as:

  • contract information,
  • questionnaire responses (provided by the suppliers),
  • any ongoing governance level issues, and
  • termination information for the suppliers.

The team will design and build a software system to manage a master inventory of suppliers. The supplier management system will become the ‘single source of truth’ about supplier relationships and will capture information necessary to provide governance and manage risks. For example, the system will store contracts, contact information, product or service descriptions, financial expenses, etc. Without a system to manage supplier information, governance and risk management is not holistic.

The system should have a database of suppliers to capture supplier information based on the identified requirements (examples are provided above). The system also needs to have a user-friendly front-end that provides a dashboard view of risks and suppliers with the ability for a Merck employee to generate reports. Users of the system can include both Merck internal users that manage supplier risks, as well as external suppliers who will use the system to provide information to Merck.

Tasks

  • Collect and define functional requirements: which attributes and data to collect for the inventory, which reports to produce, and what information to provide in a dashboard view of risks and suppliers
  • Identify necessary integrations with other systems
  • Build a software system
  • Build an API specification for other systems to integrate with (microservices architecture preferred)

 

Protein Dynamics: Perceiving the Intangible with Augmented Reality - A Service Project

Sponsor

Wake Tech Community College, Life Sciences Department, Dr. Candice Roberts

Wake Technical Community College is the largest community college in North Carolina, with annual enrollments exceeding 70,000 students. The pre-nursing program in the Life Sciences Department runs a two-course series on Anatomy and Physiology, where this project will be used, with enrollments exceeding 800 annually. Additionally, this project is expected to assist over 1,000 biology students when fully implemented.

Background

Biology and pre-nursing students need to understand how the body carries out and controls processes. Proteins have a diverse set of jobs inside cells of the body including enzymatic, signaling, transport and structural roles. Each specific protein in the body has a particular function and that function depends on its 3D conformation.  It makes sense then, that to alter the activities within a cell or the body, proteins change shape to change function. As a beginning biology or pre-nursing student, this is a difficult process to imagine from a 2D image in the textbook, and we wish to create a tool that helps visualize protein dynamics. One important example of this is hemoglobin. Hemoglobin is a huge protein found inside red blood cells and its primary function is to carry oxygen and carbon dioxide to and from cells of the body, respectively. Structures inside hemoglobin bind to oxygen dynamically at the lungs and then release the oxygen at metabolically active tissues.

Primary Problem

Last semester, students in the senior design course created an AR app that allowed students to view the structure of hemoglobin, under various conditions. This semester we want to expand this technology from a tool that permits visualization of a single protein to a tool that’s usable in the classroom and permits instructors to guide their students’ exploration of the concepts of protein dynamics. The main functionality of the software for visualizing protein structure under different conditions exists, but making it usable for instructors and students as a learning tool requires additional design and development to enable biology instructors to populate the backend database with protein structures, tailor visualizations of those structures to the learning goals, and add additional instructional content.

Technology and Technical Requirements

Building on the framework developed during the prior senior design project, this revision must improve upon existing functionality as well as provide new functionality.

Functionality improvements:

  1. Support additional AR tags to enable collaborative annotation of macromolecular substructures. Specifically, users should be able to
    1. select from a set of tags associated with colors;
    2. select from a set of tags associated with structure names; and
    3. place a color and structure name tag in view of the camera in addition to the molecule tag and environmental factor tag simultaneously (for a total of four active tags)
  2. Improve stability of tag detection to enable 360-degree rotation
  3. Allow for more types of visualization: space filling vs ribbon

New functionality:

  1. Provide a web interface for instructors to specify
    1. models of macromolecular structures in different 3D conformations
    2. environmental factors that trigger transitions between 3D conformations
    3. tasks for students to perform and the conditions that indicate completion of those tasks
    4. multiple choice questions for students to answer
  2. Provide support for rendering task instructions in the 3D scene or within the UI
  3. Provide support for rendering multiple choice questions within the 3D scene or within the UI, and support for users selecting answers to those questions
  4. Provide support for tracking answer correctness over a series of questions during a session

Senior design students will also work with Wake Tech Community College Biology instructors to conduct usability testing at two or three sessions during the semester, which will require transportation to Wake Tech’s Perry Health Science Campus in Raleigh.

Technologies that are Currently Used

  • AR.js
  • three.js
  • LiteMol
  • Spring Framework
  • MySQL

 

Students will be required to grant royalty-free use of IP to Sponsor when the team is formed.

Call Recording Analysis

The Problem

We are looking for a team that wants to experience the whole life cycle of software development all while working to solve real-world business needs - AI, cutting edge front end, robust backend deployed in containers using Docker and Kubernetes.

The voice services that Bandwidth offers to its customers are world class and very useful for communication purposes.  However, wouldn’t it be great to have a further understanding of what happened on a call without listening to each and every call a company makes?  Think of a call center that makes thousands of calls a day. The intent of this project would be to analyze the content and sentiment of a voice call and determine its makeup.

The Solution

Bandwidth would like a tool to analyze call recordings to determine their content and sentiment.  The project will involve the following:

  • Develop an application to ingest audio recordings and metadata through a REST API.
  • Convert these audio files into text.
  • Analyze and log the results of the sentiment analysis.
  • The application should leverage existing tools and APIs where appropriate - for example, speech-to-text engines, sentiment analysis AI, etc.
  • The back-end API will be implemented in Java.
  • Since real recordings are confidential, we are unable to provide these.  Part of the project will involve developing a set of audio recordings (or as time allows using Bandwidth’s Catapult API to make and record test calls).
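The analysis step in the list above might be prototyped, before any third-party engine is chosen, with a simple lexicon-based scorer applied to a transcript. The word lists, function names, and result format below are invented for illustration; a real system would call a speech-to-text service first and would likely use an existing sentiment API, with the back end written in Java as stated.

```python
import re

# Tiny hand-picked word lists, purely illustrative.
POSITIVE = {"great", "thanks", "helpful", "resolved", "happy"}
NEGATIVE = {"cancel", "frustrated", "problem", "unhappy", "broken"}

def sentiment_score(transcript):
    """Score in [-1, 1]: (positive - negative) / matched words; 0.0 if none match."""
    words = re.findall(r"[a-z']+", transcript.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def analyze_call(call_id, transcript):
    """Build a loggable result record for one transcribed call."""
    score = sentiment_score(transcript)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"call_id": call_id, "score": score, "label": label}

result = analyze_call("call-001", "Thanks, that was really helpful. My problem is resolved.")
print(result)
```

Keeping the scorer behind a single function makes it straightforward to swap in, or compare against, other sentiment engines later.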

Possible stretch goals:

  • Analysis by multiple sentiment engines (compare or combine results for best results)
  • Visualization of the results through a React front end.

About Bandwidth

At Bandwidth, we power voice, messaging, and 9-1-1 solutions that transform EVERY industry.

Since 1999 our belief in what should be has shaped our business. It’s the driving force behind our product roadmaps and every big idea. It’s a past and a future built on an unrelenting belief that there is a better way. No matter what kind of infrastructure you have (or don’t have) we’ve got your back with robust communications services, powered by the Bandwidth network.

We own the APIs… AND the network. Our nationwide all-IP voice network is custom-built to support the apps and experiences that make a real difference in the way we communicate every day. Your communications app is only as strong as the communications network that backs it, and with Bandwidth you have the control and scalability to ensure that your customers get exactly what they need, even as your business grows.

 

Read Optimization Using Learning Profiles

The Dell/EMC senior design project will give the team a chance to develop software that improves the real-world performance of Dell/EMC backup and restore software.  The team will identify optimal strategies for handling different types of backup / restore loads, then apply those strategies to new applications in order to automatically improve their performance.

Background

Data Domain is the brand name of a line of backup products from Dell/EMC that provide fast, reliable and space-efficient online backup of files, file systems and databases ranging in size up to terabytes of data. These products provide network-based access for saving, replicating and restoring data via a variety of network protocols (CIFS, NFS, OST). Using advanced compression and data de-duplication technology, gigabytes of data can be backed up to a Data Domain server in just a few minutes and reduced in size by a factor of ten to thirty or more.

Our RTP Software Development Center develops a wide range of software for performing backups to and restoring data from Data Domain systems, including the Data Domain Boost libraries used by application software to perform complete, partial, and incremental backups and restores.

As Data Domain makes its way into even more data centers, the need to accommodate additional workloads increases. Customers must be able to back up their data efficiently to meet constantly decreasing backup time periods.

This concept or requirement also applies to the restoring of the data/databases. Dell/EMC has developed technology to increase the efficiency and reduce the time for data to be backed up and for the data to be protected.

The focus of this project is to determine the optimum behavior of the Data Domain software for several data restore access patterns. Depending on the behavior of the application performing the data restore, we need to determine the optimum settings for several parameters that will modify the behavior of our internal software library.

Students will use predefined access methods to restore data files from a Data Domain Virtual System and, based on the time / throughput of the restore, modify one or more parameters to decrease the time of the restore, improve the throughput of the restore, or both.

These parameters (collectively called a profile) will be used by the Boost software to optimize the restore process.

Project Scope

We want to determine the optimum settings (or parameters) of a workflow profile for different access patterns. For the project we are defining 3 specific access patterns which can be extended to include additional patterns if time and resources permit.

  1. Sequential – data is read from the beginning to the end sequentially.
  2. Random – all reads are random.
  3. Sequential w/random “hops” – most reads are sequential, with a small percentage random.
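
As a rough illustration of the three access patterns, the sketch below generates read offsets for each one. It is only a model for reasoning about the patterns; the function names, the fixed read size, and the `hop_fraction` parameter are assumptions made here, not part of the Dell/EMC tools.

```python
import random

def sequential_offsets(file_size, read_size):
    """Offsets for reading a file front to back in fixed-size chunks."""
    return list(range(0, file_size, read_size))

def random_offsets(file_size, read_size, seed=0):
    """The same chunks as the sequential pattern, visited in shuffled order."""
    offsets = sequential_offsets(file_size, read_size)
    random.Random(seed).shuffle(offsets)
    return offsets

def hop_offsets(file_size, read_size, hop_fraction=0.1, seed=0):
    """Mostly sequential reads; a small fraction jump to a random chunk."""
    rng = random.Random(seed)
    n_chunks = (file_size + read_size - 1) // read_size
    offsets, chunk = [], 0
    for _ in range(n_chunks):
        if rng.random() < hop_fraction:
            chunk = rng.randrange(n_chunks)  # the random "hop"
        offsets.append(chunk * read_size)
        chunk = (chunk + 1) % n_chunks       # continue sequentially from here
    return offsets
```

Fixing the seed keeps a test run repeatable, which matters when restore times from different profiles are compared against each other.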

These are the 3 typical access patterns we want to investigate and optimize. For each access pattern, the team will use the supplied tools and software to restore a database and, based on the results, modify one or more of the profile parameters to reduce the time of the restore.

The results for each individual test can be stored on the Data Domain using an internal database and be data-mined to help identify the best profile.

This investigation may also identify cases where some of the profile parameters have no impact at all or significant impact when adjusted slightly.
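
One simple way to organize the measure-and-adjust loop described above is an exhaustive sweep over candidate parameter values. The sketch below assumes a caller-supplied `run_restore` function that performs one restore with a given profile and returns its time in seconds; that function, and any parameter names passed in, are hypothetical stand-ins for the supplied tools.

```python
from itertools import product

def sweep_profiles(param_grid, run_restore):
    """Try every parameter combination and return (results, best_profile).

    param_grid  -- dict mapping parameter name -> list of candidate values
    run_restore -- callable(profile) -> restore time in seconds (assumed)
    """
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[n] for n in names)):
        profile = dict(zip(names, values))
        results.append((profile, run_restore(profile)))
    best_profile, _ = min(results, key=lambda r: r[1])
    return results, best_profile

def parameter_spread(results, name):
    """Gap between the worst and best mean restore time across one parameter's values.

    A spread near zero suggests the parameter has little impact on this workload.
    """
    by_value = {}
    for profile, seconds in results:
        by_value.setdefault(profile[name], []).append(seconds)
    means = [sum(v) / len(v) for v in by_value.values()]
    return max(means) - min(means)
```

A full sweep grows quickly with the number of parameters, so in practice the team may prefer to sweep a coarse grid first and refine around the best result.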

The workload characteristics on the Data Domain Restorer (DDR) can be Random or Sequential. To have Data Domain File System (DDFS) support both Sequential and Random workloads and better utilize system resources, we need better intelligence in DDFS in detecting these workloads and enabling the respective optimizations where necessary.

An Access Pattern Detection (APD) algorithm that can identify most use cases that a DDR supports or needs to support will be necessary in order to apply the right optimizations.

Data Domain has a network optimization protocol for backup/restore of data called DD Boost. The APD logic can be applied to DD Boost to control the read-ahead cache. If the access pattern is detected to be Sequential, read-ahead caching will be enabled or re-enabled. If the access pattern is detected to be Random, read-ahead caching will be disabled.

The advantages of implementing APD in DD Boost are:

  • Intelligent Read ahead caching, based on threshold and historical access pattern detection.
  • DD Boost access pattern is not biased towards sequential reads, and read ahead heuristics are not over-aggressive
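
A minimal sketch of threshold-based access pattern detection over a sliding window of recent reads is shown below. It is not the DDFS or DD Boost algorithm; the class name, window size, threshold, and the assumption that read-ahead starts enabled are all illustrative choices.

```python
from collections import deque

class AccessPatternDetector:
    """Classify recent reads as sequential or random and toggle read-ahead."""

    def __init__(self, window=8, threshold=0.75):
        self.history = deque(maxlen=window)  # True where a read was contiguous
        self.threshold = threshold           # share of contiguous reads needed
        self.next_expected = None            # offset a sequential reader would use next
        self.read_ahead_enabled = True       # assumed starting state

    def record_read(self, offset, length):
        """Record one read and return whether read-ahead should be enabled."""
        if self.next_expected is not None:
            self.history.append(offset == self.next_expected)
        self.next_expected = offset + length
        # Only change the decision once a full window of history is available.
        if len(self.history) == self.history.maxlen:
            ratio = sum(self.history) / len(self.history)
            self.read_ahead_enabled = ratio >= self.threshold
        return self.read_ahead_enabled
```

Because the decision is based on a window of history rather than the last read alone, a few random "hops" in a mostly sequential stream do not immediately disable read-ahead.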

Part of this project can be to apply machine learning to evaluate data restore patterns for different applications. On the DDR, we provide a database for data warehousing. Records can be added to this database using tags (we call them MDTags). Data points that help in optimizing read performance should be added to this database, and data mining/machine learning performed on it should produce two outcomes:

  1. Build a data access model for these patterns.
  2. Build application specific machine learning profiles that can be applied during restore.

An example of a machine learning profile could be one for READAHEADs, which can be applied to any storage unit or mtree.

So if an mtree is catering to an NFS client for Oracle workloads, we can apply an “Oracle Optimal restore” profile to that mtree. If a Storage Unit/mtree has a DD Boost client workload for NBU, we can apply an “NBU Optimal restore” profile to that storage unit. Clients can set their own profiles, and we may want to provide a “golden” (default – general purpose) profile. Profiles do not have to be “Restore” specific.
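
The idea of per-mtree profiles with a general-purpose fallback can be sketched as a simple lookup. The profile contents and the `read_ahead` field below are hypothetical placeholders; real values would come out of the tuning phases.

```python
# Hypothetical profile contents; real values would come from the tuning phases.
GOLDEN_PROFILE = {"name": "golden", "read_ahead": True}
PROFILES = {
    "Oracle Optimal restore": {"name": "Oracle Optimal restore", "read_ahead": False},
    "NBU Optimal restore": {"name": "NBU Optimal restore", "read_ahead": True},
}

def profile_for(mtree, assignments, profiles=PROFILES, default=GOLDEN_PROFILE):
    """Return the profile assigned to an mtree or storage unit, or the golden default.

    assignments -- dict mapping mtree/storage unit name -> profile name
    """
    return profiles.get(assignments.get(mtree), default)
```

Falling back to the golden profile for unassigned mtrees, and for assignments that name an unknown profile, keeps restores working even when no client-specific profile exists.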

The project being proposed consists of three phases: (1) determine the optimum access pattern profile for sequential restores, (2) determine the optimum access pattern profile for random restores, and (3) determine the optimum access pattern profile for sequential restores with random “hops”.

Phase One

Using the software supplied by Dell/EMC:

  • Execute several sequential restore operations and modify one or more profile parameters to attempt to improve the restore performance.
  • Record each dataset (parameters and results) in the MDTAG database on the DDVE
  • Determine the optimum restore parameters for the profile and store the results in the database on the DDVE system.
  • Document the method to determine the optimum profile
  • Document the parameters that impacted the performance most positively and negatively
  • Document the optimum profile parameters.
  • Use Python/R machine learning.

Phase Two

Using the software supplied by Dell/EMC:

  • Execute several random restore operations and modify one or more profile parameters to attempt to improve the restore performance.
  • Record each dataset (parameters and results) in the MDTAG database on the DDVE
  • Determine the optimum restore parameters for the profile and store the results in the database on the DDVE system.
  • Document the method to determine the optimum profile
  • Document the parameters that impacted the performance most positively and negatively
  • Document the optimum profile parameters.
  • Use Python/R machine learning.

Phase Three

Using the software supplied by Dell/EMC:

  • Execute several sequential/ random “hop” restore operations and modify one or more profile parameters to attempt to improve the restore performance.
  • Record each dataset (parameters and results) in the MDTAG database on the DDVE
  • Determine the optimum restore parameters for the profile and store the results in the database on the DDVE system.
  • Document the method to determine the optimum profile
  • Document the parameters that impacted the performance most positively and negatively
  • Document the optimum profile parameters.
  • Use Python/R machine learning.

Materials Provided

  • A Data Domain DDVE software package to be used for the duration of the project. This is the Data Domain virtual system that will receive and store backups.
  • A set of binary tools with documented interfaces to be used for the profiles and test results.
  • A test binary (“ddboost_stress”) that acts as a test backup application and calls the standard Boost APIs. ddboost_stress can write and read multiple files simultaneously, so it can generate the load needed to test and stress the Boost interfaces, and it can be used for data generation or restore.

Materials Needed from NCSU

  • Hardware and storage to host the DDVE (DDOS Virtual Edition)
  • A physical or virtual Linux client to be used for the restore testing using the DDVE over TCP/IP

Benefits to NCSU Students

This project provides an opportunity to attack a real-life problem covering the full engineering spectrum from requirements gathering through research, design and implementation and finally usage and analysis. This project will provide opportunities for creativity and innovation. Dell/EMC will work with the team closely to provide guidance and give customer feedback as necessary to maintain project scope and size. The project will give team members exposure to commercial software development on state-of-the-art industry backup systems.

Benefits to Dell/EMC

The data generated from this engagement will allow Dell/EMC to increase the performance of the DDBoost product set and identify future architecture or design changes to the product offerings.

Company Background

Dell/EMC Corporation is the world's leading developer and provider of information infrastructure technology and solutions. We help organizations of every size around the world keep their most essential digital information protected, secure, and continuously available.  

We help enterprises of all sizes manage their growing volumes of information—from creation to disposal—according to its changing value to the business through big data analysis tools, information lifecycle management (ILM) strategies, and data protection solutions. We combine our best-of-breed platforms, software, and services into high-value, low-risk information infrastructure solutions that help organizations maximize the value of their information assets, improve service levels, lower costs, react quickly to change, achieve compliance with regulations, protect information from loss and unauthorized access, and manage, analyze, and automate more of their overall infrastructure. These solutions integrate networked storage technologies, storage systems, analytics engines, software, and services.

Dell/EMC’s mission is to help organizations of all sizes get the most value from their information and their relationships with our company.

The Research Triangle Park Software Design Center is a Dell/EMC software design center. We develop world-class software that is used in our VNX storage, DataDomain backup, and RSA security products.

Real Estate Data Governance Framework

Who We Are

Foresite.ai collects, analyzes and distributes real estate data from multiple different public sources in a map-based interface. Before analysis, this data is cleaned, normalized, aggregated and displayed to users.  Users can augment this data and analysis with proprietary data uploaded directly from spreadsheets or other data formats (JSON, CSV, etc).

Background and Problem

In the real estate industry, the most money is made when a firm has information that no one else has. At the same time, the process of buying, selling or renting property requires constant communication between the different stakeholders in the process. These two competing priorities lead to three characteristics of our users: 1) they are immensely protective of their proprietary data; 2) they need to be able to modify/amend specific data points in our provided datasets or their own uploaded data; and 3) they need different permissions for different types of users in order to facilitate collaboration on projects.

Our solution is a map-based interface, which displays many different geospatial datasets and lets the user quickly apply filters. It’s primarily run on the client side, leading to multiple problems: 1) when users upload information and close the app, they must re-upload the data the next time they open the app; 2) if a user finds any errors in the Foresite-provided datasets, they must e-mail us, and we have to manually verify/correct these issues; and 3) there is no way for users working on the same project or within the same firm to easily share data in the cloud. Our users are various beta customers who consist of real-estate brokers, developers, private-equity investors, and lenders.

We are exploring smarter, automated systems to streamline this tedious process. The frontend engine of Foresite.ai uses ReactJS and JavaScript ES6, and the backend systems/databases use PostgreSQL and python3 scripts for handling data. A stripped-down version of our codebase will be shared as needed for the completion of the project.

Solution Description

We envision a web app that is roughly 65% frontend and 35% backend work. The login pages, React GUI components (to be added to the existing app), and the entire authentication backend would need to be created from scratch. The authentication backend must be based in either python3 or nodeJS and must use PostgreSQL as its persistent database. This framework should have all the following features:

  • Users will register a password with their e-mail addresses
  • Users will have a profile containing their name, company and contact information
  • Users can create any number of data groups, which contain any number of designated datasets. Each data group has a unique name, and the creator can assign other users with three roles:
    • Administrators can assign permissions and add/remove/modify data and datasets.
    • Users can filter, query and analyze data.
    • Viewers can only view pre-made maps.
  • Data groups should be able to contain users from multiple companies.
  • When a viewer user finds an incorrect attribute, they can click on it and report a correction. Candidate corrections are stored in their own separate database, and users who correct the data should be able to view their corrections regardless of whether or not the corrections are accepted by an admin.
  • Any creator/administrator of a data group may add/modify/remove any data in any dataset contained in the group. This includes reviewing and applying user corrections.
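
The role model in the list above could be sketched in python3 roughly as follows. This is an in-memory illustration only, with no PostgreSQL persistence or password handling; the action names, and the assumption that each role also includes the permissions of the roles below it, are choices made here rather than requirements from the sponsor.

```python
from enum import Enum

class Role(Enum):
    ADMINISTRATOR = "administrator"
    USER = "user"
    VIEWER = "viewer"

# Assumed action names; viewers can also report corrections, per the list above.
PERMISSIONS = {
    Role.VIEWER: {"view_maps", "report_correction"},
    Role.USER: {"view_maps", "report_correction", "query_data"},
    Role.ADMINISTRATOR: {"view_maps", "report_correction", "query_data",
                         "modify_data", "assign_roles", "review_corrections"},
}

class DataGroup:
    """A named group of datasets whose creator starts out as an administrator."""

    def __init__(self, name, creator_email):
        self.name = name
        self.members = {creator_email: Role.ADMINISTRATOR}

    def can(self, email, action):
        """True if the member exists and their role allows the action."""
        role = self.members.get(email)
        return role is not None and action in PERMISSIONS[role]

    def assign(self, actor_email, target_email, role):
        """Give target_email a role; only members who may assign roles can do this."""
        if not self.can(actor_email, "assign_roles"):
            raise PermissionError(f"{actor_email} may not assign roles in {self.name}")
        self.members[target_email] = role
```

Keying membership by e-mail address rather than by company lets one data group contain users from multiple companies.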

During weekly sponsor meetings, we will review new developments in the app and provide feedback, clarification and training as needed.

Students will be required to sign over IP to Sponsor when team is formed.

 




Fujitsu America

Fujitsu America is one of the top three suppliers of retail systems and services worldwide.  Using Microsoft’s .NET development platform, these systems offer a high performance yet open platform that diverse retailers are able to customize.

MarketPlace Migration and Re-Factoring

Large software projects (more than) occasionally require refactoring. If the project itself is a Software Development Kit (SDK), the refactoring can affect the solutions dependent on a prior version of the SDK. The work to adopt a new SDK can often require human effort and can be tedious and error prone. If this process could be at least partially automated, there could be significant programmer productivity improvement, as well as speedier adoption of the new SDK.

Fujitsu is confronted with one such migration caused by the ‘relocation’ of hundreds of classes to different namespaces and changes to method signatures between two versions of their SDK. Such a migration is likely in the future as well, so automation processes are of great interest. While the SDK in question is written in C#, the ‘consumers’ of the SDK are both C# and VB.NET. The complexity of this transformation largely rules out simple, text editor automation because the migration is likely to affect both the consumer source code and the project structures that build that source code.

A key enabler of automation for this project is the Roslyn compiler published as open source by Microsoft® with Visual Studio 2017™. This compiler allows programmatic access to source files in the same navigable expression trees used by the compiler itself. Modification of the source can then be done reliably within that context, avoiding potential scope or name collision problems that a naive, text-based solution might encounter.

Project Organization

In this project, at least the following steps are required:

  • Transformation mapping: Since the namespaces, and possibly class and method names, are different between SDK versions, a mechanism must be developed to map between these naming schemes. A from-to mapping must be created to identify how to transform both code and associated build projects. This may require reflection across the compiled binaries of the SDK, as well as inspection of the source file trees using the Roslyn interface.
  • Modification: Microsoft projects are XML-based and contain references to the specific compiled elements (assemblies) that form the execution dependencies. If the transformation has changed the assembly in which the artifact is compiled, the project needs to be modified appropriately. Similarly, the code element will require modification to adjust namespaces, class names, and methods.
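
To make the from-to mapping concrete, the sketch below models it as a lookup from old qualified type names to their new names and assemblies, and derives what a consumer project would need to change. It is written in Python purely to illustrate the data structure; the actual tool would work in C# through Roslyn, and every type, namespace, and assembly name shown is invented.

```python
from typing import NamedTuple

class Target(NamedTuple):
    qualified_name: str  # new namespace-qualified type name
    assembly: str        # assembly that now contains the type

# Hypothetical entries; real ones would be derived from the two SDK versions.
MAPPING = {
    "OldSdk.Devices.Scanner": Target("NewSdk.Peripherals.Scanner", "NewSdk.Peripherals"),
    "OldSdk.Devices.Printer": Target("NewSdk.Peripherals.Printing.Printer", "NewSdk.Peripherals"),
    "OldSdk.Sales.Receipt": Target("NewSdk.Transactions.Receipt", "NewSdk.Transactions"),
}

def plan_migration(used_types, mapping=MAPPING):
    """Return (renames, namespaces, assemblies, unmapped) for one consumer project."""
    renames, namespaces, assemblies, unmapped = {}, set(), set(), []
    for old in used_types:
        target = mapping.get(old)
        if target is None:
            unmapped.append(old)  # needs human attention
            continue
        renames[old] = target.qualified_name
        namespaces.add(target.qualified_name.rpartition(".")[0])
        assemblies.add(target.assembly)
    return renames, sorted(namespaces), sorted(assemblies), unmapped
```

The namespace list corresponds to source-level changes, while the assembly list corresponds to references in the XML project file; reporting unmapped types separately keeps the automation honest about what it could not transform.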

Goals and Previous Work

The fundamental goal of this project is to be able to run an automated transformation of client code and to produce a compilable version of the sample consumers using the new SDK.  Although the team won’t have direct access to client code produced by Fujitsu customers, as a stretch goal, Fujitsu will work with the team to run the transformation against one or more actual customer projects.

In Spring 2017, an NCSU Senior Design team made some progress toward these goals. This is the starting point for the Spring 2019 project.

Technologies

The senior design team will be working with Visual Studio 2017, both for the project implementation and for the codebase being transformed.

Game2Learn: Gradesnap!

Block-based programming languages, such as Snap! and Scratch, are used in introductory Computer Science (CS) and other STEM courses integrating computational thinking. As more classrooms teach and incorporate CS, there is a greater number of teachers without CS experience who need resources to help teach and assess student performance and competencies.

In order to support these teachers, we are developing a web-based grading platform, classroom dashboard, and teacher-student portal called Gradesnap!, similar to Gradescope. Within Gradesnap!, teachers will be able to do essential classroom tasks, such as:

  • create classrooms and sections
  • create assignments
  • create rubrics to grade embedded Snap! projects
  • enter grades to the gradebook

Gradesnap! will offer a simpler interface to students.  They’ll be able to:

  • submit projects
  • view grades and feedback

Gradesnap! will offer a web interface to students and teachers.  The team can expect to use the following, although there are opportunities to add or replace technologies, based on the team’s experience or expertise.

  • HTML / JavaScript for the frontend
  • Node / SQL for the backend

Game2Learn is advised by Dr. Tiffany Barnes, located on the 5th floor of Venture 2. The primary graduate student working on this project is Alex(andra) Milliken, a 4th year PhD student, who has worked with teachers using Snap! in an Introduction-to-CS class for the past 3 years.

 

Smart Glasses (AR/VR) and Machine Learning

Background

If we were to pick two areas of interest that are making all the right moves in the software innovation spectrum, it would have to be Smart Glasses and Machine Learning. With seamless possibilities and scope for use cases ranging across industries such as Medical, Retail, Service, and Tourism, the time is ripe for development of the art of the possible. Smart Glasses have a unique proposition of being able to consume the field of view electronically while having the capability to overlay the view with Augmented Reality (AR). Machine learning along with Artificial Intelligence can bring in the unique perspective of contextualizing the data beyond the human mind’s computational power. A combination of these two technologies has the potential to revolutionize the status quo across any process.

Project Organization

The team will first need to establish a development environment for smart glasses, such as Vuzix M300, and they’ll need to evaluate and set up other necessary SDKs to build applications leveraging native features of smart glasses.

The team will develop an application that features Optical Character Recognition (OCR) and Video Stream analytical capabilities for a quality assurance use case.  The application will use Wikitude, an industry-leading AR SDK for object recognition, image recognition and 3D augmentation. It will also use state-of-the-art Amazon Web Services APIs for OCR and Video analysis.  Some aspects of the target application are negotiable, based on the team’s experience and interests. A modular architecture will make the application both scalable and easy to extend or adapt to other use cases.

Goals and Benefits

The end result of this project will be a smart glasses development template with connectors to cloud service providers such as AWS. These templates can act as accelerators for taking on real-world use cases. The use cases can be extended to make them device agnostic and extensible to any industry, and they will be applicable to initiatives such as Industry 4.0, Smart Campus, and Smart Cities.

The student team will gain a working knowledge of the intricacies behind Smart Glasses, AR/VR and Machine Learning algorithms, along with an understanding of cloud-based Software- and Platform-as-a-Service technologies. The team will understand how to bridge together various seemingly standalone solutions into something more tangible. The team will also get experience with Code Versioning, Quality Control, Best Practices, and application packaging and deployment on local and cloud servers.

Students may be required to sign over IP to Sponsor when team is formed.

Load Optimization

The Laboratory for Analytic Sciences' senior design project is focused on load optimization on cloud computing systems. Here, we use "load" to mean the total demand on a resource from all running analytics. Students will be given:

  • Access to an AWS cloud instance.
  • A set of individual analytics to be run.
  • Approximations of the system resources each analytic requires during runtime (though we prefer these to be measured).
  • The system owner's desired time-varying resource loads (perhaps heavy during the evening and light during the day to permit additional ad-hoc analytics to be run during the workday).

Students will design and implement an algorithm which determines an analytic scheduling strategy that results in resource loads that best fit the system owner's desired resource loads throughout the scheduling period. An initial possible approach to determining a scheduling strategy may be offered, though the students will be encouraged to either design a strategy of their own or to improve upon the provided approach. Testing and evaluation of initial and subsequent strategies will be required to gauge improvement over random scheduling.
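
As one possible starting point (not the approach the sponsor may offer), the sketch below schedules analytics greedily: it places the largest jobs first and gives each one the start time that keeps the total load closest to the desired curve, measured by squared deviation. It assumes a single resource, discrete time slots, and a constant demand per analytic, all of which are simplifications introduced here.

```python
def schedule(analytics, desired):
    """Greedy fit of analytics to a desired load curve.

    analytics -- list of (name, duration_in_slots, demand) tuples
    desired   -- desired load per time slot
    Returns (start slot per analytic, resulting load per slot).
    """
    load = [0.0] * len(desired)
    starts = {}
    # Place the largest jobs first; they are the hardest to fit.
    for name, duration, demand in sorted(analytics, key=lambda a: -a[1] * a[2]):
        best_start, best_cost = None, None
        for start in range(len(desired) - duration + 1):
            # Squared deviation from the desired curve if the job started here.
            cost = sum(
                (load[t] + (demand if start <= t < start + duration else 0) - desired[t]) ** 2
                for t in range(len(desired))
            )
            if best_cost is None or cost < best_cost:
                best_start, best_cost = start, cost
        if best_start is None:
            raise ValueError(f"{name} is longer than the scheduling period")
        for t in range(best_start, best_start + duration):
            load[t] += demand
        starts[name] = best_start
    return starts, load

def misfit(load, desired):
    """Sum of squared differences between actual and desired load."""
    return sum((actual - want) ** 2 for actual, want in zip(load, desired))
```

The `misfit` score gives a single number for comparing this strategy against random scheduling or against an improved algorithm on the same set of analytics.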

Students will get hands-on experience running Apache Pig map-reduce jobs on a cloud computing system and applying mathematical optimization methods. A successful result will hopefully offer a new approach to cloud compute scheduling which produces a more stable and user-friendly cloud. In addition to the above mentioned resources, weekly consulting will be provided. Students may implement their algorithm in a language of their choice, though Python is preferred if convenient.

Helping Hands Training Portal - A Service Project

About The Collaborative

The NC Collaborative for Children, Youth & Families (“The Collaborative”) is a non-profit group of cross-system agencies, families, and youth who educate communities about children’s issues, support the recruitment of family and youth leaders to provide input into policy and training, and organize support to local systems of care groups.

The Collaborative has been working to improve their website and use of social media to promote understanding of children’s issues. They need assistance with developing online training and ways to deliver their message to communities about children’s needs.

Project Description

The Collaborative’s Helping Hands Training Portal is an online system that allows them to create a variety of informative and educational trainings about their organization and other topics of interest. The purpose of the system is to provide a custom venue for creating and distributing training content and assessments pursuant to the Collaborative’s mission and the strategic goal to “increase awareness and understanding of System of Care impact … and provide educational programs to enhance Systems of Care.”

The portal allows the Collaborative to create custom online educational courses (or “modules”) composed of rich text and embedded multimedia. Courses may also contain quizzes to assess the learner’s understanding of the content. Collaborative staff are able to track the learning progress of trainees, and individual users are able to review their own learning history.

The Collaborative would like to add more features to the training portal. They would like some sort of open-ended feedback on how participants use their trainings (perhaps a forum for sharing best practices, some sort of chat room, etc.). Eventually, the Collaborative would like to offer a pilot study to see if these additions improve the quality of service in the field.

They would also like for trainees to be able to share their learning progress or otherwise verify training status with potential employers or other third parties, such as to prove achievement of some certification (requirements to be determined!).

The portal started as a service project in the Senior Design Center over the 2017-18 academic year, and has since undergone further development to complete core features.

Technologies

The training portal is built as a Node.js web application using TypeScript. TypeORM with MySQL is used for persistence. The UI is server-side rendered using Nunjucks templating. Interactive UI elements, such as quizzes, are built using Survey.js and custom browser-side JavaScript.

Student Work

Students are encouraged to apply to this project if they have an interest in the technologies used, a desire to explore requirements and implement new features, and a passion for helping others!

Students are asked to release their work and contributions to the project under a compatible open-source license.

Remynda

Project Background

Triangle Strategy Group (TSG) is a technology consulting firm based in Raleigh, NC serving clients in the cosmetics, pharmaceuticals, food and beverage industries. We design Internet of Things (IoT) systems to create exciting new products and experiences for our clients and their customers.

Remynda is a new cosmetics organizer that helps consumers learn and remember when and how to use their skincare products.  Skincare users often find it difficult to adhere to complex skincare regimens, which may require several skincare items at different times of the day and week. As a result, many new users never experience the full benefits of their purchases. Skincare users increasingly receive online coaching and product recommendations from a beauty consultant.

The Remynda organizer uses LEDs to remind a user when to use her products and interacts with her mobile device to provide mobile reminders, tracking reports, online coaching and product replenishment.

In addition to LEDs, the organizer includes a microcontroller, sensors, WiFi and NFC connectivity. The organizer will typically operate under battery power for 3-6 months before requiring charging.

Project Scope

The goal for this project is to develop software for a scalable network of Remynda organizers, interacting with a community of users and consultants.

This will include programming the Remynda hardware, establishing and maintaining connectivity, providing mobile notifications, creating user interfaces for both users and consultants, conducting data analytics and some simple machine learning.

Core Deliverables

  1. Develop microcontroller code for Remynda organizer (some available from prior SDC projects)
    1. Flash LEDs to notify when to use each item
    2. Detect if/when user picks up each item and move notification timing to next regimen phase
    3. Detect when user returns item
    4. Identify item, validate product in proper socket.
    5. Measure quantity remaining of each item and prompt user to reorder when low
    6. Detect user presence and put organizer to sleep when not present / awaken when present
    7. Validate device integrity and provide appropriate alerts
      1. System parameters (MAC address, number of sensors connected, battery %, charging status)
    8. Relay notifications to cloud for forwarding to mobile device
    9. Provide security
      1. Authenticate user with smartphone (NFC scan, QR scan)
      2. Authenticate new items, read unique item ID, confirm genuine / counterfeit
  2. Communications and data management
    1. Establish and maintain Wi-Fi connectivity between each device and cloud (QR code scan, NFC scan, or WPS button press), automatically re-establish after connection loss
    2. Transmit real-time data from each Remynda device to cloud databases including:
      1. Event data (product usage, proximity data)
      2. Product status (present, unique item ID, % full, last time used, expiration date)
      3. Device status (connectivity, product level, battery level(%), Wi-Fi signal strength)
    3. Transmit real time data from each mobile app
      1. Location
      2. Event history (notifications, order history)
    4. Pull environment forecast at user location (sunlight, aero-toxins, humidity)
    5. Engage a machine learning algorithm to adapt reminders and reorder schedule to environment, time zone, user habits
    6. Maintain profile data for Remynda organizers, Regimens, Items, Users, Consultants (contact info, mobile devices, associated users and relationships between devices, regimens, items, consultants, users)
  3. Mobile device front end
    1. Deliver visually appealing reminders and reorder notifications
    2. Allow the user to snooze notifications, provide 2nd and 3rd reminders if needed
    3. Deliver organizer alerts (charge running low, connection problems, any items missing)
    4. Auto-launch user mobile app during regimen application
      1. Pull up environment report / skin report
      2. Play hands-free step-through video as each product starts
      3. Pull up product information (directions, consultant comments, FAQs, benefits, ingredients, reviews, testimonials) as each product is used
    5. Help the user reorder when needed
      1. Launch order through webstore
      2. Alternatively, email consultant to reorder
    6. Allow the user to configure settings
      1. Opt in/out for mobile notifications, 2nd and 3rd reminders, regimen step-through, machine learning
      2. Edit reminder schedule, reorder preferences
  4. Consultant front end (to run on mobile and laptop)
    1. Adherence ranking report for all consultant’s users
    2. Drilldown tracking report for each user, highlighting adherence problem areas
    3. Guide consultant in offering coaching interactions (habits, alternate products / regimens)
    4. Allow consultant to update user’s settings
    5. Allow consultant to write custom regimen
    6. Make replenishment product order for user
    7. Make new product order for new item to add to Regimen, order Remynda mod-kit
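
The adherence ranking and drilldown reports in the consultant front end could be computed on the cloud side along the following lines. The input format (one row per user and product with scheduled and completed use counts) and the definition of adherence as completed divided by scheduled uses are assumptions made for this sketch.

```python
from collections import defaultdict

def adherence_report(usage_log):
    """Rank users by adherence, lowest first, and flag each user's weakest product.

    usage_log -- iterable of (user, product, scheduled, completed) rows,
                 assumed to contain one row per user and product
    Returns a list of (user, adherence, worst_product) tuples.
    """
    totals = defaultdict(lambda: [0, 0])   # user -> [scheduled, completed]
    per_product = defaultdict(dict)        # user -> {product: adherence}
    for user, product, scheduled, completed in usage_log:
        if scheduled == 0:
            continue  # nothing was due, so there is nothing to score
        totals[user][0] += scheduled
        totals[user][1] += completed
        per_product[user][product] = completed / scheduled
    report = []
    for user, (scheduled, completed) in totals.items():
        worst = min(per_product[user], key=per_product[user].get)
        report.append((user, completed / scheduled, worst))
    return sorted(report, key=lambda row: row[1])
```

Sorting with the lowest adherence first puts the users who most need coaching at the top of the consultant's list.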

Stretch Goals

For mobile app

  • Capture before and after pictures
  • Voice prompts / voice control of mobile app during regimen
  • Record feedback / comments for consultant
  • Social media integration

Technology

The sponsor is flexible on the mobile app modality (web app / native).  Project documentation will need to emphasize steps for building and extending the application.  The Remynda device will be controlled by Arduino and programmed via the Arduino IDE using C++.  The team will be able to choose an appropriate database technology (with an interest in keeping costs down).

Nondisclosures

Each team member will be asked to sign an NCSU student participation agreement at the start of the project.

Target Audience

The team should consider the following stakeholders:

  • Skincare users age 20-60
  • Skincare consultants
  • Cosmetics marketers

Benefits to Students

  • Work at the leading edge of convergence of digital and physical marketing
  • Build experience with real-world IoT hardware
  • Help establish new standards and protocols for the exciting new field of IoT
  • Receive positive exposure to 2 potential employers.

Benefits to Sponsor

  • Leverage the unique creative capabilities of the full NCSU team
  • Solve a challenging set of real world problems that are critical to our business

Major Challenges

  • Cross-platform app functionality
  • Robust and attractive app design

There is a video demo of prototype hardware at:

https://patrickjcampbell2001.vids.io/videos/d49ddfb51f19e5c15c/181221-remind-a-mp4 

Students will be required to sign over IP to sponsor when team is formed.
