Projects – Spring 2009

Redesign of Login Toolbar

Background

AT&T’s Global Technology and Tier 3 (GT3) organization is responsible for developing tools and automation for its Managed Network Services product line. The number of client networks and devices has grown steadily over the past 10 years, and managing these devices has become increasingly challenging. The Login Toolbar, developed 3 years ago, was created to help associates gain access to management servers and client devices more efficiently. Continued development of this tool has become cumbersome and inefficient. The Login Toolbar is a very effective tool for device access; however, it would benefit from a redesign with a more modular structure and a more robust framework. The current code is written in Tcl/Python with direct SQL calls and is a Thick-Client running on individual machines.

Project Description

This project will investigate the redesign of the current tool into a more modularized framework that will allow for the addition/removal of components without major code revision and deployment. The tool will need to be user friendly, interactive, and security conscious. It will also need to be easily administrable by the developers.

Login Toolbar Description

The Toolbar is a launching platform that can perform several tasks. The first feature is authorized use of the Toolbar, which ensures that the user is allowed to use it. After being authorized, the user configures the Toolbar with their logins/passwords for both Server Access and End-Device Access (such as TACACS). This information is stored locally on the user’s machine, encrypted for security. The user can also choose their method of access, such as X-Term, Putty, or SecureCRT, whichever is available on the local machine. After the user setup is completed, the user can access a server by entering it manually or by using the drop-down menus. The Toolbar will launch the access tool of choice and log the user in automatically. The user can also connect to an end-user device by entering a device. Using the user’s settings, the Toolbar will launch the access tool, log the user into the appropriate server for the device, and then run a script on the server that logs the user into the device using the pre-defined user access information.

The Toolbar’s code is the intelligence behind the whole process. It uses SQL calls to look up the IP addresses of servers and to determine which servers a particular device is managed from. This automation allows users to quickly access different servers/devices by removing the repetitive steps of launching an X-term tool, typing logins for servers/devices over and over again, and determining where devices are managed if that is not already apparent. The reduction is approximately 5-10 seconds per incident/issue; with thousands of incidents/issues a day, this is a large time saver for users and the company as a whole.
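The device-to-server lookup described above can be sketched as follows. This is only an illustration using Python and an in-memory SQLite database; the table names (`servers`, `devices`), column names, and sample hosts are hypothetical and are not taken from the actual Toolbar schema.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE servers (name TEXT PRIMARY KEY, ip TEXT NOT NULL);
        CREATE TABLE devices (name TEXT PRIMARY KEY,
                              server_name TEXT NOT NULL REFERENCES servers(name));
        INSERT INTO servers VALUES ('mgmt-east-1', '192.0.2.10');
        INSERT INTO servers VALUES ('mgmt-west-1', '192.0.2.20');
        INSERT INTO devices VALUES ('client-rtr-001', 'mgmt-east-1');
        INSERT INTO devices VALUES ('client-sw-042', 'mgmt-west-1');
        """
    )
    return conn

def find_management_server(conn, device_name):
    """Return (server_name, server_ip) for a device, or None if it is unknown."""
    return conn.execute(
        "SELECT s.name, s.ip FROM devices d "
        "JOIN servers s ON s.name = d.server_name "
        "WHERE d.name = ?",  # parameterized to avoid SQL injection
        (device_name,),
    ).fetchone()

conn = build_demo_db()
print(find_management_server(conn, "client-rtr-001"))
```

A redesigned tool could hide a lookup like this behind a web call, so that clients no longer issue SQL directly; that choice is one of the trade-offs the project is meant to evaluate.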

Preliminary Project Requirements

Thick-Client (Option 1)

  • Run on Windows XP operating systems (Vista if Possible)
  • Perl/XML/Web Calls (Compiled for security)
  • Must be modular in nature
  • More Detailed Requirements to follow after Demonstration/Discussion.

Web-Based Client (Option 2)

  • Run locally
  • XML/Web Calls
  • Open external tools such as Putty/SecureCRT

Benefit to NC State Students

This project will provide an opportunity for students to look at an existing tool, gather the strengths and weaknesses, and create a new tool that incorporates new processes and languages such as PERL, XML, web calls, HTML. It also will allow them to evaluate the pros and cons of Thick-Client vs. Web-Based and determine which path is more acceptable based on factors such as security, user-friendliness, modifications, and administration.

Customer Collaboration Tool

The students will develop an online tool used for soliciting feedback and capturing dynamic interaction from Duke Energy customers. This interaction could include (but not be limited to) sharing information, experiences, and ways to utilize the product with other customers, as well as thoughts about the programs and pilots. Students should use Web 2.0 technology; for example, students should leverage technology such as blogs, wikis, Twitter, Yammer, or social networking to allow customers to contact Duke Energy customer service representatives. Creative solutions using such APIs as Facebook or open source implementations (such as Google OpenSocial) may be used, or students may develop their own implementation. Additionally, a web-based dashboard must be created to aggregate customer feedback for high-level analysis. Creative ideas to implement social web-based technologies for corporate use are welcome and expected.

Network Attached Storage Visualizations

Company Background

EMC Corporation is the world's leading developer and provider of information infrastructure technology and solutions. We help organizations of every size around the world keep their most essential digital information protected, secure, and continuously available.

We are among the 10 most valuable IT product companies in the world. We are driven to perform, to partner, to execute. We go about our jobs with a passion for delivering results that exceed our customers' expectations for quality, service, innovation, and interaction. We pride ourselves on doing what's right and on putting our customers' best interests first. We lead change and change to lead. We are devoted to advancing our people, customers, industry, and community. We say what we mean and do what we say. We are EMC, where information lives.

We help enterprises of all sizes manage their growing volumes of information—from creation to disposal—according to its changing value to the business through information lifecycle management (ILM) strategies. We combine our best-of-breed platforms, software, and services into high-value, low-risk information infrastructure solutions that help organizations maximize the value of their information assets, improve service levels, lower costs, react quickly to change, achieve compliance with regulations, protect information from loss and unauthorized access, and manage and automate more of their overall infrastructure. These solutions integrate networked storage technologies, storage systems, software, and services.

EMC's mission is to help organizations of all sizes get the most value from their information and their relationships with our company.

The Research Triangle Park Software Design Center is an EMC software design center. We develop world-class software that is used in our NAS, SAN, and storage management products.

Background for Project

Celerra is the brand name of EMC’s NAS products. These products are designed to provide network based access to files over a number of network protocols (CIFS, NFS, ftp, http). Celerra systems range in size from small systems supporting dozens of users to large, enterprise wide systems supporting thousands of users.

A Celerra system is managed by a management tool known as Celerra Manager. It is a web-based management tool running on a local PC that communicates with the Celerra system over the network. Celerra Manager can be configured to manage either a single system or a collection of systems on the network.

Based on recent customer feedback we desire to develop some visually appealing and easy-to-comprehend visual displays of the state of either a single system or a collection of systems. The focus of this project will be to develop prototype visualizations based on data taken from real systems. Our hope is these displays will provide a quick visual summary of the amount of storage used/available, the extent of resource use on the data movers, historical alert trends, and other forms of information useful for diagnosis and planning.

Scope of the project for NC State students

EMC will provide written requirements for the use cases to be addressed by the visualizations to be developed. The student team and EMC will discuss and agree on how to produce the visualizations for review. EMC will work with the student team to provide guidance and technical help as needed.

The student team will be responsible for:

  • designing appropriate visual displays
  • producing a prototype dashboard or other application to display the selected data, using either the XML API or existing CLI commands.

We believe the important components of this project are:

  • understanding and classifying the tasks that Celerra Manager users are trying to accomplish
  • surveying and selecting both static and interactive data visualization methods
  • developing visualization methods that are quick to comprehend and useful to the average manager

Examples the student team could consider would be ideas like:

  1. Tree Maps
  2. Small Multiples
  3. Sparklines

We are sure there are other visualizations that could be used. The specific choice will depend on the student’s survey of tools and techniques.
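As a small illustration of one of the ideas listed above, a sparkline can be approximated in plain text with Unicode block characters. This is a sketch only: the function name and the sample storage percentages are made up, and a real prototype would draw on data from the XML API or CLI commands.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest

def sparkline(values):
    """Map a numeric series onto block characters, scaled to its own min/max."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no trend to show; use the lowest bar throughout.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[int((v - lo) * top / span)] for v in values)

# Hypothetical percentage of storage used, sampled once per week.
used_pct = [41, 43, 47, 52, 58, 66, 75, 83]
print("storage used:", sparkline(used_pct))
```

Because each series is scaled to its own range, sparklines like this show trend rather than absolute level; placing several side by side on a shared scale gives a simple form of small multiples.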

Benefits to NC State students

This project provides an opportunity to attack a real life problem covering the full engineering spectrum from requirements gathering, to research, to design, and finally prototype implementation. This project will provide ample opportunity for creativity and innovation. EMC will work with the team closely to provide guidance and give customer feedback as necessary to maintain project scope and size. The project will give team members an exposure to commercial software development.

Benefits to EMC

As storage usage worldwide continues to grow exponentially, providing our customers with tools that allow them to better understand and manage their data is critical. This project is an exploration into leveraging visualization as another tool in our management profile.

EMC where information lives.

Activity Aware Power Off

Abstract:

The biggest single impact we can have on the environment at NetApp is to turn off unneeded equipment, yet most equipment stays on even when it's not needed. RTP has nearly 100% remotely controlled outlets, yet they remain unused except to recycle an occasionally stuck machine. This project would determine the best way to decide whether equipment can be powered down and then safely turn it off. It would have to understand whether the system is in use or expected to be in use in the near future. It could use a scheduler, but should also be aware of activity such as I/O, CPU, etc., and possibly abort shutdowns if "real" activity was present. Therefore it would have to understand the difference between routine activity and systems under test. It could turn off equipment that will be needed in the future, but would have to turn it back on and get it operational before it's needed again. It would have to safely turn it off and be aware of various operating systems such as Windows, Linux, Unix, and NetApp OnTap and follow the proper procedures for powering down equipment. Optionally it would have good reporting facilities and give reports of how many machines are under its control and how much power it has saved. It should probably be an opt-in system to avoid unwanted downtime.
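One way to picture the decision logic described above is the sketch below. It is not a proposed design: the class names, fields, and idle thresholds are all hypothetical, and choosing real thresholds that separate routine background activity from systems under test is part of the project itself.

```python
from dataclasses import dataclass

@dataclass
class ActivitySample:
    cpu_percent: float      # average CPU utilization over one sampling window
    io_ops_per_sec: float   # average I/O operations per second in that window

@dataclass
class Machine:
    name: str
    opted_in: bool                # the owner has enrolled this machine
    hours_until_next_use: float   # from a scheduler; float("inf") if nothing is booked
    boot_hours: float             # time needed to bring the machine back up

# Hypothetical thresholds: activity at or below these counts as routine.
CPU_IDLE_MAX = 5.0
IO_IDLE_MAX = 20.0

def is_idle(samples):
    """True when there are samples and every one stays under both thresholds."""
    return bool(samples) and all(
        s.cpu_percent <= CPU_IDLE_MAX and s.io_ops_per_sec <= IO_IDLE_MAX
        for s in samples
    )

def should_power_off(machine, samples, min_off_hours=1.0):
    """Decide whether a machine can be powered down right now."""
    if not machine.opted_in:
        return False  # opt-in only, to avoid unwanted downtime
    if not is_idle(samples):
        return False  # abort: "real" activity is present
    # Worthwhile only if it can stay off for a while and still boot in time.
    return machine.hours_until_next_use - machine.boot_hours >= min_off_hours
```

The actual shutdown step is deliberately left out, since it depends on the operating system and on the remotely controlled outlets.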

The project for the fall 2008 SAS team was to develop an interactive Adobe Flash application to measure the performance of various graphics rendering primitives (lines, rectangles, circles, text, etc.) across the three supported platforms (Windows XP/Vista, Mac OSX, Linux). The spring semester project will take the results of that project and extend it in three primary ways:

  1. Measure the impact of adding/removing/changing primitive rendering options like line width, end caps, and anti-aliasing
  2. Remove all of the Java code for data passing and remote method invocation to focus solely on graphics rendering
  3. Add three comprehensive tests for combined graphics rendering performance in these common business components:
    1. 2D scatter plot with labels
    2. Multi-line plot
    3. Node-link diagram

We will work closely with the NC State team to provide all of the "scene graphs" to reproduce the business components above, so no prior business graphics knowledge is required. We will also work with the team to learn and understand the relative trade-offs between performance and rendering quality. The outcome of this project will be a working interactive application that is 100% Flash-based which will allow the user to turn on/off rendering options and primitive types and re-run performance tests. The system should be well-documented and extendable by SAS or any other development team going forward. No prior Flash or ActionScript knowledge is required, as this new and rapidly growing technology very closely follows the principles learned in Java and/or C#.

Last semester's abstract, for reference:

Measuring, Analyzing, and Comparing High-volume Data Transfer and Graphical Rendering in Adobe Flash

This project involves research into, and development of, a system for in-depth analysis of various graphical modes and data transfer protocols for current production and upcoming beta versions of the Adobe Flash player. For example, the Adobe Flash Player version 9 had three different modes for delivering graphics to the screen - normal, transparent, and opaque. The beta version of the Flash 10 player has 2 additional methods - direct and via an on-board Graphical Processing Unit (GPU). The system should identify several pros and cons for each of the rendering techniques, including a comparison of the Adobe players across the supported platforms (Windows, Mac OSX, and Linux). In addition to the analysis of rendering performance, this project will also involve gathering performance numbers on various protocols for data transfer in a distributed computing environment. Beyond just research and resulting recommendations, the end product of this project should include an interactive Adobe Flash application which allows subsequent test runs to be completed and measured. The user of this application should have control over submitting various sizes of data and comparing the performance of the various data transfer protocols and graphical rendering modes across various versions of the Adobe Flash player.

GlobalSTORE Help Menu System, Continued

The team will enhance the new help menu system developed by the Fujitsu Senior Design Team Fall 2008 for the GlobalSTORE application.

Enhancements to the new help system include:

  • Design/implement a tool for creation of help content
  • Support for internal hyperlinks to other pages in the help system
  • Support for external hyperlinks to launch either video or web pages
  • Support for displaying help in the operator language

Requirements

The Help Content Tool UI must be developed using Windows Presentation Foundation (WPF). Output must be in a SQL format so that it can be stored in a Source Code Control system. The SQL is executed at install time to load the database.

Help Content Tool must support roundtrip updates of the help content.

All code must be written in VB.NET using Microsoft Visual Studio 2008, Blend and the .NET 3.5 framework.

Examples of the products and technologies that will potentially be used in this project include:

  • Microsoft Windows XP (Operating System)
  • Microsoft VB.NET (Development Language/Environment)
  • Microsoft Developer Network (MSDN) Library
  • Microsoft SQL Server 2005 (Database)
  • Microsoft Visual Studio 2008 (IDE)
  • Microsoft Word (Word processor - documentation)
  • Microsoft PowerPoint (Presentations)
  • XML

The goal is not for the students to become experts in any of these technologies but rather to understand that such classes of products or technologies exist and in which stages of development their use is appropriate.

Last Semester Abstract (for reference):

The team will create a new help menu system for the GlobalSTORE application. The current help menu system for GlobalSTORE falls short on both usability and technology demands. It fails on usability because the help menu displays in a window that covers up the actual application, leaving the operator unable to see the object he or she is getting help on. Also, future versions of GlobalSTORE will no longer require IIS to be installed due to tightening restrictions of the Payment Card Industry. Since the current help menu system requires IIS, an update must occur.

The help menu system consists of multiple components. A UI component will display content to an operator without obstructing the view of the application. A storage component will provide a system for customers to easily store and update help content. A transport layer may need to be created in order to move data from the storage system to the UI if a currently implemented mechanism cannot be used. Additionally, a tool for maintaining and updating the help content should be created.

Real-Time Wolfline Passenger Data

A few years ago a band of comp-sci kids graduated from NCSU and struck out on their own. The result was a way for students on campus to track the Wolfline buses in real-time on the web and on mobile phones. That technology is now being used at over half a dozen schools.

Now it's your chance to develop this technology further!

Ever wait at a stop frustrated as you watched a packed bus pass you by? Wouldn't it have been great if you knew ahead of time that that bus was going to be full? Or even better, if a second bus was automatically dispatched to pick you up? With your help this can become a reality. Each bus is equipped with automatic passenger counters to count how many people are on the bus. Unfortunately, the data these counters collect are only processed nightly, not much help for stranded riders. TransLoc would like to make these data available in real-time to the Wolfline and riders like you.

Impact students every day by leveraging the bus capacity data in real-time. This project consists of parsing the live raw data from the passenger counters (in CSV) and building a web-based interface that

  • Shows the current passenger capacity on each bus with a running tally of the day's boardings and alightings
  • Alerts the viewer when a bus reaches 90% capacity
  • Links boardings and alightings to the stops at which they took place
  • Tallies the boardings and alightings in a database for easy processing
  • Allows for a drill down of each boarding and alighting
  • Provides the passenger counts over a given time frame in concise and easy to understand presentations (including tabular, graph and Google Map views)
  • Lists when buses were at or near full capacity over a given time frame, as well as when they were almost empty
  • Using historical data, estimates when buses may reach full capacity or will run near empty

Technologies for the back end are flexible, but PHP, Python and MySQL under Linux are preferred. XHTML, JavaScript and ActionScript are preferred for the front end interface.
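The parsing, running tally, and 90% alert described above might look roughly like the following sketch. The CSV column layout (`bus_id,stop_id,boardings,alightings`), the bus identifiers, and the capacities are assumptions made for illustration; the real passenger-counter format may differ.

```python
import csv
import io

ALERT_PERCENT = 90  # alert when a bus reaches 90% of its capacity

def process_counts(csv_text, capacities):
    """Tally boardings/alightings per bus and collect capacity alerts.

    Assumes a hypothetical CSV layout: bus_id,stop_id,boardings,alightings.
    """
    onboard = {}   # bus_id -> passengers currently on board
    totals = {}    # bus_id -> [boardings so far, alightings so far]
    alerts = []    # (bus_id, stop_id, onboard) each time the threshold is crossed
    for row in csv.DictReader(io.StringIO(csv_text)):
        bus, stop = row["bus_id"], row["stop_id"]
        on, off = int(row["boardings"]), int(row["alightings"])
        before = onboard.get(bus, 0)
        after = max(before - off, 0) + on  # never let the count go negative
        onboard[bus] = after
        tally = totals.setdefault(bus, [0, 0])
        tally[0] += on
        tally[1] += off
        # Integer comparison avoids floating-point rounding at the threshold.
        limit = ALERT_PERCENT * capacities[bus]
        if before * 100 < limit <= after * 100:
            alerts.append((bus, stop, after))
    return onboard, totals, alerts

sample = """bus_id,stop_id,boardings,alightings
W12,stop-A,20,0
W12,stop-B,18,2
W12,stop-C,5,10
W15,stop-A,7,0
"""
onboard, totals, alerts = process_counts(sample, {"W12": 40, "W15": 40})
print(onboard, totals, alerts)
```

Keeping the stop with each alert links boardings and alightings to where they took place, and the same rows could be written to a database for the historical views.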

HEURISTIC EPISODE CONSTRUCTION KIT (HECK)

Problem Statement

Organizing healthcare claims is an important example of the application of database technology in the health care industry. Episodes of Care for an individual are created by collecting claims for doctor visits, hospital stays, laboratory work, etc., over a certain period of time. These episodes are updated on a regular basis (monthly) and a data repository is established that can be used to analyze healthcare costs, assess quality of healthcare, initiate fraud investigations, and more. Collecting and organizing these Episodes of Care is complicated by the size of the operation. Data needs to be stored for tens of millions of beneficiaries. Claims on behalf of these beneficiaries reach into the hundreds of millions. Over the period of interest, the total number of claims to be processed easily approaches several billion.

The problem is that it is taking too much time to rebuild the database. Here is one example: Using two powerful Sun v890 build servers with 8 dual-core CPUs and 64 GB of RAM each, the current time for re-constructing episodes requires more than 56 hours of uninterrupted computing for each server. This rebuild time is burdensome, and it will only grow as the number of beneficiaries and claims grows. Can this rebuild time be reduced? A hint at a solution lies in the fact that empirical measurements have shown that in the rebuild process, commonly only 15% of existing episodes are affected by new claims. The other 85% are needlessly re-computed.

During the Fall 2008 semester, an NCSU Senior Team created a software prototype with this information in mind. The current prototype uses MEGEE, the Medical Episodes Grouper (Enterprise Edition) that is part of the Thomson Reuters Medstat Advantage Suite product. MEGEE uses administrative healthcare claims data stored in a data warehouse, and builds Episodes of Care. The challenge for this semester is to evaluate and extend this prototype.

Specific tasks for putting together the Heuristic Episode Construction Kit (HECK) include: (a) Test correctness and performance of the existing prototype; (b) extend the prototype by implementing at least one heuristic that reduces the amount of data being processed while preserving correctness of the resulting episodes; (c) compare the performance of the implemented heuristic(s) to the original approach.
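The hint in the problem statement, that commonly only about 15% of existing episodes are affected by new claims, suggests rebuilding only what has changed. The sketch below illustrates that idea and the correctness check from task (c) on toy data. It does not reflect how MEGEE groups claims: the 30-day gap rule, the function names, and the `(beneficiary, day)` claim format are all hypothetical.

```python
from collections import defaultdict

GAP_DAYS = 30  # hypothetical rule: a longer gap between claims starts a new episode

def build_episodes(days):
    """Group one beneficiary's claim days into episodes (lists of days)."""
    episodes = []
    for day in sorted(days):
        if episodes and day - episodes[-1][-1] <= GAP_DAYS:
            episodes[-1].append(day)
        else:
            episodes.append([day])
    return episodes

def full_rebuild(claims):
    """Recompute episodes for every beneficiary (the slow baseline)."""
    by_person = defaultdict(list)
    for person, day in claims:
        by_person[person].append(day)
    return {p: build_episodes(d) for p, d in by_person.items()}

def incremental_rebuild(episodes, old_claims, new_claims):
    """Recompute only the beneficiaries that appear in the new claims."""
    affected = {person for person, _ in new_claims}
    by_person = defaultdict(list)
    # A real system would fetch these rows through an index, not a full scan.
    for person, day in old_claims + new_claims:
        if person in affected:
            by_person[person].append(day)
    updated = dict(episodes)  # untouched beneficiaries keep their episodes
    for person in affected:
        updated[person] = build_episodes(by_person[person])
    return updated, affected

old = [("A", 1), ("A", 10), ("A", 100), ("B", 5), ("C", 50)]
new = [("A", 120), ("D", 7)]
updated, affected = incremental_rebuild(full_rebuild(old), old, new)
print(sorted(affected), updated == full_rebuild(old + new))
```

Comparing the incremental result against a full rebuild, as in the last lines, is one simple way to test that a heuristic preserves correctness.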

Student Insights Gained

Students will learn the following: working in a product development environment, working on a data warehouse related product, interacting with subject matter experts, devising and establishing the validity of the chosen heuristic approach, extending the prototype using existing software tools, testing the correctness and performance of the prototype, presenting findings to company representatives.

Student Skills and Experience

Required: Database programming, basic knowledge of SQL-92, basic knowledge of Unix commands, ability to work with subject matter experts.

Optional: Unix scripting language such as Python, Tcl, PERL, or one of the shell script languages.

Development and Test Environment

Students will have access to a Unix server running the required modules of Advantage Suite, and a database server running the Teradata RDBMS. All data has been de-identified to satisfy HIPAA regulations.

Title: Implementation and Performance study of iSCSI Extensions for Remote Direct Memory Access (iSER) using available open source initiators/targets.

With increased focus on 10G Ethernet, vendors have begun to develop specialized Remote Direct Memory Access hardware to offload processing and reduce the copy overhead in the TCP/IP network stack. We know that an iSER-assisted iSCSI implementation has a clear performance advantage for specific workloads, but we would like to have concrete, independent performance numbers. This project would involve reading research papers on iSER, comparing and modifying existing open source Linux code, and getting performance numbers for different workloads.

Project Breakdown:

  1. Understanding iSCSI, iSER, iWARP, and the advantages of iSER. There are quite a few research papers on iSER, but they are long on research and short on information that could be useful for commercial solutions in the market.
  2. Analysis of different open source initiators/targets: Look at different iSCSI and iSER implementations such as the UNH Reference Target and initiator, the OFA RDMA stack, and other Linux iSER implementations. Most of these open source implementations need some rework. Get one of these implementations to support iSER with 10G Chelsio RNICs.
  3. Create workloads using sio or other tools that mimic CPU-intensive workloads such as Decision Support Systems, data mining, Server Consolidations (Hyper-V), and similar apps that would benefit from iSER/RDMA by way of lower server CPU usage and full utilization of 10Gig capacity.
  4. Get performance numbers, especially server-side throughput/latencies, for these different workloads with traditional iSCSI (Linux) initiators and compare them with iSER-assisted iSCSI. Use the workload characteristics to suggest the types of applications that would benefit from iSER and provide tuning guidelines.
  5. Work to define the business benefits of iSER with the NetApp team
    1. Relationships
    2. Technology leadership

Thermal Efficiency Visualization

Background

Today's compute and storage environments (data centers) utilize cooling designs that have changed very little since the initial Computer Room Air-Conditioning (CRAC) designs of the 1980's. While IT equipment in general has undergone at least 3 phases of fundamentally different physical design (mainframe, volume, blade), the supporting facilities have not. The majority of CRAC systems deployed today cannot meet the density requirements of blade servers (as an example) and are not prepared to support the dynamic electrical loads that mirror dynamic compute loads resulting from virtual machine mobility.

Problem Statement

  1. Density & Capacity - In the traditional CRAC design there is limited flexibility in how to remove heat as per-rack densities increase, without massive over-provisioning.
  2. Dynamic Electrical Loads & Virtualization - Most CRAC systems are not prepared to accommodate electrical load variances that virtualized compute and storage features might deliver.
  3. Coordinating Thermal Supply and Demand - First, traditional CRAC systems are responsible for as much as a 20% electrical inefficiency due to 'demand fighting' and a general lack of correlation with IT infrastructure. Second, feedback and coordination between traffic demand and cooling systems are lacking; they are needed so that predictive modeling can be used.

Solution Development Overview

Similar to what The Green Grid is proposing for efficient cooling of data centers, Cisco is interested in developing an algorithm that can optimize the air handling between the CRAC and the main power consuming IT infrastructure segments (server, storage, network, appliance). The first step towards achieving this goal will be to monitor and visualize the cooling effects of the air within each rack in the data center.

Sensors will be needed to monitor the temperature at various points in several racks in a data center. These sensors will feed their information back to a central device that will collate the information and allow it to be visualized. SNMP will be used to read the environmentals of each device in the rack (internal temperature, fan status, etc.), along with their processing load. A model will be constructed to show how the temperature varies with the load in each rack. Once this information is obtained we would like a dynamic model representation of existing CRAC systems based on various related variables. Additionally, a GUI will need to be developed so that computer room operators can see the effects of any changes in the racks. A paper on the analysis and areas for improvement, changes or recommendations will also need to be delivered.
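As a first approximation, the relationship between load and temperature in a rack could be modeled with a straight-line fit. The sketch below uses ordinary least squares in plain Python; the sample readings are made up, and a linear model is only an assumption that the collected sensor and SNMP data would need to confirm or replace.

```python
def fit_line(loads, temps):
    """Ordinary least-squares fit of temp = intercept + slope * load."""
    n = len(loads)
    if n < 2 or n != len(temps):
        raise ValueError("need at least two paired samples")
    mean_x = sum(loads) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in loads)
    if sxx == 0:
        raise ValueError("load values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, temps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, load):
    """Estimated temperature at a given load under the fitted line."""
    return intercept + slope * load

# Made-up samples for one rack: processing load (%) and temperature (deg C).
loads = [10, 30, 50, 70, 90]
temps = [22.0, 24.0, 26.0, 28.0, 30.0]
intercept, slope = fit_line(loads, temps)
print(f"temp ~= {intercept:.1f} + {slope:.2f} * load")
```

Fitting one line per rack, or per sensor position within a rack, would give the GUI a simple quantity to display and compare as loads change.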

Stretch Goals (may include):

  • Predict the airflow and temperature set points needed for time-of-day processing.
  • Determine if the temperature output of the CRAC system could be raised if the air flow is optimized throughout the data center.
  • Potential for additional goals as agreed between students and Cisco

11. Computer Science (402): description coming soon