The GPN2010 Annual Meeting kicked off at 6PM with tonight’s reception sponsored by IBM.  Every table was full by 7PM and most stayed until the bar closed and the food was removed (8PM).  A few more attendees walked in after the reception ended to share the good feelings and camaraderie that are typical of our meetings.

This was really a feast for the mind and the body.  There was intense and convivial discussion at each table.  Kate Adams outdid herself with her selection of great tasting hors d’oeuvres–I especially enjoyed the fruit and cheeses.  I noticed that others really went for the chicken and beef satay.

Elaine Ostrander is unable to be here on Friday morning for her keynote address, but Rob Vietzke (Internet2) and Jennifer Schopf (NSF), our other keynoters, were at the reception in spite of plane delays.  When the program committee learned that we might need a substitute speaker, several suggested by email that Frank Lee of IBM would be a great substitute.  Frank was key for making IBM’s sponsorship of the reception possible!  Unfortunately, he had to leave right after the reception.

Stan Ahalt:

  • Let’s have a BoF at SC
  • Let’s try to get info on what’s out there

Reps from Dell, HP, Intel, and MathWorks

Stan:  Thanks to Dell for the dinner last evening.

Question 1:  What do you guys consider an optimal relationship with universities?

Dell:  Economics mean that 30% mark-downs are not going to happen.  But selling isn’t collaboration and the best example of collaboration would be the viz cluster at TAC.   It was proposed off-the-cuff and turned out to be win-win for both of us.  We both learned something and they wrote a white paper.  Approaching one another as peer to peer is important.

Intel:  For us the nature of collaboration is different.  We are interested in basic research.  We also have a special relationship with those schools where we recruit a lot.   Finding common ground is essential.  We want to work as people who can help the scientists.

MathWorks:  It’s a two-way street.  We’re not very proactive about initiating these, but we look at whether we will each get something out of this.  Is this a direction we want to move in but don’t have the resources to put into right now?   Those coming to us are usually on the leading edge.

HP:  We have a lot of basic research going on.  There is a lot of work around cloud computing and what does that mean.  There is a project between HP and Yahoo to build an open source cloud software stack.  Another hot area is power and cooling.   We’re working with Georgia Tech on another project.

Question:  How disruptive is it likely to be to have GPGPU based machines?  Does this keep you up at night?

HP:  It’s definitely disruptive.  It will fill a niche.

Intel:  The software is the big concern.  The big question is whether this is really the time that a big change like this will occur.  Will the whole community move to this?  We are literally a few weeks away from finalizing where we will be with this.  The big unknown is whether it will be competitive if we go onto Main Street.

Dell:  My twins keep me up at night.  The only way that we can work together is for all of us to raise the level of discourse.  I think GPGPU will be part of that but we should work together on some of these bigger issues first.

Q:  What will this be like when we no longer have equipment moving between sites?  Do we still build a data center on our campus?  If I’m purchasing a device from Dell, it doesn’t have to be housed by me.

Dell:  We’ll solve the logistical problem when it becomes a business issue–when there’s demand.

Q:  Do you see a time when we can just purchase a data center and plug in the power?

HP:  We are already doing that.  You buy data storage in a box and that box is a 45 ft container that you plug in.  There are new financial terms on it as well, depending on where you put it, for instance.

Q:  We have servers all over.  Would you guys consider funding a survey to find servers on my campus that are running all the time, but being used only 10% of the time.  The pay-off for that is that if we can save money on power, my administration might divert that money to new equipment.

HP:  I was paying close attention to that discussion, yesterday.

Dell:  Let’s talk.


What’s Happening at the U of Michigan around IT Strategy

Dan is Associate VP for Research Cyberinfrastructure

  • Principle 1:  Borromean Ring to show synergy:  If you remove one ring they fall apart.  Three rings are Provisioning, Transformation and System Innovation.
  • P2:  Never doubt that we can change the world.
  • P3:  Innovate on the edge–create your own skunk works.
  • P4:  Invest.
  • P5:  Lee Iacocca told me to invest in people and let them go.
  • P6:  MBA 101: Align Mission/Goals, Authority and …
  • P7:  Conceptual shift to High Performance COLLABORATION.

We are in the course of a revolution–e-science is alive and well all over the world.  CI-enabled humanities is on the way.  CI-enabled learning is coming soon–lifelong and life wide.  The mission of the university is discovery, learning and societal engagement.

There is a balancing act between sharing tools and domain specific tools.  We need to also pay attention to the NEW stuff.

At Michigan we are talking about whether we can create a university shared CI that will be comprehensive and be cost-effective, greener, etc.

This is a top-down approach, from the state (NextGen Michigan is a new organization) to the campus.  There are domain stewards who are responsible for aspects of the university IT mission.  They serve on a university IT council.

They are working towards a more effective, more efficient and greener IT/CI organization that serves the university.

My stewardship role is to nurture the relationship between research and CI.

I’ve created the Computation and Information Resources for Research as a Utility Service (CIRRUS) project.  This is related to the NextGen project.  This project has a huge number of parts (Dan apparently likes details).

[It occurs to me that the MAC has become the dominant computer at these sorts of meetings.  There are more and more adopters of the MAC.  At this point, being for the underdog in our community means adopting the dominant technology!]

NextGen also has a lot of parts but needs them to really provide a roadmap for the whole state.

Bringing up our own machines in three phases going from a 128 core system to a 10 core system.

Summary:  We are evolving from unit-level to campus-level provisioning with a pay-as-you-go model.  The datacenter has very different policies surrounding its use than other data center space that academic units are familiar with.

  • Efficiencies will increase.
  • All of this represents a change over the status quo.
  • There will be discussions about pacing, size, total costs and how costs are allocated.

Questions:  How are you going to adjust with different privacy and security requirements in regular academic computing?

Comment:  If the medical center can run secure systems, then we can as well.

Question:  How will you charge?

Comment:  It may be paid by the user or by the college.

Q:  What about students?

C:  Someone is going to pay.  There will be flux.

Q:  Why invest more money in Michigan’s infrastructure when you are so successful in gaining research $$$ already?

C:  Well, it’s more about the faculty who may be limiting what they can do because of the size of the resources.  We have to stay at the leading edge.  There’s new science.  Part of it is taking the long view.

Stan:  It’s quite a gift to see how one of the leading universities is about to do something.

The Need for Collaboration and Advocacy

  • The best thing that ever happened to CI in Oklahoma was the state EPSCoR need for a state wide CI plan.
  • We put together an MOU to provide cycles to everyone in the state for free (with the exception of for-profits who have to pay a fee).
  • The incremental cost for providing these resources is about 1%.  The political good will is enormous.
  • Found a list of all colleges and universities, including beauty colleges, in the US
  • Henry laid out all the groups he has been working with and the SC in Plain English workshop.

Question:  Do you have any data on how well this is all working?

Response:  I have individual anecdotes.

This will be thoughts on sustainability, starting with history.

S1: NSF HPC Program Evolution:

  • There was a gap in supercomputing.  John Connolly and Jim and Dan were all involved in setting up the Office of Advanced Scientific Computing at NSF.  The SC centers were devised then on the back of a napkin.
  • In 1995 the Hayes report identified mission creep at the centers.  PACI came online in 1997 and the centers were phased out.   Terascale program in 2001 came online.  The Office of Cyberinfrastructure (OCI) was started in 2005 and has started a number of task forces.

S2: Center Evolution:

  • In 1980 there was a small number of university centers.  In 1983 NSF bought access to supercomputers that existed at Colorado State, Minnesota and Purdue (Phase 1).  Ken Wilson began writing “A Third Kind of Science” during one of these meetings and he invented the term “Grand Challenge”.  In 1985 NSF established the five centers (Phase 2).  CASC was formed in 1989 and there was concern that NSF might only stand behind their phase I and II centers.

S3:  Evolution of UIUC Center:

  • Grew from 2 user communities to many more from 1985 to 1990.  Staffing went from 75 to 175 and then over 400 in the PACI program including partners.
  • Funding for the centers dropped off from 90% in the early stages.
  • MOSAIC as one example of mission creep!  Folks said that this is not science.

S4:  HPC Centers 2010

  • Nation - 11 CASC Members
  • State - 4 CASC Members
  • Campus - ? most CASC members

S:  Campus Networks

  1. Still a battle to get connectivity to universities.
  2. In South Carolina there are faculty who have less bandwidth than a K12 student in North Carolina or Florida.
  3. Faculty drive to another university to get better connectivity.

S:  Jim’s Maslow slide

Top of Pyramid

  1. Productivity and Discovery enabled
  2. Applications, Tools, Consulting
  3. Software Stack/Middleware
  4. Servers, CPU Storage
  5. Network Connectivity, Power, Cooling

Bottom of Pyramid

S:  Clemson CI Days

  • We treated CI Days like a Billy Graham crusade
  • We got national support, including people from NSF
  • A faculty member from Purdue was our keynote speaker
  • We hustled vendors a little bit

S:  Business Model

  • Diversification:  Identify your investors, co-investors, customers, entrepreneurial activities
  • Diversification is important because of the ups and downs in individual revenue streams

S:  Diversification at Clemson

  • Clemson runs the state Medicaid system
  • Clemson uses Condor and has had very good success
  • Clemson is morphing into a cyberinstitute, which helps diversification
  • Working with industrial partners
  • Technology transfer:  Staff are treated like faculty and receive royalties on things they help develop

S:  How to Communicate Results

  • NSF GPRA has a very nice way of presenting results–you can find it online
  • We have 300 students using our Condor farm

S:  Being Nimble

  • Bad money:  Fat dollars are the best (no strings attached/unrestricted funds)
  • Reinvention and adaptation can be painful
  • Want good morale:  Highest when vision is clear, people understand it and their work can be linked to that vision

S:  Summary

  • Campus CIOs are critical to this mission.  Make linkages.  Get this research into the mission statement and get it treated as such.
  • The recession changed things and collaborations are much more valuable
  • The governor in Massachusetts is behind a plan to build an HPC Center & Research Program in Holyoke
  • Challenge for this community to stay in front rather than behind!

Q&A

  • Discussion about the formation of early regionals like MIDnet.
  • We are going to need people at the universities who are experts.
  • If Arden Bement steps down then the National Science Board will need input on appointing a new director of NSF.

Comment by Stan Ahalt:

  • The large centers have stayed the course because they have been sustained in one way or another.  That’s why Jim’s message is so important.  It’s going to require us to be entrepreneurial.
  • We need to continually remind ourselves that we are in the business of helping science.

Schedule for today is

  1. Federal funding opportunities and strategies for Tier 1 and Tier 2 RCCs.
  2. Open discussion on the need for collaboration and advocacy.
  3. Panel discussion on industry and vendor relationships.
  4. Wrap-up

This is all available via WebEx.

Late additional speakers:

  • Dan Atkins on his work in Michigan
  • Final Slides from last breakout

Note that virtually everything is now posted at the wiki:

https://mw1.osc.edu/srcc/index.php

Lifka:  All findings from the breakout sessions will be published on the Wiki with some time for comments.  Please comment on them and then we turn this around into a report.

4:45 PM Adjourn

4:30 PM Center Funding + Vendor Relations

S1 Value Proposition of Center:

  • Include current users as well as potential users who need us
  • HPC Center enables you to be competitive
  • Look at A&S, Med school
  • For researchers:  reliable, professional staff, support, can help port and tune codes, quick turnaround
  • For administrators:  efficiency, saving resources, include those who cannot afford to own resources of their own

[Comments]  Can you actually save money by consolidating?   There are things you can do by consolidation that you couldn’t do otherwise.

[My observation]  This is probably an inverse u-shaped function where increased consolidation improves things to a point, but is less efficient for very small and very large outfits.

[Jennifer] Contact me if you are interested in doing a study on this.

  • More on value proposition…
  • Makes unfunded grad student research possible
  • site licensing of software
  • assistance with proposal development
  • assist in bringing faculty together with industrial researchers

S2:  Sustainability + Others

  • Can’t expect to make this part of indirect cost recovery
  • this needs to be considered part of ongoing research
  • research IT is just a part of IT services and the discussion should be happening at that level
  • Expertise is a fixed cost but hardware is an issue
  • Bring value by training grad and undergrad students who can assist researchers
  • condo models bring the community together and foster collaboration
  • condo models help faculty focus on the science
  • need to focus on research computing as a multi-stakeholder investment
  • Have a faculty advisory board
  • More tomorrow

4 PM Eastern:  Organizational Models, Staffing & Succession Planning

Presented by Henry Neeman

  • Slide 1:  Summary:  Everyone’s different, that’s Okay, or maybe not!
  • Slide 2:  Serve single or multiple institutions?  Perceived need by users for local resources.  Lots of people want help with their computing on their laptop, fewer want HPC, and fewer still want advanced HPC.

Dan Atkins:  So, do you provide support for laptop HPC?

Henry:  It depends.  “I just need help with the laptop” may turn into a need for more resources.

Dan:  Do HPC Centers have a responsibility to help users migrate to other resources?

Comment:  Yes.  But you have to start small.

  • Slide 3:  When HPC Centers Serve Multiple Institutions, How did that happen?  For many of us that’s how we started (government fiat) and for others we expanded as we went along.  Sometimes they begin formally with a state mandate.
  • Slide 4:  Who do we report to? CIO 7, VPR 6, Provost 5, other 1.
  • Slide 5:  Where does money come from?  Primary:  CIO 6, Provost 2, VPR 6, Other 3.  Secondary:  CIO 5, Provost 2, VPR 3, Other 2.  Not everyone has a secondary source.
  • S4:  Compared to libraries.  Libraries are old and computing is new.  Not funding computation has a much lower backlash than not funding libraries.

[Comments] Librarians might have a different POV.  Library was the core of the university at one time.  I’d rather be considered the organization of the future.  With a book, it’s a permanent purchase that lasts forever.

  • S5:  Oversight Boards.  Advisory board actively engaged 9, user boards 10, admin board 7, funding board 6, tech board 2, strategy board 11, none functioning 4, has non-STEM members 7, board meetings are formalized and public 2 with one more expected.
  • S6:  Tenure track faculty involvement:  Center directors also tenured/tenure track 8, formal involvement of tenure track faculty 7, centers that say they have a say in tenure decisions 2.
  • S7:  Kinds of activities:  cycles 18, data/storage 16,  and more
  • S8: Overhead: Include us in F&A
  • S9:  Greatest challenges:  administration doesn’t understand us, hard to find folks who know how to do this, got to skate where the puck will be, not where it is.

3:40 PM Eastern:  Metrics & ROI Panel First

See Previous Post with details of the discussion for the actual metrics discussed.

Stan Ahalt:  Commenting on participation in grant proposals as a metric for the HPC Center:  At OSC we always captured Empower, Partner or Lead on participation in grant proposals.  Empower means that we were mentioned, partner means that we did substantial writing, lead meant that the money came in our door.

Starting questions on handout from Vijay Agarwala who is leading the breakout:

  • What metric of success do you use?
  • How would you characterize the level and type of investment and, then, measure its effectiveness and what it returns?

Group Discussion

  • Comment:  What is missing–how success can be measured in terms of contributions to interdisciplinary research and education.
  • Vijay:  Are we the glue for multi-disciplinary research?  Do we play a distinct role, here?
  • Comment:  A measure of success depends on the audience.   Administrators may be looking for one thing and faculty for another.
  • John Connolly:  Administrators pay attention to the people who bring in the big bucks.  And you can look inside those awards and see that people are being employed–that’s the economic argument to the legislature.
  • Vijay:  Should we have qualitative as well as quantitative evidence?
  • Comments:  Surveys, publications that reference the center, citations that reference those publications, etc.
  • Jennifer Schopf:  It would be nice to know how many cycles are used by research area across the US.  We also need to know about data transfer.
  • Vijay:  What should centers like ours measure? If you want to take your campus network from 1 to 10 gig then you want to know about data movement across campus.  If you want to increase your connectivity beyond campus, then you want to know how much research-related data is being transferred beyond.
  • Jennifer:  Several panels and the amount of data transfer was not clear.  For ARI awards there was a requirement to know what science could be done that wasn’t done before.
  • Jill:  Hard to measure the kinds of things being asked for.
  • [several conversations at once]
  • Vijay:  The report should reflect that science drivers should be indicated.
  • [Multiple comments about NSF and ARI awards.  Nothing substantial.]
  • Jill:  Measure academic disciplines using computational resources and what % of those who can use it actually are using it.
  • Comment:  How do you find the potential number?  If you have one art historian using your resources do you count all of them as potential users?
  • [My comment:  Is the purpose of these measurements to increase use or to justify use?]
  • Comment:  Why not just measure the change in number of users over time (numerator)?
  • John:  We use the argument that by counting all the CI awards to our university, count up the indirect costs and use that to obtain funds for our center.
  • Comment:  We keep a database of all of that information.
  • Jennifer:  One thing in the papers that came up was the amount of money saved by centralizing.  Are there any measures of that?
  • Comments:  There are some reports that the numbers get better as you centralize.  You have to include the costs of power and cooling.  Staff costs are significant.
  • Comment:  I would measure something about energy costs or carbon footprint.  That’s the big number that no one is looking at.
  • Vijay:  It would be worth putting a number on how much is saved by centralizing.
  • Comments:  Is there a way of distinguishing what makes sense centrally and what makes sense being distributed to the user’s location?  Can we find out definitively which makes more sense?  It depends on where you are and what resources your campus has.  It doesn’t make any difference because even if you intend to keep it your department you must create a computing center:  power, cooling, staff, fixing it when it breaks.  Most groups at our institution choose to give a cluster to us (centrally).
  • Vijay:  It is hard to quantify the benefit of centralization.  We should try and find this out in the future.
  • Comment:  The number of people who use computation is enormous.  Most use their PCs.  Those who find the PC too slow move over to central resources.
  • Jill:  Maybe we shouldn’t be measuring the past.
  • Comment:  Maybe we should be measuring new users.  Also, we should measure the number of faculty with whom you consult.
  • Comments:  What about counting the amount of staff salary?  What about counting research IT budget as percent of total university budget?
  • Vijay:  Now let’s turn to Return on Investment.  We should measure utilization with some degree of precision.
  • Comment:  When the provost looks at the power bill he wants to know how that money is being spent.
  • Vijay:  You really want to measure increased productivity, but let’s set that as an aspirational goal.  Try to capture the number of stories heard anecdotally and record them.
  • Jennifer:  One of the things that EPSCoR is trying to measure is the ROI from community involvement (e.g., in CASC).
  • Vijay:  Each of us agree to quantify the benefit of community involvement.
  • Greg:  Perhaps help the group at NSF that creates the research capacity survey to ask some of these questions.
  • Vijay:  Perhaps measure the amount of collaboration with industry in a 20 mile radius.  Quantify the role we have in sparking innovation and as serving as a catalyst with private industry.  Also, the number of start-ups.
  • Jennifer:  Then, there is outreach to minorities and smaller universities.
  • John:  We encourage projects with HBCUs.
  • Comment:  More simply, what’s the diversity of our centers?
  • Vijay:  I don’t want a measure where we deliberately make our centers look bad.
  • Comment:  It’s a useful measure if you’re trying to improve the workforce.
  • Vijay:  Then there is agreement that we want to measure it.
  • Jennifer:  Is there anything about retention rate–how many users remain users over a long period.
  • Vijay:  So we are talking about the number of users who come once or twice vs. those who flood us with comments.
  • Comment:  Maybe the first users did their work and then didn’t need the resource any more.
  • Comment:  You get that with classes that use the resource for a short time and then the class ends.  You can’t count them as failures.
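
The retention question raised at the end of this discussion could be sketched as follows.  Everything here is hypothetical: the user names, the two reporting periods, and the choice to leave short-lived class accounts out of the denominator (following the comment that classes should not be counted as failures).

```python
# Hypothetical user sets for two consecutive reporting periods.
users_year1 = {"ana", "bo", "chen", "dee", "eli", "fay"}
users_year2 = {"ana", "chen", "gus"}
# Accounts created for a course; expected to go away when the class ends.
class_accounts = {"eli", "fay"}


def retention_rate(previous, current, excluded=frozenset()):
    """Share of non-excluded users from `previous` who are still active in `current`."""
    eligible = previous - excluded
    if not eligible:
        return 0.0
    return len(eligible & current) / len(eligible)


raw = retention_rate(users_year1, users_year2)
adjusted = retention_rate(users_year1, users_year2, class_accounts)
new_users = users_year2 - users_year1  # the "new users" measure suggested earlier

print(f"raw retention: {raw:.0%}")
print(f"retention excluding class accounts: {adjusted:.0%}")
print(f"new users this period: {len(new_users)}")
```

The gap between the raw and adjusted figures shows how much a center's apparent retention can depend on whether course accounts are counted.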

At this point, the notes taken by Art Vandenburg were to be transcribed for the larger session.

Goal of their Center is to have a broad constituency — users in a variety of disciplines and colleges.

Lifka:  People are now comparing cost of our services to Amazon.

They are flop shops.  We have to not be a flop shop.

Allocations to computational resources are made, not on the basis of whether or not you are funded, but on the basis of whether you wish to use the computational resources.  The goal is to expand the user base.  On the other hand, unfunded research may not be able to use as large a slice of resources as a funded user.

User community is roughly 2000 individuals.  Vijay showed a graph representing the ratio of actual to target usage over time for his user community:  At or above 1 and you are at or above your allocation.  Below 1 and you are not using all you have “paid” for.  The resources can be overbooked since many users are below 1.
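
A minimal sketch of that ratio, to make the overbooking argument concrete.  The group names and hour figures are invented for illustration and are not taken from Vijay’s graph.  The ratio is assumed to be actual usage divided by target allocation, which matches the reading that a value below 1 means unused allocation.

```python
# Hypothetical allocations and usage in core-hours; none of these figures come from the talk.
target_hours = {"chem": 1000, "physics": 2000, "bio": 500, "econ": 500}
actual_hours = {"chem": 1200, "physics": 800, "bio": 250, "econ": 100}


def usage_ratio(actual, target):
    """Actual usage divided by target allocation (assumed definition)."""
    return actual / target


ratios = {g: usage_ratio(actual_hours[g], target_hours[g]) for g in target_hours}

# Hours that were allocated but not consumed; this slack is what makes overbooking possible.
unused_hours = sum(max(target_hours[g] - actual_hours[g], 0) for g in target_hours)

for group, ratio in sorted(ratios.items()):
    status = "at or above allocation" if ratio >= 1 else "below allocation"
    print(f"{group}: {ratio:.2f} ({status})")
print(f"unused allocated hours: {unused_hours}")
```

In this made-up case three of the four groups sit below 1, so the center could promise more hours than it physically has and still rarely run short.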

Target wait time is about 6 hours.  Argument for trade-off for instant access to local resources:  In exchange for giving up your own local resources, you gain expertise, software and more.

Peppin:  There are different models to serve different cultures.  I’ve built condo models where we house resources for research groups.

There was a list of Top Ten Recommendations (more detail on the slides).

  1. appeal to a broad constituency
  2. flexibility in system configuration and adaptability
  3. keep barriers to faculty participation low
  4. maximize system utilization–above 90%
  5. Extensive software stack and rapid turnaround in installation of new software
  6. provide consulting with subject matter expertise
  7. strong commitment to training:  classes, seminars, workshops
  8. accurate and daily system utilization data
  9. build strong partnerships with hardware and software vendors
  10. make emerging technologies and test beds available to faculty and students

If we don’t invest in computing then we will see what happened with manufacturing — it will move away.

Cost Model for university-based computing center:

$10M for a Green Center

Annual Budget

  1. Hardware  2.5 M replacing 25% of installed compute each year
  2. Software  .025 M annual licensing costs
  3. remainder is in the slide at the web site.
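
The hardware line implies two figures the notes do not state directly.  If $2.5 M replaces 25% of installed compute each year, the installed base is worth about $10 M at replacement cost and turns over every four years.  A short sketch of that arithmetic, using only the numbers from the slide (the variable names are mine, and this is not a claim about how the $10M Green Center figure was derived):

```python
annual_hardware_budget_musd = 2.5     # from the slide, in millions of dollars
annual_replacement_fraction = 0.25    # from the slide: 25% of installed compute per year
annual_software_budget_musd = 0.025   # from the slide, in millions of dollars

# Derived values; these are implications of the slide's numbers, not figures quoted from it.
implied_installed_base_musd = annual_hardware_budget_musd / annual_replacement_fraction
implied_refresh_cycle_years = 1 / annual_replacement_fraction
software_share_of_hardware = annual_software_budget_musd / annual_hardware_budget_musd

print(f"implied installed base: ${implied_installed_base_musd:.1f} M")
print(f"implied refresh cycle: {implied_refresh_cycle_years:.0f} years")
print(f"software as a share of hardware spend: {software_share_of_hardware:.0%}")
```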