Friday, April 06, 2012

#ALGORITHMS: "U.S. Government to Attack Big Data"

The challenge of analyzing all the Big Data streaming in from sensor networks, biological studies, and scientific, industrial and consumer devices worldwide requires a concerted effort to improve machine learning, visualization and the whole toolbox of analytics. That is the view of the U.S. government, which today announced awards of $10 million, $2 million and $1.4 million, with many more to come.

Awards made through NSF's EarthCube program will develop community-guided cyberinfrastructure that integrates data into a framework designed to expedite the delivery of geoscience knowledge.

National Science Foundation (NSF) Director Subra Suresh recently outlined efforts to build on NSF's legacy in supporting the fundamental science and underlying infrastructure enabling the big data revolution. At an event led by the White House Office of Science and Technology Policy in Washington, D.C., Suresh joined other federal science agency leaders to discuss cross-agency big data plans and announce new areas of research funding across disciplines in this field.

NSF announced new awards under its Cyberinfrastructure Framework for 21st Century Science and Engineering and its Expeditions in Computing program, as well as awards that expand statistical approaches to big data. The agency is also seeking proposals under a Big Data solicitation issued jointly with the National Institutes of Health (NIH). It also anticipates cross-disciplinary opportunities under its Integrative Graduate Education and Research Traineeship program and through an Ideas Lab for researchers interested in using large datasets to make teaching and learning more effective.

NSF-funded research in these key areas will develop new methods to derive knowledge from data, and to construct new infrastructure to manage, curate and serve data to communities. As part of these efforts, NSF will forge new approaches for associated education and training.

One of the awards NSF announced today is a $10 million grant under the Expeditions in Computing program to researchers at the University of California, Berkeley. The team will integrate algorithms, machines, and people to turn data into knowledge and insight. Its objective is to develop new scalable machine-learning algorithms and data management tools that can handle large-scale and heterogeneous datasets, novel datacenter-friendly programming models, and an improved computational infrastructure.

NSF's Cyberinfrastructure Framework for 21st Century Science and Engineering, or "CIF21," is central to the agency's strategic efforts. CIF21 will foster the development and implementation of a national cyberinfrastructure that lets researchers in science and engineering achieve a democratization of data. In the near term, NSF will provide opportunities and platforms for research projects to develop the mechanisms, policies and governance structures needed to make data available within different research communities. In the longer term, these ground-up efforts will be integrated within a larger-scale national framework for sharing data among disciplines and institutions.

The first round of awards made through EarthCube, an NSF geosciences program under the CIF21 framework, was also announced today. These awards will support the development of community-guided cyberinfrastructure to integrate big data across the geosciences and ultimately change how geoscience research is conducted. Integrating data from disparate locations and sources, with eclectic structures and formats, whether archived or captured in real time, will expedite the delivery of geoscience knowledge.

NSF also announced a $1.4 million award for a focused research group that brings together statisticians and biologists to develop network models and automatic, scalable algorithms and tools to determine protein structures and biological pathways.

Finally, a $2 million award for a research training group in big data will support training for undergraduates, graduate students and postdoctoral fellows in using statistical, graphical and visualization techniques for complex data.

In addition, anticipated cross-disciplinary efforts at NSF include encouraging data citation to increase opportunities for the use and analysis of data sets; participation in an Ideas Lab to explore ways to use big data to make teaching and learning more effective; and the use of NSF's Integrative Graduate Education and Research Traineeship (IGERT) mechanism to educate and train researchers in data-enabled science and engineering.