Electronic news service of the Ministry of Communications and Information Technologies
Science body reveals how it manages big data
At a time when enterprises are struggling to capture, store and analyse the vast quantities of data generated today, the UK's Science and Technology Facilities Council (STFC) processes and categorises more than 10TB of data every day.
Serving universities and research councils, as well as national and international projects including CERN's Large Hadron Collider (LHC), STFC stores and categorises volumes of data that most large businesses would baulk at. David Corney, leader of the petabyte storage group at STFC, described the technology it uses. "We have two SL8500 robots from Oracle StorageTek. These can manage the storage of around 100 petabytes on tape." The robots automatically select the tape on which incoming data is recorded, and can retrieve any tape in the archive on request.
"In front of those robots we have about seven petabytes of spinning disk. Data comes in, and depending on the type of experiment being recorded it either goes to tape directly, or straight to disk if they're not overly worried about losing it." The LHC generates petabytes of data per second, most of which is discarded for capacity reasons. While experiments are running, around 10TB of data arrives at STFC from the LHC over twin dedicated 4GB/s fibre links that run directly to CERN. This data would be unmanageable if it were not categorised as it enters the organisation. That is done automatically by a huge metadata catalogue called ICAT, an open-source categorisation system developed over many years, which at STFC runs on an Oracle database.
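The tape-or-disk decision Corney describes can be sketched as a simple routing rule keyed on experiment type. This is a minimal illustration, not STFC's software: the `Tier` enum, the `LOSS_TOLERANT` table and the experiment-type names are hypothetical, and the sketch only models where incoming data first lands, not the disk cache that sits in front of the tape robots.

```python
from enum import Enum


class Tier(Enum):
    TAPE = "tape"  # durable archive held in the tape robots
    DISK = "disk"  # spinning disk, used when losing the data is acceptable


# Hypothetical policy table keyed by experiment type. True means the
# experiment is "not overly worried about losing" its data, so disk is enough.
LOSS_TOLERANT: dict[str, bool] = {
    "calibration": True,
    "collision": False,
}


def route(experiment_type: str) -> Tier:
    """Pick the storage tier for incoming data from a given experiment type."""
    # Unknown experiment types fall back to tape, the safer archival tier.
    if LOSS_TOLERANT.get(experiment_type, False):
        return Tier.DISK
    return Tier.TAPE


if __name__ == "__main__":
    for kind in ("calibration", "collision", "new-experiment"):
        print(f"{kind} -> {route(kind).value}")
```

Defaulting unknown experiment types to tape is a deliberate choice in this sketch: a misclassified dataset then costs some retrieval speed rather than risking data loss.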
ICAT divides data into divisions and subdivisions, which scientists can then search and analyse. Corney explains that, given the volumes they work with, scientists would be unable to find the parts of the data they need without automatic categorisation. "The whole pipeline is frighteningly huge in many respects. We need to capture all the data, store it, catalogue it, and make it available to end users so they can not only take it away on their USB stick, but come back in 10 years and ask for it again."
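To show why categorising at ingest time makes data findable later, here is a toy catalogue that files each record under a (division, subdivision) key and supports filtered search. It is only a sketch under stated assumptions: `Catalogue`, `Entry` and the example labels are hypothetical, and this is neither ICAT's API nor its schema, which is far richer and backed by a relational database.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical metadata record for one stored file.
    file_id: str
    location: str  # e.g. "tape" or "disk"
    keywords: set[str] = field(default_factory=set)


class Catalogue:
    """Toy catalogue that files entries under (division, subdivision) keys."""

    def __init__(self) -> None:
        self._index: dict[tuple[str, str], list[Entry]] = defaultdict(list)

    def register(self, division: str, subdivision: str, entry: Entry) -> None:
        # Categorise on arrival, so the data is searchable as soon as it lands.
        self._index[(division, subdivision)].append(entry)

    def search(self, division: str, subdivision: str | None = None,
               keyword: str | None = None) -> list[Entry]:
        """Return entries in a division, optionally narrowed by subdivision/keyword."""
        results: list[Entry] = []
        for (div, sub), entries in self._index.items():
            if div != division:
                continue
            if subdivision is not None and sub != subdivision:
                continue
            results.extend(e for e in entries
                           if keyword is None or keyword in e.keywords)
        return results


if __name__ == "__main__":
    cat = Catalogue()
    cat.register("LHC", "run-a", Entry("f001", "tape", {"muon"}))
    cat.register("LHC", "run-b", Entry("f002", "disk", {"calibration"}))
    cat.register("facility-b", "beamline-1", Entry("f003", "tape"))
    print([e.file_id for e in cat.search("LHC", keyword="muon")])
```

A scientist who only knows the division and a keyword can still narrow millions of files down to the few that matter, which is the point Corney makes about automatic categorisation.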
27/07/11