Date: 21/11/18
Sony Breaks ResNet-50 Training Record on ImageNet
Researchers from Japanese electronics giant Sony have trained the ResNet-50 neural network model on ImageNet in a record-breaking 224 seconds, 43.4 percent faster than the previous best time for the benchmark task.

Large-scale deep learning training can become unstable when mini-batches grow very large, and gradient synchronization becomes a burden because communication among GPUs demands ever more bandwidth. ResNet-50 is a deep residual learning architecture for image recognition that is trained on ImageNet and widely used to benchmark large-scale cluster computing capability. ImageNet is an open database for object recognition research; the ImageNet Large Scale Visual Recognition Challenge dataset contains 1,281,167 training images, 50,000 validation images, and 100,000 test images.
Sony researchers applied batch size control and 2D-Torus all-reduce to overcome these problems. Batch size control gradually increases the total mini-batch size during training, which helps keep optimization stable and steer it toward flatter regions of the loss landscape rather than sharp minima. The researchers also proposed the 2D-Torus all-reduce communication topology, which performs collective operations along different orientations of a GPU grid: first a reduce-scatter horizontally, then an all-reduce vertically, and finally an all-gather horizontally. With 2D-Torus all-reduce, communication overhead is lower than that of a plain ring all-reduce.
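The three steps above can be sketched as a toy single-process simulation (hypothetical function name; a real implementation exchanges chunks over NCCL or MPI rings across GPUs, while here each communication step collapses into a direct sum):

```python
def two_d_torus_allreduce(grads, x, y):
    """Toy simulation of 2D-Torus all-reduce on an x*y worker grid.

    grads: list of x*y gradient vectors (plain lists) of equal length
    divisible by x. Worker (i, j) sits at index j*x + i; row j forms a
    horizontal ring and column i a vertical ring.
    """
    n = len(grads[0])
    chunk = n // x

    # Step 1: reduce-scatter along each horizontal ring.
    # Afterwards worker (i, j) holds only the row-sum of chunk i.
    partial = []
    for j in range(y):
        row = [grads[j * x + i] for i in range(x)]
        row_sum = [sum(vals) for vals in zip(*row)]
        partial.append([row_sum[i * chunk:(i + 1) * chunk]
                        for i in range(x)])

    # Step 2: all-reduce along each vertical ring, so chunk i becomes
    # the sum over every worker in the grid.
    for i in range(x):
        col_sum = [sum(vals) for vals in
                   zip(*(partial[j][i] for j in range(y)))]
        for j in range(y):
            partial[j][i] = col_sum

    # Step 3: all-gather along each horizontal ring; every worker
    # reassembles the full summed gradient.
    full = [v for c in partial[0] for v in c]
    return [list(full) for _ in range(x * y)]


# Usage: four workers in a 2x2 grid, worker k holding the vector [k]*8.
# Every worker ends up with the element-wise sum [6.0]*8.
summed = two_d_torus_allreduce([[float(k)] * 8 for k in range(4)], 2, 2)
```

Because the vertical all-reduce operates on chunks of size 1/x of the gradient, each ring carries less data than a single ring spanning all workers, which is where the overhead saving comes from.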
Using 2176 Tesla V100 GPUs, the researchers trained ResNet-50 to 75.03 percent validation accuracy. They also worked to improve GPU scaling efficiency without significantly reducing accuracy, reaching 91.62 percent scaling efficiency with 918 Tesla V100 GPUs.
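GPU scaling efficiency is commonly defined as measured cluster throughput divided by the ideal linear-scaling throughput; a minimal sketch of that arithmetic (hypothetical function name, illustrative numbers rather than figures from the paper):

```python
def scaling_efficiency(cluster_imgs_per_sec, single_gpu_imgs_per_sec, n_gpus):
    """Fraction of the ideal linear speedup actually achieved."""
    return cluster_imgs_per_sec / (single_gpu_imgs_per_sec * n_gpus)


# Illustrative: if one GPU sustained 1,000 images/s and 918 GPUs
# together sustained 841,000 images/s, efficiency would be about 91.6%.
eff = scaling_efficiency(841_000, 1_000, 918)
```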
Sony’s cluster topology innovations have dramatically reduced training time, and there is ample room for further high-performance computing gains. Continued growth in GPU performance, lower GPU communication costs, and future cluster topology solutions will likely keep driving down ResNet-50 training time on ImageNet.
©ictnews.az. All rights reserved.