How Big Data Causes Big Problems for the WAN

The real-time analysis of large volumes of data, known as "big data", offers most companies competitive advantages such as detailed insights into customers' wishes and purchasing behavior, or up-to-the-minute knowledge of emerging trends. Thanks to big data, companies as well as public institutions can react faster to new developments and exploit market opportunities.

According to a recent study by Microsoft, 75 percent of all medium-sized and large companies plan to implement big data solutions within the next twelve months, and 62 percent of companies already hold data stocks of 100 TB or more.

But Chief Information Officers (CIOs) and IT managers often overlook a key factor when launching big data initiatives: the high demands that transporting such large amounts of data places on the underlying network infrastructure. Merely holding large amounts of data on servers and storage systems creates little value in itself. The real benefit lies in examining extensive, heterogeneous information from different sources and being able to take appropriate action based on the results.

The role of the network infrastructure is insufficiently considered

One of the biggest challenges is overcoming the technical barriers associated with transporting and backing up big data over wide area network (WAN) connections. Other key technologies, such as cloud computing, likewise cannot deliver their benefits when data is not moved quickly and efficiently over WAN links. The result: companies and organizations invest money in applications that ultimately cannot reach their full potential.

The Microsoft study also confirms that IT decision makers underestimate the central role of the network infrastructure in connection with big data. IT managers consider the implementation of real-time analytics and data mining to be the biggest challenge of the next two years (62 percent). For 58 percent, expanding the storage infrastructure is a high priority, and 53 percent regard solutions for evaluating unstructured data as important. How this data is supposed to reach the data centers where it is analyzed, and what that means for LANs and wide area networks, evidently plays a lesser role for the surveyed IT professionals.

WAN links become a bottleneck

However, as companies adopt cloud computing offerings or run big data analyses, they realize that their existing WAN links are inadequate: they are simply no match for the growing demands and become a bottleneck.

Insufficient WAN connectivity can negate the benefits that IT managers expect to gain from consolidating storage resources in a cloud environment or in centralized storage pools. At the same time, the computing capacity in a company's own data center is generally insufficient to process big data information assets. Doing so would require acquiring additional servers and storage systems and implementing big data analytics software, including training existing IT professionals or hiring additional specialists. As a result, big data analytics is increasingly being outsourced to specialized service providers, who offer it as part of their cloud computing services.

Hurdles for big data initiatives

Many IT professionals are unaware of the central role that network connections, and especially WANs, play in big data, even when such information is analyzed in their own data centers. IT specialists tend to focus primarily on storing and evaluating big data; how the data reaches the servers and storage systems is often neglected.

There are three main challenges in delivering big data over wide area networks. First, when migrating data, the stability of the links and the large distances to be bridged must be considered. The farther away the data center to which the data is to be transported, the higher the latency and the longer the transfer takes.
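To make the latency effect concrete, here is a minimal back-of-the-envelope sketch in Python. The window size, round-trip times, and data volume are illustrative assumptions, not figures from the study: a single TCP stream with a fixed receive window cannot exceed window/RTT, so the same transfer that takes hours over a metro link can take weeks over a long-haul link.

```python
# Sketch (illustrative figures): with a fixed TCP window, single-stream
# throughput is capped at window / RTT, regardless of raw link capacity.

def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-stream TCP throughput: window / round-trip time."""
    return (window_bytes * 8) / rtt_seconds

def transfer_time_hours(data_tb: float, throughput_bps: float) -> float:
    """Time to move `data_tb` terabytes at the given throughput."""
    bits = data_tb * 1e12 * 8
    return bits / throughput_bps / 3600

window = 64 * 1024  # assumed 64 KB TCP window, no window scaling

for label, rtt in [("metro, ~5 ms RTT", 0.005),
                   ("long-haul, ~80 ms RTT", 0.080)]:
    rate = max_tcp_throughput_bps(window, rtt)
    print(f"{label}: cap ~{rate / 1e6:.1f} Mbit/s, "
          f"1 TB takes ~{transfer_time_hours(1, rate):.1f} h")
```

With these assumed numbers, the cap drops from roughly 105 Mbit/s at 5 ms RTT to under 7 Mbit/s at 80 ms, stretching a 1 TB transfer from about one day to about two weeks.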

The second challenge is insufficient bandwidth, which also prolongs transmission times. Switching to higher-capacity WAN lines often turns out to be a dead end, because either links with the desired bandwidth are not available or the costs are too high. The third challenge arises in cloud computing environments or MPLS (Multi-Protocol Label Switching) networks: in overload situations, data packets are dropped or arrive out of order.
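The impact of packet loss can be estimated with the well-known Mathis et al. approximation, which caps steady-state TCP throughput at roughly MSS / (RTT · sqrt(loss)). The sketch below uses assumed values for MSS, RTT, and loss rate; it is not based on measurements from the text.

```python
# Sketch of the Mathis et al. approximation for TCP throughput under
# packet loss: rate ~ MSS / (RTT * sqrt(loss)). All figures are
# illustrative assumptions.
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput over a lossy path."""
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))

mss = 1460   # typical Ethernet MSS in bytes
rtt = 0.080  # assumed 80 ms WAN round trip

for loss in (0.0001, 0.001, 0.01):  # 0.01 %, 0.1 %, 1 % packet loss
    rate = mathis_throughput_bps(mss, rtt, loss)
    print(f"loss {loss:.2%}: ~{rate / 1e6:.1f} Mbit/s")
```

Under these assumptions, raising the loss rate from 0.01 percent to 1 percent cuts the achievable rate by a factor of ten, from about 14.6 Mbit/s to about 1.5 Mbit/s.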

Any of these factors can mean the end of a big data project or drive up its costs. Based on Silver Peak Systems' experience, large companies expand their WAN bandwidth on average every two years. In this way, organizations account for the growth in the volume of data to be transported and the need for real-time WAN performance.

But implementing higher-capacity WAN links is time-consuming and costly. Moreover, more bandwidth does not always eliminate the negative effects of high latency and packet loss on applications deployed over wide area networks. In short, companies and organizations that run business applications such as cloud computing and big data, or that replicate data between distributed data centers, need to be aware of the key role of the network infrastructure.
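A short illustration of that last point, again using the Mathis approximation with assumed path figures: on an 80 ms path with 0.1 percent loss, upgrading the link from 100 Mbit/s to 1 Gbit/s leaves the effective single-stream rate essentially unchanged.

```python
# Sketch (assumed figures): when loss and latency cap TCP throughput,
# adding raw link capacity buys nothing for a single stream.
import math

def effective_rate_bps(link_bps: float, mss_bytes: int,
                       rtt_s: float, loss: float) -> float:
    """Effective single-stream rate: lesser of link capacity and the loss/RTT cap."""
    cap = (mss_bytes * 8) / (rtt_s * math.sqrt(loss))
    return min(link_bps, cap)

for link in (100e6, 1e9):  # 100 Mbit/s vs 1 Gbit/s link
    rate = effective_rate_bps(link, 1460, 0.080, 0.001)
    print(f"{link / 1e6:.0f} Mbit/s link -> effective ~{rate / 1e6:.1f} Mbit/s")
```

Both links end up at roughly 4.6 Mbit/s per stream under these assumptions, which is precisely why more bandwidth alone does not solve the problem.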