Distance no object for big data in the cloud
Steve Jones from IBM Cloud – Aspera explains trends in high-speed transport of big data in the cloud and why it’s so important.
Whatever their sector, these organisations often deal with data sets measured in tens of terabytes or even petabytes. And therein lies the problem: how do they quickly and securely move files of that size in and out of the cloud service provider's datacentre?
Traditional file-transfer protocols such as FTP and HTTP were never designed for this kind of task. Performance degrades seriously over distance, as network latency kicks in. A few numbers show the problem isn't going away:
- Cisco’s annual Visual Network Index predicts that video will account for a whopping 82% of all Internet Protocol (IP) traffic by 2021.
- 85% of managers with responsibility for data storage in healthcare and life sciences plan to increase cloud compute resources, but 72% think IT infrastructure will create research bottlenecks.
- 95% of healthcare and life sciences organisations say their existing file collaboration tools don’t meet their needs.
So, what’s the answer?
It may come as a surprise in this day and age, but some big-data-in-the-cloud users are still ‘hand-carrying’ data by shipping hard disk drives to their cloud providers. In taking this route, they run all the risks of introducing time-lags into projects and suffering loss from theft or mishandling. What’s more, there is still the problem of moving the data within the cloud, between remote storage and across compute nodes.
To realise the promise of the cloud, these organisations need the means to transport large volumes of data securely, at high speed to, from and across cloud infrastructures.
Some, such as the hugely successful Netflix (approaching 104 million subscribers worldwide!), have already gained the capability to shorten file transfer times from many hours to mere minutes.
UDP and TCP
To do this, you don't need to rely on a TCP-based transfer protocol such as FTP; you can use alternative protocols built on UDP instead.
UDP allows you to move big data sets much more quickly than TCP, regardless of size, distance or network conditions. TCP waits for acknowledgements before sending more data, so its effective throughput drops sharply as round-trip time and packet loss increase. UDP, by contrast, keeps sending datagrams without waiting to hear whether they arrived. Over long-distance, high-latency links this is significantly faster, allowing businesses to speed up processes and operate more efficiently.
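The fire-and-forget behaviour described above can be seen in a few lines of Python. This is a minimal localhost sketch, not a real transfer tool: the port number is an arbitrary choice for illustration, and on the loopback interface these small datagrams arrive reliably, whereas over a real network some could be lost without UDP noticing.

```python
import socket

# Arbitrary localhost port chosen for this illustration.
PORT = 9999

recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", PORT))
recv_sock.settimeout(2.0)

send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for seq in range(5):
    # No handshake and no per-packet acknowledgement:
    # each datagram goes out immediately after the previous one.
    send_sock.sendto(f"chunk-{seq}".encode(), ("127.0.0.1", PORT))

received = [recv_sock.recvfrom(1024)[0].decode() for _ in range(5)]
print(received)
```

Note there is no connection setup and no waiting in the send loop; that absence of round trips is exactly what makes UDP attractive over high-latency links.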
Traditional UDP transfers involve a trade-off: if a datagram doesn't reach its destination, it is simply lost. In a live video stream, for example, the sound might distort slightly or a few frames may freeze. In use cases like that, the speed advantage generally outweighs the occasional loss.
Where maintaining the integrity of the data is critical, however, the best results come from combining UDP and TCP: UDP carries the bulk data, whilst TCP is used to request retransmission of any packets that were lost. An agent at the receiving end reconstructs the data after the transfer.
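The hybrid scheme can be sketched as a toy Python program. This is an illustration of the general technique, not Aspera's FASP wire format: the ports and one-byte sequence-number framing are invented for the example, and one datagram is deliberately dropped to simulate loss. The receiving side detects the gap and recovers the missing chunk over a reliable TCP control connection.

```python
import socket
import threading

# Hypothetical ports and a toy payload of six sequence-numbered chunks.
UDP_PORT, TCP_PORT = 9901, 9902
CHUNKS = {i: f"data-{i}".encode() for i in range(6)}

def sender():
    # First pass: blast all chunks over UDP, deliberately skipping
    # chunk 3 to simulate loss on the wire.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq, payload in CHUNKS.items():
        if seq != 3:
            udp.sendto(bytes([seq]) + payload, ("127.0.0.1", UDP_PORT))
    # Then serve retransmission requests over the reliable TCP channel.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", TCP_PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    while (req := conn.recv(1)):
        seq = req[0]
        conn.sendall(bytes([seq]) + CHUNKS[seq])
    conn.close()
    srv.close()

udp_rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_rx.bind(("127.0.0.1", UDP_PORT))
udp_rx.settimeout(1.0)

t = threading.Thread(target=sender)
t.start()

received = {}
try:
    while len(received) < len(CHUNKS):
        pkt, _ = udp_rx.recvfrom(1024)
        received[pkt[0]] = pkt[1:]
except socket.timeout:
    pass  # a gap in the sequence means a datagram was lost

# The receiving agent requests anything missing over TCP.
missing = sorted(set(CHUNKS) - set(received))
if missing:
    ctl = socket.create_connection(("127.0.0.1", TCP_PORT))
    for seq in missing:
        ctl.sendall(bytes([seq]))
        resp = ctl.recv(1024)
        received[resp[0]] = resp[1:]
    ctl.close()
t.join()
print(sorted(received))
```

The bulk data never waits for acknowledgements; only the rare retransmission request pays the TCP round-trip cost. Production tools layer congestion control and encryption on top of the same basic idea.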
Some examples of effective use of this method include:
- Banking, where vast amounts of data are captured and moved around each day.
- Researchers needing to share large volumes of scientific and clinical research data.
- The manufacturing industry, sharing large files and data sets to global development teams.
Find out more about Aspera’s combined UDP and TCP technology – FASP – on the Aspera vendor page. You can also download the Aspera White Paper ‘Taking Big Data to the Cloud’.
This is the latest in a series of guest blog posts from the leading vendors, highlighting how a file transfer solution can add value to your organisation.