The Challenge of Big Data – It’s more than just big files!

Big Data is a term that crops up a lot these days, and its meaning can be deceptive. Data is often labelled "Big Data" once file sizes pass a certain threshold, but in reality the picture is less about size and more about complexity. Big data files need not be measured in terabytes, or even gigabytes; it is the complexity of the data, and its inter-relationships with the other data sources inside an organisation, that can make it more valuable than the raw data alone. Reducing the time taken to process the information also means decisions can be made sooner, or considered for longer, and the value of the data increases.

Gartner predicts that in 2016 big data will move on from the ingestion of data to automated analytics, with artificial intelligence (AI) used to leverage the power of data. But before any of this can happen, you need the data in the right location and format first.

Big Data Characteristics – The 4 V’s

Getting Big Data in…

The faster you can get data into your organisation, the sooner it can be analysed. To speed up the receipt of data, several managed file transfer solutions now include proprietary, high-speed file transfer protocols. These are based around UDP streams or parallel TCP connections, and can push bandwidth utilisation above 90%.

The new protocols improve data transfer rates significantly, enabling gigabytes of data to be delivered around the world in under a minute and making for some impressive headlines. The technology is still in its early days, however, and no open protocol yet offers this level of utilisation, so opting for one vendor's approach over another is largely a matter of preference.

Getting Big Data sorted…

Getting the data into your organisation is all well and good, but it is only half of the challenge. The data needs to be analysed before it becomes useful, yet it arrives in your managed file transfer system from a variety of sources, in different formats and, almost invariably, not in the format your central data analysis tool needs.

Two simple enhancements can increase the efficiency and speed at which you ingest data: firstly, pushing data when it is ready instead of waiting for it to be collected; and secondly, triggering events when a file is received by your managed file transfer system.

Remove another step from the process…

Implementing a managed file transfer solution that can stream files to a target server provides productivity gains over traditional store-and-forward workflows. By writing a large data set directly onto the intended target system, you remove another step from the process.

Once the latency has been pared back, the next stumbling block is getting the data into a usable format. There are literally hundreds of data standards, and even the most common of these are often "augmented" with extra data by specific applications.

Integrating some form of data translation, often via post-processing scripts or applications, is a common approach. This works well until the next upgrade changes the "standard" slightly and the translation script needs to be edited or even re-written. Modern managed file transfer solutions can transform data so that it is presented to the target system in a format it recognises and can process. This can be as simple as an XLS-to-XML conversion, or much more complex EDI and database translation.
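To make the idea concrete, here is a minimal Python sketch of such a translation step. The field names, sample data and XML layout are purely illustrative assumptions, not any particular product's format:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative delimited extract, as it might arrive from a trading partner.
raw = "order_id,amount\n1001,49.99\n1002,15.00\n"

# Translate each row into an XML element the downstream system can process.
root = ET.Element("orders")
for row in csv.DictReader(io.StringIO(raw)):
    order = ET.SubElement(root, "order", id=row["order_id"])
    order.text = row["amount"]

print(ET.tostring(root, encoding="unicode"))
```

In a real deployment this logic would live in the transfer solution's transformation layer rather than in an ad hoc script, which is precisely what shields you from the next "standard" change.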

A growing requirement of managed file transfer…

The world of managed file transfer has evolved to enable companies that need to move big data to do so as efficiently as possible. Streamlining the delivery of data (of varying types, sizes and structures) from external trading partners onto internal big data analytics solutions is becoming a much more common requirement from our customers.

If you have a Big Data file transfer project and would like the assistance of our pre-sales and technical experts, contact us here or call +44 (0) 20 7118 9640.

Download a Comparison of 8 Leading Managed File Transfer Solutions!

 

In this essential pack you’ll also find…

 

  • Key features and frequently asked questions

  • Other business policies that will need to be considered

  • Access to additional resources

  • Side by side comprehensive comparison

    * Updated to include new vendors (October 2015)

Some Thoughts on TCP Speeds

As a consultant in File Transfer technologies, a common complaint that I find myself having to address is the speed at which files travel between two servers. Generally speaking, many people expect that if they have two servers exchanging files on a dedicated, traffic-free 1 Gbps line, then their transfer speed should be somewhere close to this.

TCP Vs UDP

One of the first things to consider is the way that TCP works compared to UDP. No matter which protocol is used, data is broken into packets when being sent to the receiving computer. When UDP (User Datagram Protocol) is used, the packets are sent ‘blind’; the transfer continues regardless of whether data is being successfully received or not. This potential loss may result in a corrupted file – in the case of a streamed video this could be some missing frames or out-of-sync audio, but for most files it will require the file to be resent in its entirety. The lack of guarantee makes the transfer fast, but unless combined with rigorous error checking (as several large-file-transfer vendors do) it is often unsuitable for data transfers.

In contrast, TCP (Transmission Control Protocol) transfers data in a carefully controlled sequence of packets; as each packet is received at the destination, an acknowledgement is sent back to the sender. If the sender does not receive the acknowledgement in a certain period of time, it simply sends the packet again. To protect the sequence, further packets cannot be sent until the missing packet has been successfully transmitted and an acknowledgment received.

Deliverability over speed / Calculating the Bandwidth Delay Product 

This emphasis on guarantees rather than speed brings with it a certain degree of delay, however; we can see this by using a simple ping command to establish the round trip time (RTT) – the greater the distance to be covered, the longer the RTT. The RTT can be used to calculate the Bandwidth Delay Product (BDP), which we will need when calculating network speeds. The BDP is the amount of data ‘in flight’ and is found by multiplying the bandwidth by the delay, so a round trip time of 32 milliseconds on a 100 Mbps line gives a BDP of roughly 390 KB of data in transit.
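The arithmetic is easy to check; a quick Python sketch of the BDP calculation using the figures above:

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth Delay Product: bits in flight, converted to bytes."""
    return bandwidth_bps * rtt_seconds / 8

# 100 Mbps line with a 32 ms round trip time:
bdp = bdp_bytes(100e6, 0.032)          # 400,000 bytes
print(f"BDP = {bdp / 1024:.1f} KB")    # prints "BDP = 390.6 KB"
```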

Window Scaling

The sending and receiving computers have a concept of windows (‘views’ of buffers) which control how many packets may be transmitted before the sender has to stop transfers. The receiver window is the available free space in the receiving buffer; when the buffer becomes full, the sender will stop sending new packets. Historically, the value of the receiver window was capped at 64KB because TCP headers used a 16 bit field to communicate the current receive window size to the sender; however it is now common practice to dynamically increase this value using a process called Window Scaling. Ideally, the Receive Window should be at least equal in size to the BDP.
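On most operating systems the per-socket receive buffer, which bounds the window a receiver can advertise, can be requested in code. A hedged Python sketch follows; note that the kernel is free to clamp the request to its configured maximum (net.core.rmem_max on Linux, for example), so the granted value may differ from what was asked for:

```python
import socket

# Request a receive buffer roughly equal to a 7.5 MB BDP; the value actually
# granted is reported back by getsockopt and may be smaller (or, on Linux,
# doubled to account for kernel bookkeeping overhead).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 7_500_000)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"Receive buffer granted: {granted} bytes")
sock.close()
```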

TCP speed fluctuations

The congestion window is set by the sender and controls the amount of data in flight. The aim of the congestion window is to avoid network overloading; if there are no packets lost during transmission then the congestion window will continually increase over the course of the transfer. However, if packets are lost or the receiver window fills, the congestion window will shrink in size under the assumption that the capacity of either the network or receiver has been reached. This is why you will often see a TCP download increase in speed then suddenly slow again.

TCP Speeds Diagram

A quick calculation…

One point to remember is that when talking about bandwidth we tend to measure in bits, whereas when referring to storage (window size or BDP) we measure in bytes. Similarly, remember to allow for 1 Mb = 1000 Kb, but 1 MB = 1024 KB.

So, given this, a 1 Gbps connection with a 60 ms round trip time gives a BDP of 7.15 MB (1000 Mb/s × 0.060 s ÷ 8 = 7,500,000 bytes ≈ 7.15 MB). As I mentioned, to fully utilise the 1 Gbps connection we must increase the Receive Window to be at least equal to the BDP. The default (non-scaling) value of 64 KB will only give us a throughput of 8.74 Mbps (65,536 bytes × 8 ÷ 0.060 s ≈ 8,738,000 bps).
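Both figures fall out of the same window/RTT relationship, and a short Python check confirms the numbers above:

```python
def bdp_mb(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth Delay Product in (binary) megabytes."""
    return bandwidth_bps * rtt_seconds / 8 / 1024 / 1024

def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """TCP throughput is capped at one receive window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6

print(f"BDP on 1 Gbps / 60 ms: {bdp_mb(1e9, 0.060):.2f} MB")                      # 7.15 MB
print(f"64 KB window ceiling: {max_throughput_mbps(64 * 1024, 0.060):.2f} Mbps")  # 8.74 Mbps
```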

So what can you do to speed up the transfer?

Logically, you would probably want to have the largest Receive Window possible to allow more bandwidth to be used. Unfortunately, this isn’t always a great idea; if the receiver is unable to process the data as fast as it arrives, you may have many packets queued for processing in the receive buffer – and if any packet has been lost, all subsequent packets in the receive buffer will be discarded and resent by the sender, due to the need to process them in sequence.

You also need to consider the abilities of the computer at the other end of the connection – both machines need to support window scaling (RFC 1323) and selective acknowledgements (RFC 2018).

Another option that you can investigate is the ability of several products to perform multithreading. Multithreaded transfers theoretically move quicker than single-threaded transfers due to the ability to send multiple separate streams of packets; this somewhat negates the delays caused by having to resend packets in the event of loss. However, the transfers may still be impacted by full receive windows or disk write speeds; in addition, any file that has been sent via multiple threads needs to be reassembled on arrival, requiring further resources. In general, most large-file transfer software is written around multithreading principles, or a blend of UDP transfer with TCP control.

Finally, consider other network users – when using large Receive Windows, remember that as the amount of data in transit at any time increases, you may encounter network usage spikes or contention between traffic.

 

If you have any questions about the speed of your file transfers or your chosen file transfer technology and infrastructure design give our team of experts a call on 0207 118 9640.


Webinar – Prepare for The Future of Managed File Transfer

Event Type: Live Webinar
Event Date: Thursday 29th June 2016
Event Time: 4PM – 4.30PM

You didn’t mean for it to happen, but your company has for years deployed multiple disparate managed file transfer solutions to support its daily B2B and A2A processes, as well as the various other systems that enable reliable data transformation, certificate management, and multi-protocol communications.

Integrating current data requirements is already overwhelming, but juggling newly emerging integration use cases as well – ground-to-cloud, cloud-to-cloud, big data ingestion, large file transfer, data transformation, and high-speed data transfer protocols? That’s something else entirely.

Webinar Presenters

Frank Kenny – Independent MFT Guru and Industry Expert

A former Gartner analyst with more than 15 years’ experience, Frank Kenny is an expert on managed file transfer and critical business information integration, having led MFT strategy in a variety of executive-level roles.


John Thielens – Chief Technology Officer, Cleo

With more than 30 years of experience in software development, John is responsible for crafting technology strategy and innovation, and for architecting enterprise integration solutions to solve complex integration challenges.

Get advice on modernising and prepping for the future of managed file transfer.

JSCAPE MFT Server 9.3 Released

JSCAPE is pleased to announce the release of MFT Server 9.3. This is a major release and includes several important enhancements, highlights of which include:

  • Global Datastore – All configuration data has been migrated from file-based storage to a relational database. Configuration data may be stored in the local database included with the product, or in a centralised database for high-availability and clustering purposes.
  • JMS (Java Message Service) – Trigger events may optionally be published to a JMS queue for event processing.
  • Administrative ACL – Roles and Tags modules have been added, providing fine-grained administrative permissions that limit access to domains, modules within domains, and data within modules.
  • Custom Administrative Authentication – Added the ability to define custom authentication logic for administrative users.
  • Administrative Logging – All administrative actions are now logged in a separate logging interface dedicated to administrative activity logs.
  • RADIUS Authentication – Added support for authenticating users against a RADIUS server.
  • Minor enhancements and bug fixes.

If you are an existing customer and would like to upgrade to the latest version, or are interested in finding out more about JSCAPE MFT, please contact us on 0207 118 9640.

Globalscape Webinar – Top Four Ways that MFT Benefits Healthcare Organisations

Organisations within the healthcare industry face intense scrutiny from regulators and consumers in today’s modern data landscape. Given the high premium placed on patients’ medical records, hackers and cybercriminals are looking for any and every vulnerability to exploit. Through a Managed File Transfer (MFT) solution, IT professionals within the healthcare industry can gain the advantage.

Join us on this webinar for a strategic insight that you can apply to your healthcare organisation today!

Webinar Overview

The healthcare industry faces the immense challenge of preventing continuous data breaches from attempts by hackers or through human error. While the prevention of data breaches continues to be a priority for any IT professional managing the network and data of a healthcare organisation, ensuring operational efficiency and productivity is equally important.

A managed file transfer (MFT) solution can help an IT professional implement a data management strategy that includes a focus on efficiency, reliability, security, and compliance for all data activity.

You’ll leave this webinar with an understanding of:

  • Current healthcare industry challenges with data movement
  • The top four ways that a healthcare organisation can benefit from a MFT solution
  • How Globalscape supports the healthcare industry

Event Type: Live Webinar

Event Date: Wednesday 21st January – 1200Hrs CST

Review this Webinar