The Challenge of Big Data – It’s more than just big files!

Big Data is a term that crops up a lot these days, and its meaning can be deceptive. Data is often labelled "Big Data" simply because a file hits a certain size, but in reality the picture is less about size and more about complexity. Big data files do not need to be measured in terabytes, or even gigabytes; it is the complexity of the data, and its inter-relationships with the other data sources inside an organisation, that can make it more valuable than the raw data alone. Decreasing the time taken to process the information also means decisions can be made quicker, or considered for longer, and the value of the data increases.

Gartner predicts that in 2016 big data will move on from the ingestion of data to automated analytics, with artificial intelligence (AI) used to leverage the power of data. But before any of this can happen, you need to have the data in the right location and format first.

Big Data Characteristics – The 4 V’s

Getting Big Data in…

The faster you can get data into your organisation, the sooner it can be analysed. To speed up the process of receiving data, several managed file transfer solutions now include proprietary, high-speed file transfer protocols. These are based around UDP streams or parallel TCP connections, increasing bandwidth utilisation to over 90%.

The new protocols enhance data transfer rates significantly, enabling gigabytes of data to be delivered in less than a minute across the world and making for some impressive headlines. This technology is still in its early days, meaning that no open protocols yet offer this increased utilisation; opting for one approach over another is a matter of preference.

Getting Big Data sorted…

Getting the data into your organisation is all well and good, but it is only half of the challenge. All the data needs to be analysed before it can become useful; however, it arrives in your managed file transfer system from a variety of sources, in different formats and, almost invariably, not in the format your central data analysis tool needs.

There are two simple enhancements that can increase the efficiency and speed at which you ingest your data: firstly, pushing data when it's ready instead of waiting for it to be collected; and secondly, triggering events when a file is received by your managed file transfer system.
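As a rough illustration of the second enhancement, a post-receipt trigger might look like the following minimal Python sketch. The `on_file_received` hook and directory layout are hypothetical; real managed file transfer products expose this as event rules or folder monitors.

```python
import shutil
from pathlib import Path

def on_file_received(path: Path, processed_dir: Path) -> Path:
    """Hypothetical trigger fired by the MFT system the moment a file lands,
    rather than waiting for a scheduled collection run to pick it up."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    target = processed_dir / path.name
    shutil.move(str(path), target)  # hand the file straight to the analysis stage
    return target
```

In a real deployment this handler would be registered with the transfer server's event engine rather than called by hand.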

Remove another step from the process…

Implementing a managed file transfer solution which can stream files to a target server provides productivity gains over traditional store-and-forward workflows. By writing a large data set directly onto the intended target system, you remove another step from the process.

Once the latency has been pared back, the next stumbling block is getting the data into a usable format. There are literally hundreds of data standards, and even the most common of these are often "augmented" with extra data by specific applications.

Integrating some form of data translation, often via post-processing scripts or applications, is a common approach. This works well until the next upgrade changes the "standard" slightly and the translation script needs to be edited or even re-written. Modern managed file transfer solutions can transform data so that it is presented to the target system in a format it recognises and can process. These can be simple XLS-to-XML conversions or much more complex EDI and database translations.
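As a toy illustration of what such a translation step does, here is a dependency-free sketch that turns delimited rows into XML. Real MFT transforms handle XLS, EDI and database formats; the function name and tag names here are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text: str, root_tag: str = "records") -> str:
    """Toy translation step: turn delimited rows into an XML document.
    Real transformation engines are far richer; this only shows the shape
    of a post-processing translation script."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = ET.SubElement(root, "record")
        for field, value in row.items():
            ET.SubElement(rec, field).text = value
    return ET.tostring(root, encoding="unicode")
```

The point of moving this logic into the MFT platform is that when the upstream "standard" shifts, the mapping is reconfigured centrally instead of re-writing scripts on each endpoint.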

A growing requirement of managed file transfer…

The world of managed file transfer has evolved to enable companies that need to move big data to do so as efficiently as possible. Streamlining the delivery of data of varying types, sizes and structures from external trading partners onto internal big data analytics solutions is becoming a much more common requirement from our customers.

If you have a Big Data file transfer project and would like the assistance of our pre-sales and technical experts, contact us here or call +44 (0) 20 7118 9640.

Download a Comparison of 8 Leading Managed File Transfer Solutions!

 

MFT_Comparison Guide Img

In this essential pack you’ll also find…

 

  • Key features and frequently asked questions

  • Other business policies that will need to be considered

  • Access to additional resources

  • Side by side comprehensive comparison

    * Updated to include new vendors (October 2015)

Some Thoughts on TCP Speeds

As a consultant in File Transfer technologies, a common complaint that I find myself having to address is the speed that a file travels at between two servers. Generally speaking, many people expect that if they have two servers exchanging files on a dedicated traffic-free 1 Gbps line, then their transfer speed should be somewhere close to this.

TCP Vs UDP

One of the first things to consider is the way that TCP works compared to UDP. No matter which protocol is used, data is broken into packets when being sent to the receiving computer. When UDP (User Datagram Protocol) is used, the packets are sent ‘blind’; the transfer continues regardless of whether data is being successfully received or not. This potential loss may result in a corrupted file – in the case of a streamed video this could be some missing frames or out of sync audio, but generally will require a file to be resent in its entirety. The lack of guarantee makes the transfer fast, but unless combined with rigorous error checking (as per several large-file-transfer vendors) it is often unsuitable for data transfers.

In contrast, TCP (Transmission Control Protocol) transfers data in a carefully controlled sequence of packets; as each packet is received at the destination, an acknowledgement is sent back to the sender. If the sender does not receive the acknowledgement within a certain period of time, it simply sends the packet again. To protect the sequence, further packets cannot be sent until the missing packet has been successfully transmitted and an acknowledgement received.

Deliverability over speed / Calculating the Bandwidth Delay Product 

This emphasis on guarantee rather than speed brings with it a certain degree of delay, however; we can see this by using a simple ping command to establish the round trip time (RTT) – the greater the distance to be covered, the longer the RTT. The RTT can be used to calculate the Bandwidth Delay Product (BDP), which we will need to know when calculating network speeds. BDP is the amount of data 'in flight' and is found by multiplying the bandwidth by the delay; so a round trip time of 32 milliseconds on a 100Mbps line gives a BDP of roughly 390KB (data in transit).
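The BDP arithmetic can be captured in a few lines of Python for checking your own links (a sketch for illustration only; the function name is ours):

```python
def bandwidth_delay_product(bandwidth_bps: float, rtt_ms: float) -> float:
    """Bytes 'in flight' on the link: bandwidth (bits/s) x RTT (s), divided by 8 bits per byte."""
    return bandwidth_bps * (rtt_ms / 1000.0) / 8.0

# The example above: 100 Mbps line with a 32 ms round trip time
bdp = bandwidth_delay_product(100e6, 32)
print(f"{bdp / 1024:.1f} KB")  # 390.6 KB -- the ~390KB quoted above
```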

Window Scaling

The sending and receiving computers have a concept of windows ('views' of buffers) which control how many packets may be transmitted before the sender has to stop transfers. The receiver window is the available free space in the receiving buffer; when the buffer becomes full, the sender will stop sending new packets. Historically, the value of the receiver window was limited to 64KB, as TCP headers used a 16-bit field to communicate the current receive window size to the sender; however, it is now common practice to dynamically increase this value using a process called Window Scaling. Ideally, the Receive Window should be at least equal in size to the BDP.
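On most operating systems you can influence the receive window indirectly by sizing the socket receive buffer. A minimal Python sketch (note the kernel may round or cap the request, and Linux reports back double the requested value):

```python
import socket

BDP_BYTES = 400_000  # e.g. a 100 Mbps line with a 32 ms round trip

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask the OS for a receive buffer at least as large as the BDP, so the
# advertised window does not become the transfer's bottleneck.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BDP_BYTES)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
```

Many managed file transfer products expose an equivalent buffer-tuning setting in their configuration rather than requiring socket-level code.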

TCP speed fluctuations

The congestion window is set by the sender and controls the amount of data in flight. The aim of the congestion window is to avoid network overloading; if there are no packets lost during transmission then the congestion window will continually increase over the course of the transfer. However, if packets are lost or the receiver window fills, the congestion window will shrink in size under the assumption that the capacity of either the network or receiver has been reached. This is why you will often see a TCP download increase in speed then suddenly slow again.

TCP Speeds Diagram

A quick calculation…

One point to remember is that when talking about bandwidth we tend to measure in bits, whereas when referring to storage (window size or BDP) we measure in bytes. Similarly, remember to make allowance for 1 Mb = 1000 Kb, but 1 MB = 1024 KB.

So, given this, a 1Gbps connection with a 60 ms round trip time gives a BDP of 7.15 MB (1000*60/8/1.024/1.024). As I mentioned, to fully utilise the 1Gbps connection, we must increase the Receiver Window to be at least equal to the BDP. The default (non-scaling) value of 64 KB will only give us a throughput of 8.74 Mbps: 64/60*8*1.024 = 8.738Mbps
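The window-limited throughput calculation can likewise be sketched in Python (the function name is ours, for illustration):

```python
def max_throughput_mbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on TCP throughput: at most one receive window of data per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e6

# Default 64 KB (non-scaled) window over a 60 ms round trip:
print(f"{max_throughput_mbps(64 * 1024, 60):.2f} Mbps")  # 8.74 Mbps -- far short of 1 Gbps
```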

So what can you do to speed up the transfer?

Logically, you would probably want to have the largest Receive Window possible to allow more bandwidth to be used. Unfortunately, this isn't always a great idea; if the receiver is unable to process the data as fast as it arrives, you may have many packets queued up for processing in the receive buffer – and if any packet has been lost, all subsequent packets in the receive buffer will be discarded and resent by the sender, due to the need to process them in sequence.

You also need to consider the abilities of the computer at the other end of the connection – both machines need to be able to support window scaling (as per RFC 1323) and selective acknowledgements (RFC 2018).

Another option to investigate is the ability of several products to perform multithreading. Multithreaded transfers theoretically move quicker than single-threaded transfers due to the ability to send multiple separate streams of packets; this somewhat negates the delays caused by having to resend packets in the event of loss. However, the transfers may still be impacted by full receive windows or disk write speeds; in addition, any file that has been sent via multiple threads needs to be reassembled on arrival, requiring further resources. In general, most large-file transfer software is written around multithreading principles, or a blend of UDP transfer with TCP control.
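The chunking-and-reassembly idea can be shown with a toy in-memory sketch. There is no real network I/O here; `transfer_in_chunks` is purely illustrative of splitting a payload across threads and paying the reassembly cost on arrival.

```python
from concurrent.futures import ThreadPoolExecutor

def transfer_in_chunks(data: bytes, threads: int = 4) -> bytes:
    """Split a payload into ranges, 'transfer' each on its own thread,
    then reassemble in order on arrival (the extra step noted above)."""
    chunk = -(-len(data) // threads)  # ceiling division
    ranges = [(i, data[i:i + chunk]) for i in range(0, len(data), chunk)]

    def send(part):  # stand-in for one network stream
        offset, payload = part
        return offset, payload

    with ThreadPoolExecutor(max_workers=threads) as pool:
        received = list(pool.map(send, ranges))

    # Chunks may complete out of order; sort by offset before joining.
    return b"".join(payload for _, payload in sorted(received))
```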

Finally, consider other network users – when using large Receive Windows, remember that as the amount of data in transit at any time increases, you may encounter network usage spikes or contention between traffic.

 

If you have any questions about the speed of your file transfers or your chosen file transfer technology and infrastructure design give our team of experts a call on 0207 118 9640.


Impact of Brexit on the GDPR

The opening statement of Information Commissioner Sir Christopher Graham’s last annual report talked about “responding to new challenges, and preparing for big changes, particularly in the data protection and privacy field.” Delivering his speech in the early aftermath of Brexit, everyone was keen to get his view on the implications for the roll out of the General Data Protection Regulation (GDPR).

Prior to Brexit

In April of 2016, after two years of debating, the final terms of the European GDPR were agreed. The legislation comes into effect for member states in May 2018 and includes key changes such as:

  • The right to be forgotten
  • New stricter conditions for the adequate protection of file transfers
  • Privacy notices for individuals on how their data is handled
  • Tighter legislation around active consent for processing data
  • A shared liability for breaches between data controllers and data processors

The change that many CIOs will be most concerned about is the increase in sanctions for data breaches, with fines of up to 4% of annual global turnover.

Moving forward

When asked about the uncertainty, the Commissioner stated "We now need to consider the impact of the referendum on UK data protection regulation. It is very much the case that the UK has a history of providing legal protection to consumers around their personal data which precedes EU legislation by more than a decade, and goes beyond current EU requirements." He stressed that "Having clear laws with safeguards in place is more important than ever given the growing digital economy, and we will be speaking to parts of the government to present our view that reform of the UK law remains necessary."

But will the EU GDPR still affect us?

The changes in EU Legislation are due to come into effect in May 2018. As the debate over Article 50 continues, CIOs face on-going uncertainty. However, whether the UK is still a member of the EU or not, the new rules will still apply to many organisations. The newly agreed scope states that the law will apply to non-EU companies that are offering goods and services to EU citizens. Any UK organisation selling in Europe will still need to comply with GDPR.

In closing, the Commissioner reiterated that the ICO would continue to make sure that the current standard of excellence remains intact. “We must maintain the confidence of businesses and of consumers. The ICO stands ready to enforce the rules that remain and make the case for the highest standards going forward.”

Whatever the law is called, data protection is not going away.

If you're unsure how any of the current or upcoming data protection legislation affects your business's file transfer requirements, give our team of experts a call on 0207 118 9640.


Managed File Transfer Versus Middleware

Managed File Transfer and Middleware have both undergone a period of evolution in the past few years. Historically speaking, the early days of both can easily be traced back to the need to move data between various parts of a computer network, generally over simple protocols like FTP or RCP. As a consequence, and especially as organisations began to move away from legacy environments, many networks contained an inordinate number of FTP servers, frequently with an unknown array of FTP clients pushing and pulling data in an often uncontrolled fashion.

Middleware stands up…

This became a standard argument for switching to a middleware product – taking back control of your network and the data that crossed it. Most early middleware systems used a hub-and-spoke architecture, providing a central point where all data would arrive and depart. Additionally, the notion of data transformation during transit became popular, rather than the more traditional manipulation during processing at the source or target system. A 'code once, use many times' approach appeared for interfaces, allowing for a reduction in development costs, and the only limitation appeared to be the ever-growing range of available connectors.

The beginning of MFT…

FTP servers didn’t go away however; instead organisations began to centralise their FTP sites and a newer smarter generation of FTP server software began to appear. These early versions of Managed File Transfer quickly developed a common set of standard features – encryption, automation, protocol support and user management.

Which is which?

As both middleware and Managed File Transfer systems matured, the boundaries between them began to blur, with Managed File Transfer performing some middleware functions and vice versa. We have now reached a point where the practical differences are a little fuzzy; however, some simple guidelines can help an architect decide between a middleware and a Managed File Transfer approach.

A good starting point is data transformation. Traditionally this falls squarely within the realm of middleware; however, there are Managed File Transfer solutions which offer this feature well enough to be considered. In contrast, most middleware does not provide an FTP interface for end-users, relying instead on web services for input or FTP clients for output. An organisation therefore has to review its requirements: does it need a Managed File Transfer solution with some middleware functionality, or middleware with some Managed File Transfer?

While trying to avoid generalisations, here are some things to consider that Managed File Transfer solutions provide and middleware ones generally don’t (or at least not well):

  • Enterprise File Sync and Share – the process of sharing data by sending a hyperlink via email is not well supported by middleware
  • Large File Transfer – Very large files are not suitable for transformation and therefore are not often considered by middleware vendors
  • File repository – Managed file transfer systems normally provide a repository of data for download, often encrypted
  • Home folder management – mostly, if a middleware system permits users to have home folders, these have to be manually created
  • Development and Deployment – on the whole, managed file transfer allows for faster design and rollout of interfaces than middleware, which often requires full development teams

Conversely, Middleware can provide functionality that Managed file transfer often struggles with, for example:

  • Mapping, database lookups and transformation – middleware supports complex mapping operations, either custom or using internationally recognised templates.
  • Customisable interfaces – middleware provides a framework for development, meaning bespoke designs can be implemented.
  • Peer-to-peer relationships – generally only available in specialised Managed File Transfer products (using agents, for example), peer-to-peer interfaces are becoming more popular, especially when making use of cloud technology.
  • Adapter support – most middleware products provide adapters which allow connections to just about any kind of system. Managed File Transfer systems are generally limited to a handful of transfer protocols.
  • Realtime support – with the exception of AS2 transfers, most Managed File Transfer products are not well suited to synchronous transfers, whereas middleware will generally handle them without problem.

In Summary

When considering simple automation, large file transfers or user initiated transfer, Managed File Transfer is better suited than middleware. When looking to introduce complicated interfaces, message transformations or realtime processing, consider using middleware.

The best solution, however, comes when the two work in symbiosis: traffic passes through a Managed File Transfer system and is handled by the middleware product. From an automation perspective, the flexibility of Managed File Transfer represents a tactical solution, whilst more persistent interfaces are developed using middleware.

If your company is considering implementing a system for securely exchanging data and integrating it into your internal network, you’ll need to know whether the features you require are provided by the leading managed file transfer solutions, or middleware systems. Download our free managed file transfer comparison guide, which provides an ‘at a glance’ list of features and much more:


The Pitfalls of Using IIS as an Internet Facing FTP Service

IIS (Internet Information Services) is the Microsoft product, integral to Windows, which provides web, email and FTP services. Many organisations make use of the FTP server component to transfer files to application servers inside their networks, relying on more dedicated secure file transfer servers for their public FTP services.

IIS as an SSL secured FTP server…

With the introduction of IIS 7.5 it became possible to use IIS as an SSL-secured FTP server; until then, IIS only ran in non-secure mode. Each successive version of IIS has increased its functionality, bringing it closer to mainstream products. This raises the question: why not use it as a free alternative to more costly file transfer systems?

No SFTP though…

Let's first consider the security that IIS provides. IIS allows the use of SSL encryption up to 128 bits, which is sufficient to meet most compliance criteria; however, at this time it is not easy to improve upon without serious technical tinkering.
IIS FTP cannot provide (nor is it ever likely to) an FTP platform supporting SSH transfers. SFTP (SSH File Transfer Protocol) is a binary protocol using keys rather than certificates, and was developed on and for Unix. SFTP is very popular for internet-based file transfers, thanks in part to it being firewall friendly (only requiring a single port to function).

Possible administration headaches…

Files and folders hosted by IIS FTP are protected by granting limited permissions to users, groups or roles via the IIS Manager (Read, Write, Nothing). The IIS manager does not contain the users or groups however – these are normal local or domain entities. This unfortunately means administration through two separate systems, increasing administration overheads. IIS does not check for existence of users or groups, which can lead to administration headaches from unintentional typos.

Permissions that are set are inherited down from server to site to folder; you can break inheritance and overwrite with your own permissions at each level, however you cannot restore the inheritance without losing all of your changes.

You cannot find all of the permissions set for a single user without checking every folder or exporting the configuration; unlike a more sophisticated solution, it is not possible to simply “grant the same access as Bob has”.

Administration of IIS FTP sites cannot be delegated or devolved, meaning either all administration is centralised, or else everyone has the ability to make any changes they desire in IIS. Configuration changes are logged to the event log (if enabled), however you will require a separate tool to generate reports from the event log.

On the subject of logging, an area of concern is the logging of regular FTP activities. This is done through standard W3C logging in the same way as a website would be logged.

 

2016-04-22 09:49:26 127.0.0.1 - 127.0.0.1 10020 USER FTPUser 331 0 0 1b3083fe-67a9-461d-926f-795946f267d8 -

2016-04-22 09:49:31 127.0.0.1 RichardWin7\FTPUser 127.0.0.1 10020 PASS *** 230 0 0 1b3083fe-67a9-461d-926f-795946f267d8 /

2016-04-22 09:49:38 127.0.0.1 RichardWin7\FTPUser 127.0.0.1 10020 CWD transfer 250 0 0 1b3083fe-67a9-461d-926f-795946f267d8 /transfer

2016-04-22 09:49:43 127.0.0.1 RichardWin7\FTPUser 127.0.0.1 10020 PORT 127,0,0,1,189,113 200 0 0 1b3083fe-67a9-461d-926f-795946f267d8 -

2016-04-22 09:49:43 127.0.0.1 RichardWin7\FTPUser 127.0.0.1 10019 DataChannelOpened - - 0 0 1b3083fe-67a9-461d-926f-795946f267d8 -

2016-04-22 09:49:43 127.0.0.1 RichardWin7\FTPUser 127.0.0.1 10019 DataChannelClosed - - 5 1 1b3083fe-67a9-461d-926f-795946f267d8 -

2016-04-22 09:49:43 127.0.0.1 RichardWin7\FTPUser 127.0.0.1 10020 STOR test.txt 550 5 1 1b3083fe-67a9-461d-926f-795946f267d8 /transfer/test.txt

 

Unfortunately, you cannot select the logging level – only the fields to be recorded into the log file. This means that effectively you will always run the FTP site at full logging.
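As an illustration of what such a separate reporting tool has to do, here is a rough Python sketch that tallies FTP commands per user from space-delimited W3C log lines. The field order is an assumption and must match the `#Fields` header of the actual log file.

```python
# Assumed field order -- verify against the log's #Fields header.
W3C_FIELDS = ["date", "time", "c-ip", "cs-username", "s-ip", "s-port",
              "cs-method", "cs-uri-stem", "sc-status"]

def summarise_ftp_log(lines, fields=W3C_FIELDS):
    """Count FTP commands per user from W3C-style log lines,
    skipping comment (#) and blank lines."""
    counts = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        row = dict(zip(fields, line.split()))
        key = (row.get("cs-username", "-"), row.get("cs-method", "-"))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

A production report would also need to join these entries against the event log for configuration changes, which is exactly the overhead the article describes.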

In the same way as permissions are handled, IP Restrictions are placed at the server level and filter down to site level, then the folder/virtual folder level. Similarly, any changes made to inherited restrictions cannot be reverted on a granular level – it’s all or nothing.

IIS offers the notion of “user isolation”…

Finally, IIS offers the notion of "User Isolation". This allows you to switch between having all users share a common root folder, or having an individual folder per user. Isolating users means locking a user into a single directory (and any subdirectories); the user can never reach someone else's folder. Unless you choose to use the user's home directory as specified in Active Directory, however, you will still have to manually create the home directory before the user can log on.

So realistically, should you use IIS FTPS sites for secure access from the internet?

The answer is: it depends on your needs. If you only have a very small number of files being transferred by a very small number of users, all of whom can use FTPS, then it makes sense to use it.
If your user base is likely to grow into double figures, your external users need to use SFTP, or audit reports are a necessity, then you really need to steer away from IIS. The overheads for administering and supporting IIS FTP very quickly offset the cost savings gained from selecting it as a secure public-facing file transfer platform.


Leading UK Pharmacy Centralises & Automates Data Transfer Requirements

Well, the UK’s largest independent pharmacy with 800 stores, 7000 employees and 73 million prescriptions issued per year, faced a dilemma. They had just been acquired by Bestway Group from the Co-Operative Group and needed to continue to seamlessly exchange business critical data with customers and suppliers after their split.

With a considerable list of challenges that needed to be addressed, Well’s team approached the independent managed file transfer experts, Pro2col.

Well wanted to simplify and better manage their automated and manual data transfers, especially those involving financial accounting systems.

Key requirements also included:

1. Data management efficiency through one secure, centralised platform for enhanced visibility and control
2. File transfer integration within their environment for business critical applications AND between third-party applications with partners, vendors, or suppliers exchanging data
3. Automation of financial accounting data to save time, improve security and increase accuracy
4. Full audit and reporting for improved diagnostics
5. Rapid deployment of new transfers on receipt of business requirements

To read the case study in full, learn more about Well’s challenges, and find out which solution they selected, download the case study today.

 
“We worked closely with Well’s IT Project Manager to clarify which configuration best met his immediate requirements, whilst ensuring the solution could grow to address their future strategic direction. Globalscape’s scalability is a great fit for customers with evolving needs.”

James Lewis

Managing Director, Pro2col Ltd

“In the case of major organizational changes, like Well’s acquisition by Bestway Group, old IT ecosystems may not be suited to handle future challenges brought upon by a new injection of resources and processes. However, by bringing on technology from Globalscape with the help of technology partners like Pro2col, Well is able to manage their data seamlessly and handle any potential requirements that arise as the business continues to scale now and in the future.”

Matt Goulet

Chief Operating Officer, Globalscape Inc.