While setting up a Big Data technical environment, one of the questions most enterprises grapple with is whether to go for an appliance or a cluster. A Big Data appliance can be defined as an integrated system that combines hardware, software, storage and network devices to enable big data use cases. A Big Data cluster, on the other hand, can be defined as a set of dedicated nodes, each with the required hardware, big data processing software and coupled storage, integrated via network devices.
While appliances usually involve a large payout to the vendor, comparative studies have tried to show that their Total Cost of Ownership (TCO) may in certain cases be less than or equal to that of a cluster setup. Let’s take a look at whether appliances are worth the money spent.
| Appliance | Cluster |
|---|---|
| Higher initial payout | Lower initial payout, with the option to acquire new resources as you scale out |
| Standard configuration across nodes | Provision to mix and match configurations based on the distinct needs of the name node and data nodes |
| High probability of vendor lock-in | More liberty to switch vendors and associated software and components |
| Field-tested versions of Hadoop and ecosystem projects offered as a package | Need to make difficult component choices and run version compatibility tests |
| Lower setup time and enablement effort | Higher setup time and labor effort |
| Eliminates the learning curve for administrators on each component | Needs a high comfort level with, and education on, the required components |
| Could have issues installing add-on software | Flexibility to install additional software |
| New hardware investment | Offers the possibility of leveraging existing hardware |
| Need to read the fine print in the contract on software upgrades and pricing | Better control over software upgrades and pricing |
| Additional scaling could lead to technical and pricing challenges | More flexibility for additional scaling |
| Will need to stick to the SQL standard offered by the vendor | Can choose your own preferred SQL-on-Hadoop solution |
| Less effort to restore a node, thanks to a common support subscription | Could involve follow-up and coordination among multiple vendors for troubleshooting |
| May involve migration costs | May not involve any major migration cost, since you can add nodes to the existing cluster |
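
The cost-related points in the comparison above (initial payout, support, labor effort, migration) can be folded into a rough TCO estimate. The following is a minimal sketch in Python, not a definitive costing model: the `Option` class, its field names, the `total_cost_of_ownership` function and every figure are hypothetical placeholders chosen for illustration, not vendor quotes or study results.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One candidate setup. All fields are illustrative assumptions."""
    name: str
    upfront_cost: float          # initial hardware and software payout
    annual_support: float        # yearly support or subscription fees
    annual_admin_labor: float    # yearly staff effort for setup and operations
    migration_cost: float = 0.0  # one-off cost if data must be moved later


def total_cost_of_ownership(option: Option, years: int) -> float:
    """Add the one-off costs to the recurring costs over the planning horizon."""
    recurring = (option.annual_support + option.annual_admin_labor) * years
    return option.upfront_cost + option.migration_cost + recurring


# Placeholder figures only; replace them with your own quotes and estimates.
appliance = Option("appliance", upfront_cost=900_000, annual_support=100_000,
                   annual_admin_labor=80_000, migration_cost=50_000)
cluster = Option("cluster", upfront_cost=400_000, annual_support=60_000,
                 annual_admin_labor=250_000)

for opt in (appliance, cluster):
    print(f"{opt.name}: {total_cost_of_ownership(opt, years=3):,.0f}")
```

Which option comes out cheaper depends entirely on the inputs and on the planning horizon, so it is worth rerunning the calculation for several values of `years`.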
Recommended steps to arrive at a decision:
1. Collect use cases, associated data volumes and growth projections.
2. Determine the Hadoop/Big Data ecosystem layers that you will invest in over the next 3 years.
3. Analyze the software and hardware components being offered vis-à-vis the requirements listed in steps 1 and 2 above.
4. Perform benchmark tests (if the required skills are available).
5. Compare metrics across appliances from different vendors and cluster machines with varied configurations.
6. Arrive at a qualitative and quantitative comparison across the options to help you choose a winner.
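
One way to carry out the final comparison is a weighted scoring matrix. The sketch below is only an illustration of that idea: the criteria names, the weights, the 1 to 5 scores and the `weighted_score` helper are all assumptions made up for this example, and the resulting ranking carries no meaning beyond showing the mechanics.

```python
def weighted_score(scores: dict, weights: dict) -> float:
    """Return the weight-normalised average of the per-criterion scores."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Hypothetical criteria and weights; derive yours from the use cases you collected.
weights = {"cost": 3, "setup_time": 2, "flexibility": 2, "support": 1}

# Hypothetical scores on a 1 (poor) to 5 (good) scale, e.g. from benchmarks and quotes.
candidates = {
    "appliance": {"cost": 2, "setup_time": 5, "flexibility": 2, "support": 5},
    "cluster": {"cost": 4, "setup_time": 2, "flexibility": 5, "support": 3},
}

# Rank the options from highest to lowest weighted score.
ranking = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name], weights),
    reverse=True,
)

for name in ranking:
    print(f"{name}: {weighted_score(candidates[name], weights):.3f}")
```

Quantitative inputs such as benchmark results or TCO figures can be mapped onto the same scale before scoring, while qualitative factors such as vendor lock-in are scored by judgment.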