

Which AWS Services Support Storing and Analyzing Big Data?

The following services are listed in the order in which they are used to collect, process, store, and analyze big data:

– Amazon Kinesis Streams
– AWS Lambda
– Amazon Elastic MapReduce
– Amazon Machine Learning
– Amazon DynamoDB
– Amazon Redshift
– Amazon Elasticsearch Service
– Amazon QuickSight

In addition, Amazon EC2 instances are available for self-managed big data applications.
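
As a rough illustration of the first two stages in that list, the sketch below shows how an AWS Lambda function might process a batch of records arriving from an Amazon Kinesis stream. Kinesis payloads reach Lambda base64-encoded under each record's "kinesis" / "data" key; everything else here is an assumption made for the example: the JSON payload shape, the "amount" field, and the make_event helper, which builds a minimal fake event for local testing rather than a full AWS event.

```python
import base64
import json


def handler(event, context):
    """Decode Kinesis records delivered to a Lambda function and total one field."""
    total = 0.0
    count = 0
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under record["kinesis"]["data"].
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)
        # "amount" is a hypothetical field used only for this illustration.
        total += float(item.get("amount", 0))
        count += 1
    return {"records": count, "total_amount": total}


def make_event(items):
    """Build a minimal fake Kinesis event for local testing (hypothetical helper)."""
    return {
        "Records": [
            {
                "kinesis": {
                    "data": base64.b64encode(
                        json.dumps(item).encode("utf-8")
                    ).decode("ascii")
                }
            }
            for item in items
        ]
    }


if __name__ == "__main__":
    print(handler(make_event([{"amount": 10}, {"amount": 2.5}]), None))
```

In a real pipeline the handler would more likely write its results to a store such as DynamoDB or Redshift instead of returning them; that step is left out to keep the sketch self-contained.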


How Do You Use Amazon Redshift for Big Data Problems?

Redshift is a petabyte-scale data warehouse (you can also start at gigabyte scale) that exposes an ANSI SQL interface. Because you can load as much data as you like into the data warehouse (DWH) and run any SQL you wish against it, it is a good foundation for building an agile big data analysis framework. Redshift has numerous analytic capabilities, mostly in the form of window functions. You can calculate averages and medians, as well as percentiles, dense ranks, and so on.
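
To give a feel for what such window functions look like, here is a minimal sketch that runs a windowed average and a dense rank over a tiny, made-up sales table. It uses Python's built-in sqlite3 module as a local stand-in, not Redshift itself, and assumes a SQLite build with window function support (3.25 or later). The table, column names, and values are invented for the example, and the median is computed in Python to keep the sketch portable.

```python
import sqlite3
import statistics

# In-memory SQLite database used as a local stand-in for a data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "ann", 100.0),
        ("east", "bob", 300.0),
        ("east", "cat", 300.0),
        ("west", "dan", 50.0),
        ("west", "eve", 150.0),
    ],
)

# AVG(...) OVER computes a per-region average without collapsing rows;
# DENSE_RANK() gives tied amounts the same rank and leaves no gaps after ties.
QUERY = """
SELECT region,
       rep,
       amount,
       AVG(amount) OVER (PARTITION BY region) AS region_avg,
       DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY region, amount_rank, rep
"""
rows = conn.execute(QUERY).fetchall()

# Per-region median, computed in Python rather than in SQL for portability.
amounts_by_region = {}
for region, _rep, amount, _avg, _rank in rows:
    amounts_by_region.setdefault(region, []).append(amount)
medians = {
    region: statistics.median(values)
    for region, values in amounts_by_region.items()
}

if __name__ == "__main__":
    for row in rows:
        print(row)
    print(medians)
```

The two reps tied at 300 in the east region both receive rank 1 and the next rep receives rank 2, which is the behaviour that distinguishes a dense rank from a plain rank.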



What DWH Tools Are Available to Support Big Data Uploads?

Many DWH and reporting tools can connect to Redshift. The most common are Tableau, QlikView, Looker, and Yellowfin, particularly if you do not have an existing DWH. If you do have one, you may want to continue using tools such as Jaspersoft or Oracle BI.



Role of Data Scientists in Big Data

Rising alongside the relatively new technology of big data is the new job title of “Data Scientist”. Although the role is not tied exclusively to big data projects, it complements them because of the increased breadth and depth of the data being examined compared with traditional roles.

What does a data scientist do?

The data scientist is responsible for designing and implementing processes and layouts for complex, large-scale data sets used for modelling, data mining, and research purposes. The data scientist is also responsible for business case development, planning, coordination and collaboration with various internal and vendor teams, managing the project's analysis lifecycle, and interfacing with business sponsors to provide periodic updates.

A data scientist would be responsible for:
⦁ Extracting data relevant for analysis (by coordinating with developers).
⦁ Developing new analytical methods and tools as required.
⦁ Contributing to data mining architectures, modelling standards, reporting, and data analysis methodologies.
⦁ Suggesting best practices for data mining and analysis services.
⦁ Creating data definitions for new databases or changes to existing ones as needed for analysis.

Big Data:

The term “Big Data”, which has become a buzzword, refers to a massive volume of structured and unstructured data that cannot be processed or analysed using traditional processes or tools. There is no exact definition of how big a dataset must be in order to be considered Big Data.

Big Data is also defined by the three V’s: Volume, Velocity, and Variety.

Volume: Big data implies an enormous volume of data. Data storage keeps growing because data is no longer only text; it also takes the form of video, music, and large images on social media channels. It is the granular nature of the data that is unique. It is now very common for organizations to have terabytes or petabytes of storage. As the database grows, the applications and architecture built to support the data need to be re-evaluated quite often. Sometimes the same data is analysed from multiple angles, and even though the original data is unchanged, the newly found intelligence creates an explosion of data.

Velocity: Velocity deals with the fast rate at which data is received and, in some cases, acted upon. The growth of data and the explosion of social media have changed how we look at data. The flow of data is massive and continuous. Nowadays people rely on social media to keep them updated on the latest happenings. Data movement is almost real-time, and the update window has shrunk to fractions of a second.

Variety: Data can be stored in multiple formats. Big data variety refers to unstructured and semi-structured data types such as text and audio, as well as irregularities in the data. Unstructured data has many of the same requirements as structured data, such as summarization, auditability, and privacy. Real-world data comes in many formats, and that is the major challenge we need to overcome with Big Data.

The future of Big Data:

The demand for big data talent and technology is growing day by day. Over the last two years, investment in big data solutions has tripled. As our world becomes more information-driven year over year, industry analysts predict that the big data market will easily expand another tenfold within the next decade. Big data is already proving its value by allowing companies to operate at a new level of intelligence and sophistication.
