WHAT AWS SERVICES ARE AVAILABLE FOR STORING AND ANALYZING BIG DATA?

The following services cover the big data pipeline, ordered from collection and processing through storage and analysis:

– Amazon Kinesis Data Streams
– AWS Lambda
– Amazon EMR (Elastic MapReduce)
– Amazon Machine Learning
– Amazon DynamoDB
– Amazon Redshift
– Amazon Elasticsearch Service
– Amazon QuickSight

In addition, Amazon EC2 instances are available for running self-managed big data applications.
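As a minimal sketch of the collection step, the snippet below prepares a record for Amazon Kinesis Data Streams. The `build_kinesis_record` helper, the example event fields, and the `clickstream-demo` stream name are illustrative assumptions, not part of any real pipeline:

```python
import json

def build_kinesis_record(event: dict, partition_field: str) -> dict:
    """Prepare a record for Amazon Kinesis Data Streams.

    Kinesis expects the payload as bytes in "Data" and a "PartitionKey"
    string that determines which shard receives the record.
    """
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event[partition_field]),
    }

record = build_kinesis_record({"user_id": 42, "action": "click"}, "user_id")

# With boto3 (not imported here), the record would be sent like:
#   kinesis = boto3.client("kinesis")
#   kinesis.put_record(StreamName="clickstream-demo", **record)
```

Keying the partition on a user ID, as sketched here, keeps each user's events ordered within a single shard.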

HOW DO YOU UTILIZE AMAZON REDSHIFT FOR THE BIG DATA PROBLEM?

Redshift is a petabyte-scale data warehouse (it can also start at gigabyte scale) with an ANSI SQL interface. Because you can load as much data as you like into the warehouse and run any SQL you wish against it, it is a good foundation for building an agile big data analysis framework. Redshift offers many analytic capabilities, mostly through window functions: you can calculate averages, medians, percentiles, dense ranks, and so on.
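To illustrate the window functions mentioned above, here is a hypothetical Redshift query (the table and column names are invented for the example), together with a small pure-Python check of the linear-interpolation percentile that SQL `PERCENTILE_CONT` computes:

```python
import statistics

# Illustrative Redshift SQL using window functions (hypothetical table/columns):
#   SELECT region,
#          AVG(revenue) OVER (PARTITION BY region)                       AS avg_rev,
#          MEDIAN(revenue) OVER (PARTITION BY region)                    AS med_rev,
#          PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY revenue)
#              OVER (PARTITION BY region)                                AS p90_rev,
#          DENSE_RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rnk
#   FROM sales;

def percentile_cont(values, p):
    """Linear-interpolation percentile, mirroring SQL PERCENTILE_CONT."""
    xs = sorted(values)
    idx = p * (len(xs) - 1)
    lo, hi = int(idx), min(int(idx) + 1, len(xs) - 1)
    return xs[lo] + (idx - lo) * (xs[hi] - xs[lo])

revenue = [100, 200, 300, 400, 500]
print(statistics.mean(revenue))       # AVG  -> 300
print(statistics.median(revenue))     # MEDIAN -> 300
print(percentile_cont(revenue, 0.9))  # PERCENTILE_CONT(0.9) -> 460.0
```

The 0.9 percentile interpolates between the two highest values (400 and 500), which is why the result is 460 rather than one of the stored values.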


WHAT DWH TOOLS ARE AVAILABLE TO SUPPORT BIG DATA UPLOADS?

There are many DWH and reporting tools that you can connect to Redshift. The most widely used are Tableau, QlikView, Looker, and Yellowfin, particularly if you do not have an existing DWH; otherwise you may want to continue using tools such as Jaspersoft or Oracle BI.


ROLE OF DATA SCIENTISTS IN BIG DATA

Rising alongside the relatively new technology of big data is the job title "Data Scientist." While not tied exclusively to big data projects, the data scientist role complements them because of the increased breadth and depth of data being examined, compared to traditional roles.

What does a data scientist do?

The data scientist will be responsible for designing and implementing processes and layouts for complex, large-scale data sets used for modelling, data mining, and research purposes. The data scientist is also responsible for business case development; planning, coordination, and collaboration with various internal and vendor teams; managing the analysis lifecycle of the project; and interfacing with business sponsors to provide periodic updates.

A data scientist would be responsible for:
⦁ Extracting data relevant for analysis (by coordinating with developers).
⦁ Developing new analytical methods and tools as required.
⦁ Contributing to data mining architectures, modelling standards, reporting, and data analysis methodologies.
⦁ Suggesting best practices for data mining and analysis services.
⦁ Creating data definitions for new databases or changes to existing ones as needed for analysis.

Big Data:

The term “Big Data”, which has become a buzzword, refers to a massive volume of structured and unstructured data that cannot be processed or analyzed using traditional processes or tools. There is no exact definition of how big a dataset must be in order to be considered Big Data.

Big Data is also defined by the three V’s: Volume, Velocity, and Variety.

Volume: Big data implies an enormous volume of data. We currently see growth in data storage because data is no longer only text; it also arrives as video, music, and large images on social media channels. It is the granular nature of this data that makes it unique. It is very common for organizations to have terabytes, or even petabytes, of storage. As the data grows, the applications and architecture built to support it need to be re-evaluated quite often. Sometimes the same data is evaluated from multiple angles, and even though the original data is the same, the newly found intelligence creates an explosion of data.

Velocity: Velocity deals with the fast rate at which data is received and perhaps acted upon. The growth of data and the explosion of social media have changed how we look at data. The flow of data is massive and continuous. Nowadays people rely on social media to keep them updated on the latest happenings. Data movement is almost real-time, and the update window has shrunk to a fraction of a second.

Variety: Data can be stored in multiple formats. Big data variety refers to unstructured and semi-structured data types such as text, audio, and video. Unstructured data has many of the same requirements as structured data, such as summarization, auditability, and privacy. The real world produces data in many formats, and that is the major challenge we need to overcome with Big Data.
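As a toy sketch of handling variety, the snippet below normalizes the same logical record arriving as JSON (semi-structured) and as CSV (structured) into one common shape; the `normalize` helper and the field names are illustrative assumptions:

```python
import csv
import io
import json

def normalize(raw: str, fmt: str) -> dict:
    """Normalize one record from different source formats into a common dict."""
    if fmt == "json":  # semi-structured input
        return json.loads(raw)
    if fmt == "csv":   # structured input: header row plus one data row
        rows = list(csv.DictReader(io.StringIO(raw)))
        return dict(rows[0])
    raise ValueError(f"unsupported format: {fmt}")

a = normalize('{"user": "ann", "score": "7"}', "json")
b = normalize("user,score\nann,7", "csv")
print(a == b)  # both sources converge to the same record -> True
```

A real pipeline would also cover binary formats such as audio and images, but the principle is the same: map each source format onto one common schema before analysis.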

The future of Big Data:

The demand for big data talent and technology is exploding day by day. Over the last two years, investment in big data solutions has tripled. As our world continues to become more information-driven year over year, industry analysts predict that the big data market will easily expand by another ten times within the next decade. Big data is already proving its value by allowing companies to operate at a new level of intelligence and sophistication.
