ROLE OF DATA SCIENTISTS IN BIG DATA

Rising alongside the relatively new technology of big data is a new job title: “Data Scientist”. Although the role is not tied exclusively to big data projects, it complements them because of the increased breadth and depth of data being examined, compared with traditional roles.

What does a data scientist do?

The data scientist is responsible for designing and implementing processes and layouts for complex, large-scale data sets used for modelling, data mining, and research purposes. The role also covers business case development; planning, coordination, and collaboration with various internal and vendor teams; managing the lifecycle of the analysis project; and interfacing with business sponsors to provide periodic updates.

A data scientist would be responsible for:
⦁ Extracting data relevant for analysis (by coordinating with developers).
⦁ Developing new analytical methods and tools as required.
⦁ Contributing to data mining architectures, modelling standards, reporting, and data analysis methodologies.
⦁ Suggesting best practices for data mining and analysis services.
⦁ Creating data definitions for new databases or changes to existing ones as needed for analysis.

Big Data:

The term “Big Data”, which has become a buzzword, refers to a massive volume of structured and unstructured data that cannot be processed or analysed using traditional processes or tools. There is no exact definition of how big a dataset should be in order to be considered Big Data.

Big Data is also characterised by three V’s: Volume, Velocity, and Variety.

Volume: Big data implies an enormous volume of data. Data storage keeps growing because the data is no longer only text, but also video, music, and large images on social media channels. It is the granular nature of the data that is unique. It is very common for organizations to run storage systems that hold terabytes or petabytes. As the database grows, the applications and architecture built to support the data need to be re-evaluated quite often. Sometimes the same data is analysed from multiple angles, and even though the original data is the same, the newly found intelligence creates an explosion of data.

Velocity: Velocity is the fast rate at which data is received and, perhaps, acted upon. The growth of data and the social media explosion have changed how we look at data. The flow of data is massive and continuous. Nowadays people rely on social media to keep them up to date with the latest happenings. Data movement is almost real-time, and the update window has shrunk to fractions of a second.

Variety: Data can be stored in multiple formats. Big data variety refers to unstructured and semi-structured data types such as text and audio, as well as irregularities in the data. Unstructured data has many of the same requirements as structured data, such as summarization, auditability, and privacy. The real world has data in many formats, and that is the major challenge we need to overcome with Big Data.

The future of Big Data:

The demand for big data talent and technology is exploding day by day. Over the last two years, investment in big data solutions has tripled. As our world becomes more information-driven year over year, industry analysts predict that the big data market will easily expand by another ten times within the next decade. Big data is already proving its value by allowing companies to operate at a new level of intelligence and sophistication.

USING THE PROTRACTOR AUTOMATION TOOL TO TEST ANGULARJS APPLICATIONS

Are you developing an AngularJS application? Are you unsure whether to use Protractor to test it? If so, read on to learn what Protractor is and how to use it to test your AngularJS applications.

Talking about Protractor:

Google has released an end-to-end test framework for AngularJS applications called Protractor, which works as a solution integrator combining powerful tools and technologies such as Node.js, Selenium WebDriver, Jasmine, Cucumber, and Mocha. Protractor adds a number of customizations on top of Selenium that make it easy to create tests for AngularJS applications. With Protractor, a developer can write automated tests that run inside an actual browser, against an existing website, and so can easily check whether the code is working end to end as expected. The added benefit of using Protractor is that it understands AngularJS and is optimized for it.

“Unit tests are the first line of defense for catching bugs, but sometimes issues come up with integration between components which can’t be captured in a unit test. End-to-end tests are made to find these problems” -Angular Team

Deep diving into Protractor:

Protractor is a framework for automating functional tests. Its intention is not to be the only way to test an AngularJS application, but to cover the acceptance criteria required by the user, that is, to test end to end.

It runs on top of Selenium and provides all of Selenium’s benefits and advantages. In addition, it provides customizable features for testing AngularJS applications. Because Protractor runs on top of Selenium, it is also possible to use drivers that implement the WebDriver wire protocol, such as ChromeDriver and GhostDriver. With ChromeDriver, a developer can run tests without the Selenium server. To use GhostDriver, however, one has to use PhantomJS, which uses GhostDriver to run tests in headless mode.

In the past, Protractor’s documentation was poor and risky to rely on because of Protractor’s constant evolution. In recent years, however, the community has collaborated a great deal and the documentation has been brought up to date.

Salient Features of Protractor:

⦁ It is built on top of Selenium.
⦁ It introduces a new, simple syntax for writing tests.
⦁ A developer can take advantage of Selenium Grid to run tests on multiple browsers.
⦁ A developer can use Jasmine or Mocha to write test suites.

Protractor is built on top of Selenium WebDriver, so it contains every feature that is available in the Selenium WebDriver. Also, Protractor provides some new strategies and functions which are very useful to automate AngularJS applications.

Protractor Installation:

Download and install Node.js. After installation, make sure that its path is configured correctly so that the command prompt can find Node.

Open the Command Prompt and type the following command to install Protractor globally.

npm install -g protractor

Install Protractor Locally

A developer can also install Protractor locally in the project directory. Go to the project directory and type the following command in the command prompt.

npm install protractor

Now, Verify Installation

To verify the installation, type the following command (for a local installation, run it through npx, that is, npx protractor --version).

protractor --version

If Protractor is installed successfully, the system displays the installed version. Otherwise, recheck the installation.

Let’s see a basic Protractor example:

Protractor needs two files: a “spec file”, which is the test file, and a “conf file”, which is the configuration file.

Below is a sample test file named “testspec.js”.

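describe('angularjs homepage', function() {
  it('should have a title', function() {
    // Open the AngularJS home page in the browser.
    browser.get('http://angularjs.org/');

    // The page title is expected to mention AngularJS.
    expect(browser.getTitle()).toContain('AngularJS');
  });
});
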
The simple test above navigates to the AngularJS home page and checks its page title.

Below is a sample config file named “conf.js”.

exports.config = {
  // The address of a running Selenium server.
  seleniumAddress: 'http://localhost:4444/wd/hub',
  // The names of the spec files to run.
  specs: ['testspec.js']
};

How to run?

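Make sure a Selenium server is running at the address given in conf.js (for example, one started with “webdriver-manager start”). Then go to the command prompt and type “protractor conf.js”. Protractor will start executing your test, in the Chrome browser by default.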

Final word:

Protractor allows developers to test their AngularJS applications in a consistent and automated way. As a result, we are able to make informed statements about the overall state and soundness of AngularJS applications.

PRIVATE CLOUD VS PUBLIC CLOUD

If you have been researching cloud computing, you must be aware of the private vs public cloud debate. Before you decide which side of the debate you are on, it is important to know the differences between the two models. Read on to learn about those differences before you choose a path.

Private Cloud:

A private cloud is a distinct and secure cloud-based environment in which only the specified client or organization can operate. As with other cloud models, private clouds provide computing power as a service within a virtualized environment, using an underlying pool of physical computing resources. However, the private cloud is accessible only by a single organization, which gives that organization greater control and privacy. A private cloud offers hosted services to a limited number of people behind a firewall, so it minimizes the security concerns of some organizations.

Private cloud computing, by definition, is a single-tenant environment where the hardware, storage, and network are dedicated to a single client or organization. The features and benefits of a private cloud are therefore as follows:

Security and Privacy: Public cloud services can implement a certain level of security, but a private cloud can go further. By using techniques such as distinct pools of resources with access restricted to connections made from behind the organization’s firewall, dedicated leased lines, or on-site internal hosting, it ensures that operations are kept away from prying eyes.

Control: Because a private cloud is accessible only by a single organization, that organization can configure and manage it in line with its needs to achieve a customized network solution.

Cost and Efficiency: A private cloud is not as cost-effective as public cloud services because of smaller economies of scale and increased management costs. It does, however, use computing resources more efficiently than traditional LANs, because it minimises the investment in unused capacity.

Hybrid Deployments: If a dedicated server is required to run a high-speed database application, that hardware can be integrated into a private cloud, hybridizing the solution between virtual servers and dedicated servers. This can’t be achieved in a public cloud.

To reduce an organization’s on-premises IT footprint, cloud providers, such as Rackspace and VMware, can deploy private cloud infrastructures.

Public Cloud:

The most recognizable model of cloud computing for many users is the public cloud model, under which cloud services are provided in a virtualized environment, constructed from pooled, shared physical resources, and accessible over a public network such as the internet.

Public clouds provide services and access to multiple users on the same shared infrastructure. Amazon (AWS), Microsoft (Azure), and VMware are some of the key players in this space. Public clouds are widely used by individuals, who are less likely to need the level of infrastructure and security offered by private clouds. However, businesses can still use public clouds to make their operations significantly more efficient. Even though it carries security risks, a public cloud is considered more useful than its counterparts for several reasons.

The following are the features offered by public cloud:

Cost Effective: The initial cost is minimal, but storing data for a very long period of time can prove expensive.

Reliability: A sheer number of servers and networks are involved in creating a public cloud. The major advantage is that if one physical component fails, the cloud still runs unaffected on the remaining components. In other words, there is no single point of failure that would make a public cloud service vulnerable.

Flexibility: Numerous services on the market follow the public cloud model and are ready to be accessed as a service from any internet-enabled device. These services can fulfil most computing requirements and can deliver their benefits to private and enterprise clients alike. Businesses can integrate their public cloud services with private cloud services, where they need to perform sensitive business functions, to create hybrid clouds.

Location Independence: The availability of public cloud services over an internet connection ensures that the services are available wherever the client is located. This opens up many opportunities for enterprises, such as remote access to IT infrastructure or online document collaboration from multiple locations.

The Debate: Although the two models differ on many factors, it is difficult to say which one stands out. Both have their advantages and disadvantages. Nevertheless, security, access patterns, confidentiality, and the availability of a professional workforce still need to improve in both public and private cloud computing before the technology is fully beneficial for new and established businesses.

Cloud Bursting: Businesses may also combine private and public cloud services in a hybrid cloud deployment. This allows users to scale computing requirements beyond the private cloud and into the public cloud, a capability called cloud bursting.
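
The idea of bursting can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the function placeWorkloads, the workload names, and the capacity units are all hypothetical, and a real deployment would rely on the scheduling features of the chosen cloud platforms.

```javascript
// Hypothetical placement routine: fill the private cloud first and
// "burst" whatever does not fit into the public cloud.
function placeWorkloads(workloads, privateCapacity) {
  let used = 0;
  const placement = { privateCloud: [], publicCloud: [] };

  for (const workload of workloads) {
    if (used + workload.units <= privateCapacity) {
      // Still room on the private cloud.
      used += workload.units;
      placement.privateCloud.push(workload.name);
    } else {
      // Private capacity exhausted: burst to the public cloud.
      placement.publicCloud.push(workload.name);
    }
  }
  return placement;
}

// Made-up workloads with capacity needs in arbitrary units.
const result = placeWorkloads(
  [
    { name: 'billing', units: 4 },
    { name: 'reports', units: 5 },
    { name: 'batch-import', units: 3 }
  ],
  10
);
console.log(result);
```

In this example the first two workloads fit within the ten private units, and the third is placed on the public cloud.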

WHAT’S THE DIFFERENCE BETWEEN AWS BEANSTALK VS CLOUDFORMATION VS OPSWORKS?

AWS Elastic Beanstalk deploys and manages applications on the AWS Cloud so that you do not have to worry about the environment that runs your web application. There is no need to create and manage an EC2 instance yourself for a single application.

AWS OpsWorks is an application management service that makes it easy for DevOps users to model and manage their applications. It is a tool on the AWS Cloud, similar to Chef or Puppet, that can manage a number of servers.

AWS CloudFormation helps developers and system administrators by providing an easy way to create and manage a large number of AWS resources, provisioning and updating them in an orderly and predictable fashion. With CloudFormation, you can build a large-scale environment from a single template in a few hours.
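
To give a feel for what a single template looks like, here is a minimal sketch that builds one as a JavaScript object and serialises it to JSON, one of the formats CloudFormation accepts. The logical name ExampleBucket and the description text are made up for illustration, and a real environment would declare many more resources.

```javascript
// A deliberately tiny template: one S3 bucket and nothing else.
const template = {
  AWSTemplateFormatVersion: '2010-09-09',
  Description: 'Illustrative template with a single S3 bucket',
  Resources: {
    // "ExampleBucket" is a hypothetical logical name.
    ExampleBucket: {
      Type: 'AWS::S3::Bucket'
    }
  }
};

// Serialise the object to the JSON text that would be saved as the template.
const templateBody = JSON.stringify(template, null, 2);
console.log(templateBody);
```

Every resource in the environment is described in the same declarative way, which is what allows it to be provisioned and updated in an orderly fashion.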

WHAT’S THE DIFFERENCE BETWEEN HADOOP 1.X AND HADOOP 2.X?

HDFS federation brings important measures of scalability and reliability to Hadoop. YARN, the other major advance in Hadoop 2, brings significant performance improvements for some applications, supports additional processing models, and implements a more flexible execution engine.

YARN is a resource manager that was created by separating the processing engine and resource management capabilities of MapReduce as it was implemented in Hadoop 1. YARN is often called the operating system of Hadoop because it is responsible for managing and monitoring workloads, maintaining a multi-tenant environment, implementing security controls, and managing high availability features of Hadoop.

Like an operating system on a server, YARN is designed to allow multiple, diverse user applications to run on a multi-tenant platform. In Hadoop 1, users had the option of writing MapReduce programs in Java, in Python, Ruby or other scripting languages using streaming, or using Pig, a data transformation language. Regardless of which method was used, all fundamentally relied on the MapReduce processing model to run.
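
The MapReduce processing model itself can be illustrated with a word count, sketched here in plain JavaScript on a single machine. This is not Hadoop code: the function names are hypothetical, and the sketch only mirrors the map, shuffle, and reduce phases that a Hadoop job distributes across a cluster.

```javascript
// Map phase: emit a [word, 1] pair for every word in a line.
function mapLine(line) {
  return line
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => [word, 1]);
}

// Shuffle phase: group the emitted values by key.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce phase: combine all values of one key into a single result.
function reduceCounts(values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Run the three phases in order over the input lines.
function wordCount(lines) {
  const pairs = lines.flatMap(mapLine);
  const counts = {};
  for (const [word, ones] of shuffle(pairs)) {
    counts[word] = reduceCounts(ones);
  }
  return counts;
}

const counts = wordCount(['big data', 'big clusters']);
console.log(counts);
```

Whichever language or tool was used in Hadoop 1, the job ultimately had to be expressed as phases like these.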

IN AWS CLOUD, HOW TO LOGIN TO EC2 INSTANCE IF ONE LOSES .PPK FILE OR PASSWORD?

Create an AMI of the instance whose key was lost, and launch a new instance from that AMI. While launching, create a new key pair: download the .pem file and use PuTTYgen to convert it into a new .ppk file. Check that you can log in to the new instance with the new key pair, then delete the old instance and continue with the new one. Make sure that the new instance is created in the same Availability Zone as the original one.

HOW IOS SWIFT PROGRAMMING IS GETTING POPULAR

Swift is a powerful and intuitive programming language for iOS, OS X, tvOS, and watchOS. Writing Swift code is interactive and enjoyable, the syntax is concise yet expressive, and applications run lightning-fast. Swift is ready for your next project, or for addition to your current application, because Swift code works side by side with Objective-C.

INTRODUCING SWIFT:

Swift is a new programming language that builds on the best of C and Objective-C, without the constraints of C compatibility. Swift adopts safe programming patterns and adds modern features to make programming easier, more flexible, and more fun. Swift’s clean slate, backed by the mature and much-loved Cocoa and Cocoa Touch frameworks, is an opportunity to reimagine how software development works.

Swift was many years in the making. Apple laid the foundation for Swift by advancing its existing compiler and framework infrastructure. Objective-C itself evolved to support blocks, collection literals, and modules, enabling frameworks to adopt modern language technologies without disruption.

Swift feels familiar to Objective-C developers: it adopts the readability of Objective-C’s named parameters and the power of Objective-C’s dynamic model. Building on this common ground, Swift introduces many new features and unifies the procedural and object-oriented portions of the language. Another important feature is that programmers can experiment with Swift code and see the results immediately, without the overhead of building and running an application.

Swift combines the best of modern language thinking with wisdom from the wider Apple engineering culture. The compiler is optimized for performance and the language is optimized for development, without compromising on either.

According to RedMonk’s Programming Language Rankings, Swift climbed from 68th position in 2014 to 22nd position in 2015, a jump of 46 slots. Given this meteoric rise, Swift is expected to become a top language sometime this year.

Swift is a fantastic way to write iOS, OS X, watchOS, and tvOS apps, and it will continue to advance with new features and capabilities. It has also gone open source, which is one of the main reasons for its popularity.

SYNTAX ENHANCEMENT:

The syntax features let users write more expressive code while improving consistency across the language. The SDKs have adopted new Objective-C features, such as generics and nullability annotations, to make Swift code even cleaner and safer.

TYPES, VARIABLES, AND SCOPING:

In the Cocoa and Cocoa Touch environment, many classes are part of the Foundation framework. These include the NSString string class and the NSArray and NSDictionary collection classes. Objective-C provides various bits of syntactic sugar that allow some of these objects to be created within the language. Once created, however, the objects are manipulated through method calls.

NSString *str = @"hello,";
str = [str stringByAppendingString:@" world"];

In Swift, many of these basic types have been promoted to the language’s core and can be manipulated directly. For instance, strings are invisibly bridged to NSString and can be concatenated with the “+” operator, which allows a simplified syntax.

var str = "hello,"
str += " world"

INTERACTIVE PLAYGROUNDS:

Playgrounds make writing Swift code astonishingly simple and fun. When a user types a line of code, the result appears immediately. The user can then take a quick look at the result beside the code, or pin the result directly below it. In Xcode 7, playgrounds can also contain comments that use rich text with bold, italic, and bulleted lists, in addition to embedded images and links.

DESIGNED FOR SAFETY:

Swift eliminates entire classes of unsafe code. Variables are always initialized before use, arrays and integers are checked for overflow, and memory is managed automatically. Another safety feature is that, by default, Swift objects can never be nil. The Swift compiler stops you from trying to make or use a nil object with a compile-time error. This makes writing code much cleaner and safer, and prevents a huge category of runtime crashes in your applications.

RAPID AND POWERFUL:

From its earliest conception, Swift was built to be fast. Using the high-performance LLVM compiler, Swift code is transformed into optimized native code that gets the most out of modern hardware.

PLATFORM SUPPORT:

One of the most impressive aspects of Swift’s development is that it is now free to be ported across a wide range of platforms, devices, and use cases.

The major goal is to provide source compatibility for Swift across all platforms, even though the actual implementation and mechanisms may differ from one platform to another.

CONCLUSION:

Swift has all the features it needs to quickly become a popular programming language for iOS and OS X in both the enterprise and consumer worlds. The language’s type inference will make it especially suitable for the enterprise, and its simple, clean syntax will attract those working on consumer projects.