

Friday, May 8, 2015

Big Data!

Big data is nothing but an assortment of data so huge and complex that it becomes very tedious to capture, store, process, retrieve and analyze it with on-hand database management tools or traditional data processing techniques. In fact, the concept of “BIG DATA” varies from company to company depending upon its size, capacity, competence, human resources, techniques and so on. For some companies, managing a few gigabytes may be a cumbersome job; for others, it may take some terabytes to create a hassle across the entire organization.

The Four V’s Of Big Data

1. Volume: BIG DATA is clearly determined by its volume. It could amount to hundreds of terabytes or even petabytes of information. For instance, 15 terabytes of Facebook posts or 400 billion annual medical records could mean Big Data!
2. Velocity: Velocity is the rate at which data flows into an organization. Big data requires fast processing, and the time factor plays a crucial role in many organizations. For instance, processing 2 million records on a stock exchange or evaluating the results of millions of students who applied for competitive exams could mean Big Data!
3. Variety: Big Data may not belong to a specific format. It could be in any form such as structured, unstructured, text, images, audio, video, log files, emails, simulations, 3D models, etc. New research shows that a substantial amount of an organization’s data is not numeric; however, such data is equally important to the decision-making process. So, organizations need to think beyond stock records, documents, personnel files, finances, etc.
4. Veracity: Veracity refers to the uncertainty of the available data. Data can get messy and may be difficult to trust. With many forms of big data, quality and accuracy are difficult to control, as with Twitter posts full of hashtags, abbreviations, typos and colloquial speech. Big data and analytics technology now make it possible to work with these types of data, and the sheer volumes often make up for the lack of quality or accuracy. Still, due to this uncertainty, 1 in 3 business leaders don’t trust the information they use to make decisions.

Why Big Data analysis is crucial:
1. Just like labor and capital, data has become one of the factors of production in almost every industry.
2. Big data can unveil really useful and crucial information that can transform the decision-making process into a far more fruitful one.
3. Big data makes customer segmentation easier and more visible, enabling companies to focus on their more profitable and loyal customers.
4. Big data can be an important input when deciding on the next line of products and services that future customers will require. Thus, companies can follow a proactive approach at every step.
5. The way in which big data is explored and used can directly impact an organization’s growth and development and give it a real edge over its competitors. Data-driven strategies are fast becoming the latest trend at the management level.

Why Hadoop?

Hadoop can be contagious: its implementation in one organization often leads to another one elsewhere. Because Hadoop is robust and cost-effective, handling humongous amounts of data is much easier now. The ability to include Hive in an EMR workflow is yet another awesome point: it is incredibly easy to boot up a cluster, install Hive, and be doing simple SQL analytics in no time (a rough sketch of such a query follows below). Let’s take a look at why Hadoop can be so incredible.
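As a taste of those “simple SQL analytics”, here is a minimal sketch of running a HiveQL query from Java over JDBC against HiveServer2. The host name, credentials and the weblogs table are purely illustrative placeholders, not part of any particular setup; on EMR the same query could just as well be typed straight into the Hive CLI.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SimpleHiveQuery {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver (shipped with the hive-jdbc artifact).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Host, port, user and table name below are hypothetical placeholders.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://my-cluster-master:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // Count page views per URL in a hypothetical 'weblogs' table.
             ResultSet rs = stmt.executeQuery(
                 "SELECT url, COUNT(*) AS hits FROM weblogs "
                 + "GROUP BY url ORDER BY hits DESC LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("url") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
```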

Key features that answer – Why Hadoop?

1. Flexible:

It is often said that only about 20% of the data in organizations is structured and the rest is unstructured, so it is crucial to manage the unstructured data that would otherwise go unattended. Hadoop manages different types of Big Data, whether structured or unstructured, encoded or formatted, or any other type, and makes it useful for the decision-making process. Moreover, Hadoop is simple, relevant and schema-less! Although Hadoop itself is written in Java and MapReduce jobs are typically written in Java (see the word-count sketch below), other programming languages can also be used, for example through Hadoop Streaming. Hadoop works best on Linux, but it can also run on other operating systems such as Windows, BSD and OS X.
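To make the MapReduce point concrete, below is the canonical word-count example written against Hadoop’s Java MapReduce API: it turns raw, unstructured text into structured (word, count) pairs. The class names are illustrative, and a complete job would additionally need a small driver class that configures and submits it.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emit (word, 1) for every token in a line of raw text.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer: sum the counts emitted for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```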

2.  Scalable

Hadoop is a scalable platform in the sense that new nodes can easily be added to the system as and when required, without altering data formats, how data is loaded, how programs are written, or the existing applications. Hadoop is an open-source platform and runs on industry-standard hardware. Moreover, Hadoop is fault tolerant: even if a node is lost or goes out of service, the system automatically reallocates work to another copy of the data and continues processing as if nothing had happened (see the sketch below).
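As a small illustration of how that fault tolerance looks from a client’s point of view, the sketch below writes a file through the HDFS Java API and asks for a replication factor of three, so the loss of any single node holding a copy does not lose the data. The file path is a made-up example, and the code assumes the cluster’s configuration files are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicatedWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask HDFS to keep three copies of each block; if a node holding a
        // copy dies, the NameNode re-replicates from the surviving copies.
        conf.set("dfs.replication", "3");

        // Hypothetical path used purely for illustration.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/data/events/sample.txt"))) {
            out.writeUTF("hello hadoop");
        }
    }
}
```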

3. Building more efficient data economy:

Hadoop has revolutionized the processing and analysis of big data the world over. Until now, organizations worried about how to manage the non-stop data overflowing their systems. Hadoop is more like a “dam”, harnessing the flow of an unlimited amount of data and generating a lot of power in the form of relevant information. Hadoop has entirely changed the economics of storing and evaluating data!

4. Robust Ecosystem:

Hadoop has a very robust and rich ecosystem that is well suited to meet the analytical needs of developers, web start-ups and other organizations. The Hadoop ecosystem consists of related projects such as MapReduce, Hive, HBase, ZooKeeper, HCatalog and Apache Pig, which make Hadoop competent enough to deliver a broad spectrum of services.

5. Hadoop is getting more “Real-Time”!

Did you ever wonder how to stream information into a cluster and analyze it in real time? Hadoop has an answer for that. Yes, Hadoop’s capabilities are getting more and more real-time. Hadoop also provides a standard approach to a wide set of APIs for big data analytics, comprising MapReduce, query languages, database access, and so on.

6. Cost Effective:

Loaded with such great features, the icing on the cake is that Hadoop generates cost benefits by bringing massively parallel computing to commodity servers. The result is a substantial reduction in the cost per terabyte of storage, which in turn makes it affordable to model all of your data. The basic idea behind Hadoop is cost-effective analysis of the data spread across the World Wide Web!

7.  Upcoming Technologies using Hadoop:

While reinforcing its capabilities, Hadoop is leading to phenomenal technical advancements. For instance, HBase will soon become a vital platform for blob stores (binary large objects) and for lightweight OLTP (online transaction processing); a small HBase sketch follows below. Hadoop has also begun serving as a strong foundation for new-school graph and NoSQL databases, and for better versions of relational databases.
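For a flavour of the “lightweight OLTP” style of access that HBase targets, here is a minimal sketch using the HBase Java client API to write and then read back a single row by key. The table name, column family and values are hypothetical, and the table is assumed to already exist (for example, created via the HBase shell).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseKeyValueDemo {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();

        // 'user_events' and column family 'd' are placeholder names.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("user_events"))) {

            // Single-row write: the kind of lightweight OLTP access HBase targets.
            Put put = new Put(Bytes.toBytes("user-42"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("last_login"),
                          Bytes.toBytes("2015-05-08"));
            table.put(put);

            // Single-row read back by key.
            Result result = table.get(new Get(Bytes.toBytes("user-42")));
            byte[] lastLogin = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("last_login"));
            System.out.println("last_login = " + Bytes.toString(lastLogin));
        }
    }
}
```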

8.  Hadoop is getting cloudy!

Hadoop is getting cloudier! In fact, cloud computing and Hadoop are converging in several organizations to manage Big Data. Hadoop will become one of the most sought-after applications for cloud computing, which is evident from the number of Hadoop clusters offered by cloud vendors across various businesses. Thus, Hadoop will soon reside in the cloud!