DataOps Components

“If I have seen a little further it is by standing on the shoulders of Giants.”

Isaac Newton

BigStream – Data Warehouse Edition

A modern data platform needs to be flexible enough to handle the varied workloads that diverse business objectives create, extensible enough to support existing and emerging data technologies, and, to some extent, future-proof. As Hadoop and Spark mature into platforms that are scalable, fast, and secure enough to manage multi-structured enterprise data, data warehouses built on these technologies are becoming a reality.

Our BigStream – DW embraces best-of-breed components and design patterns to accelerate the production deployment of a modern data warehouse.

BigStream – DW has the following key features:

  • Full support for the Kimball methodology, including facts, dimensions, surrogate keys, and slowly changing dimensions.
  • Built on Spark and on the Cloudera CDH and Hortonworks HDP Hadoop distributions.
  • Unified analytics back end: exploratory, discovery, and descriptive analysis through Impala (CDH), Hive (HDP), Elasticsearch, SQL, and REST.
  • Out-of-the-box support for data ingestion from SQL-based transaction systems, social media, email servers, and log files.
  • Extensible data transformation library, including text analytics and machine learning and statistical methods for handling missing data, detecting duplicates, and dealing with data corruption.
  • Data security, including encryption at rest and in motion and role-based data access.
  • Integration hooks for data governance and compliance technologies.
  • Designed for scale and high performance.
  • Administration and monitoring support with Cloudera Manager, Hortonworks Ambari, and ELK.
  • Can be configured to function as a Data Lake or an MDM by leveraging only HDFS.
  • Deployment options include on-premises and cloud, with a Managed Services option.
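BigStream – DW's own implementation of slowly changing dimensions is not described on this page. As a rough illustration of the Kimball Type 2 pattern with surrogate keys, here is a minimal in-memory Python sketch. The class and field names (`CustomerDimension`, `DimRow`, `upsert`, `valid_from`, `valid_to`) are hypothetical, and a real warehouse would apply the same logic in SQL or Spark over Hive or Impala tables rather than in Python lists.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    surrogate_key: int        # warehouse-generated key, independent of the source system
    natural_key: str          # business key from the source system (e.g. a customer ID)
    attributes: dict          # descriptive attributes whose history is tracked
    valid_from: date
    valid_to: Optional[date]  # None while this version is current
    is_current: bool


class CustomerDimension:
    """Hypothetical in-memory dimension table illustrating SCD Type 2."""

    def __init__(self) -> None:
        self.rows: list[DimRow] = []
        self._next_key = 1

    def current(self, natural_key: str) -> Optional[DimRow]:
        """Return the current version of a dimension member, if any."""
        for row in self.rows:
            if row.natural_key == natural_key and row.is_current:
                return row
        return None

    def upsert(self, natural_key: str, attributes: dict, as_of: date) -> int:
        """Apply an incoming source record; return the surrogate key facts should reference."""
        existing = self.current(natural_key)
        if existing is not None and existing.attributes == attributes:
            return existing.surrogate_key  # nothing changed: keep the current version
        if existing is not None:
            # An attribute changed: close out the old version instead of overwriting it
            existing.valid_to = as_of
            existing.is_current = False
        row = DimRow(self._next_key, natural_key, dict(attributes), as_of, None, True)
        self._next_key += 1
        self.rows.append(row)
        return row.surrogate_key


dim = CustomerDimension()
k1 = dim.upsert("C042", {"city": "Austin"}, date(2015, 1, 1))
k2 = dim.upsert("C042", {"city": "Austin"}, date(2015, 3, 1))  # unchanged: same key
k3 = dim.upsert("C042", {"city": "Denver"}, date(2015, 6, 1))  # changed: new version
print(k1, k2, k3)  # 1 1 2
```

Because old versions are expired rather than overwritten, facts loaded before the change keep pointing at surrogate key 1 (Austin), while new facts reference key 2 (Denver).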

You can now enjoy the benefits of a modern data warehouse in weeks, not months!

Want to test-drive BigStream – DW? Talk to Us

DisKoveror

DisKoveror is a text analytics framework developed by Serendio. Built on top of other open-source packages, DisKoveror provides a flexible and extensible way to extract entities, topics, categories, sentiments, and keywords from unstructured text.

The key advantage of DisKoveror over the numerous open-source alternatives is that it provides access to best-of-breed components through a plug-and-play approach and a unified programming interface.

In some cases, DisKoveror has also improved output quality through training sets, domain-specific ontologies, and folksonomies.

DisKoveror has been used to mine brand sentiment from social media, gauge customer satisfaction from emails, extract topics from tweets, compute social influence scores, assist in creating metadata and taxonomies, and much more.

DisKoveror Highlights

  • Sentiment Analysis
  • Topic Detection
  • Named Entity Recognition
  • Coreference Resolution
  • Keyword Extraction
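DisKoveror's internal algorithms are not documented here. As a rough illustration of what keyword extraction involves, the following is a minimal TF-IDF sketch in plain Python. The stop-word list, the function names `tokenize` and `top_keywords`, and the smoothed IDF formula are assumptions chosen for illustration; this is not DisKoveror's API.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; production systems use much larger lists)
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "was", "to", "of", "all",
              "in", "on", "for", "with", "it", "this", "that", "my", "i"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against documents with no usable tokens
        scores = {
            # Term frequency times smoothed inverse document frequency
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for stable output
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


reviews = [
    "The battery life of this phone is great, battery lasts all day",
    "The camera on this phone is blurry",
    "Shipping was slow and the box was damaged",
]
print(top_keywords(reviews, k=2))
# [['battery', 'day'], ['blurry', 'camera'], ['box', 'damaged']]
```

Terms that appear in several documents (here "phone") receive a lower IDF weight, so words that distinguish one document from the others rank higher.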

DisKoveror can be accessed through Java APIs or a RESTful interface.

Interested in integrating DisKoveror into your business? Talk to Us


BigSim

BigSim is designed to provide flexibility and control in generating large data sets through templates and minimal coding. Users only need to supply the data specifications in an XML template that defines the semantic type, range, volume, velocity, and shape of the data. The simulated data sets can be used for capacity planning, what-if scenario testing, extrapolating small data sets with some randomness to simulate real-world data, filling in missing values in incomplete data sets, and similar tasks.

  • Designed to generate synthetic data to address hard-to-get data, dirty or missing data, and data for new use cases.
  • Support for Streaming and Batch data generation through extensible data templates.
  • Out-of-the-box templates for Wearables, Smart Homes, Retail, Manufacturing and more.
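BigSim's actual XML schema is not published on this page, so the template below is hypothetical: its element and attribute names (`template`, `field`, `volume`, `type`, `min`, `max`, `shape`, `prefix`) are assumptions. The sketch shows how a template covering semantic type, range, volume, and shape could drive batch generation in Python; velocity (streaming rate) is not modelled here.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical template format, not BigSim's real schema
TEMPLATE = """<template name="wearable_heart_rate" volume="5">
  <field name="device_id" type="id" prefix="WR-"/>
  <field name="heart_rate" type="int" min="50" max="180" shape="normal"/>
  <field name="steps" type="int" min="0" max="200" shape="uniform"/>
</template>"""


def sample_int(rng: random.Random, lo: int, hi: int, shape: str) -> int:
    """Draw an integer in [lo, hi] following the requested distribution shape."""
    if shape == "normal":
        # Centre a bell curve in the range (about 99.7% of draws land inside), then clamp
        value = rng.gauss((lo + hi) / 2, (hi - lo) / 6)
        return max(lo, min(hi, round(value)))
    return rng.randint(lo, hi)  # default shape: uniform


def generate(template_xml: str, seed: int = 0) -> list[dict]:
    """Generate `volume` synthetic records as described by the template."""
    root = ET.fromstring(template_xml)
    volume = int(root.get("volume", "10"))
    rng = random.Random(seed)  # seeded so the synthetic data set is reproducible
    records = []
    for i in range(volume):
        record = {}
        for spec in root.findall("field"):
            name, ftype = spec.get("name"), spec.get("type")
            if ftype == "id":
                record[name] = f"{spec.get('prefix', '')}{i:05d}"
            elif ftype == "int":
                record[name] = sample_int(rng, int(spec.get("min")), int(spec.get("max")),
                                          spec.get("shape", "uniform"))
        records.append(record)
    return records


for record in generate(TEMPLATE, seed=42):
    print(record)
```

Seeding the generator is a deliberate choice: the same template and seed always yield the same data set, which makes what-if tests repeatable.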
Interested in trying BigSim? Talk to Us

Data Wrangling

Data pre-processing is an important step in the data mining process. Data-gathering methods are often loosely controlled, resulting in out-of-range values, impossible data combinations (e.g., Sex: Male, Pregnant: Yes), and missing values.

Analyzing data that has not been carefully screened for such problems can produce misleading results. The representation and quality of the data must therefore come first, before any analysis is run.

Highlights of PreMod, our Data Pre-Processing Package:

All the above functions are available in R and Python.
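PreMod's own functions are not listed on this page. As a hedged illustration of the screening problems described earlier (missing values, out-of-range values, and impossible combinations such as Sex: Male, Pregnant: Yes), here is a minimal record-level check in plain Python. The field names, the `RANGES` bounds, and the `screen` function are hypothetical and not part of PreMod.

```python
from typing import Any

# Hypothetical validity bounds per numeric field (assumptions for illustration)
RANGES = {"age": (0, 120), "heart_rate": (20, 250)}
REQUIRED = ("sex", "age")


def screen(record: dict[str, Any]) -> list[str]:
    """Return a list of data-quality problems found in one record."""
    problems = []
    # 1. Missing values in required fields
    for name in REQUIRED:
        if record.get(name) is None:
            problems.append(f"missing: {name}")
    # 2. Out-of-range values (fields that are absent are skipped here)
    for name, (lo, hi) in RANGES.items():
        value = record.get(name)
        if value is not None and not lo <= value <= hi:
            problems.append(f"out of range: {name}={value}")
    # 3. Impossible combinations of otherwise valid values
    if record.get("sex") == "Male" and record.get("pregnant") == "Yes":
        problems.append("impossible combination: sex=Male, pregnant=Yes")
    return problems


records = [
    {"sex": "Female", "age": 34, "pregnant": "Yes"},
    {"sex": "Male", "age": 290, "pregnant": "Yes"},
    {"sex": None, "age": 51, "pregnant": "No"},
]
for r in records:
    print(screen(r))
# []
# ['out of range: age=290', 'impossible combination: sex=Male, pregnant=Yes']
# ['missing: sex']
```

Flagging problems rather than silently fixing them keeps the decision (drop, impute, or correct) explicit for the analyst.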

Need help with your Data Wrangling problems? Talk to Us