Fun With Big Data Using the Microsoft “Data Explorer” Pt. 1
Microsoft has released a Community Technology Preview (CTP) of a new tool for working with “Big Data”. Just what is Big Data? Put simply, it is the large body of data, much of it compiled statistical information, that is available on the internet and in the cloud. There is a vast amount of data available today, and data is now being collected and stored at a rate never seen before. Much, if not most, of this data, however, is locked into specific applications or formats and is difficult to access or to integrate into new uses. “Data Explorer” is a tool that allows you to start exploring these sources.
Data comes from a number of different sources, including:
SQL databases, web page content (including RSS feeds), XML-formatted sources such as OData feeds, SharePoint repositories, and others.
Windows Azure Marketplace
The Windows Azure™ Marketplace is an online market for buying and selling finished Software as a Service (SaaS) applications and premium datasets. The Windows Azure Marketplace helps connect companies seeking innovative cloud-based solutions with partners who have developed solutions that are ready to use.
SharePoint Lists and Libraries
Every Microsoft SharePoint list and library in a site has a corresponding data source connection in the Data Source Library. To add a SharePoint list or library to the Data Source Library, you can either create a new list or library or create a new connection to an existing one. Any SharePoint list or library that you create automatically gets its own data source connection in the Data Source Library.
OData
The Open Data Protocol (OData) is a Web protocol for querying and updating data that provides a way to unlock your data and free it from silos that exist in applications today. OData does this by applying and building upon Web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores. The protocol emerged from experiences implementing AtomPub clients and servers in a variety of products over the past several years. OData is being used to expose and access information from a variety of sources including, but not limited to, relational databases, file systems, content management systems and traditional Web sites.
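To make the "querying over HTTP" idea concrete, here is a minimal Python sketch of how an OData query URL can be put together from a service root, a collection name, and system query options such as `$top` and `$filter`. The helper name `build_odata_url` is made up for this post, and the `ReleaseYear` field is an assumption about the feed's schema, so treat it as an illustration rather than a reference. The snippet only builds the URL; it does not contact any service.

```python
from urllib.parse import quote, urlencode


def build_odata_url(service_root, collection, **options):
    """Return a URL for `collection` with OData system query options.

    Hypothetical helper: keyword arguments such as top=5 become "$top=5".
    """
    base = service_root.rstrip("/") + "/" + collection
    # OData system query options are prefixed with "$" ($top, $filter, ...).
    params = {"$" + name: str(value) for name, value in options.items()}
    if not params:
        return base
    # Keep "$", "'" and "," readable; spaces are encoded as %20.
    return base + "?" + urlencode(params, quote_via=quote, safe="$',")


# "ReleaseYear" is an assumed field name, used only for illustration.
url = build_odata_url(
    "http://odata.netflix.com/Catalog/",
    "Titles",
    top=5,
    filter="ReleaseYear eq 2009",
)
print(url)
# http://odata.netflix.com/Catalog/Titles?$top=5&$filter=ReleaseYear%20eq%202009
```

The point of the sketch is that an OData query is just a URL, which is why so many different tools can consume the same feed.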
SQL
SQL (Structured Query Language) is a programming language designed for managing data in relational database management systems (RDBMS). It was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. Relational databases queried with SQL have long been the standard way to store structured data.
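For readers who have not seen SQL before, the following small Python sketch uses the standard-library `sqlite3` module to create a table, insert a few rows, and query them. The table name, column names, and rows are all made up for illustration; they are not taken from any real catalog.

```python
import sqlite3


def titles_from_year(conn, year):
    """Return the names of all titles released in `year`, sorted by name."""
    query = "SELECT name FROM titles WHERE release_year = ? ORDER BY name"
    return [row[0] for row in conn.execute(query, (year,))]


# Build a throwaway in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (name TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO titles (name, release_year) VALUES (?, ?)",
    [("Example A", 2001), ("Example B", 2009), ("Example C", 2009)],
)

print(titles_from_year(conn, 2009))  # ['Example B', 'Example C']
```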
Let’s walk through a short sample of connecting to a data source. I will use Netflix’s OData feed as the source, to make this example fun.
First, press the plus button in the top menu to create a new mashup.
In the dialog box we will type in the name for our new mashup and call it “NetflixMashup”.
Next we will add our data from the Netflix OData server. Clicking on the “Data Feed” icon will allow us to create our new data source.
Our next step will be to add our Netflix feed URL.
For this example I will use one of the feeds that is available as a top-level resource, in this case the Netflix Catalog Titles, at http://odata.netflix.com/Catalog/
Next it will ask us how to connect to the feed. We enter the feed URL.
Add the URL to the mashup workflow wizard and click ‘Done’.
If the feed requires Windows authentication, a name and password, or an OData feed key, you will have an opportunity to enter it when setting the feed security options. Since the feed we are connecting to is public and requires none of these, we will make sure the ‘Use anonymous access’ radio button is selected and press ‘Continue’ to connect.
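Outside of Data Explorer, the difference between anonymous access and a name-and-password login can be sketched in a few lines of Python. This assumes the feed uses HTTP Basic authentication, which is only one of the schemes a real feed might require, and the helper name `build_feed_request` is made up for this post. The request object is only constructed here, never sent.

```python
import base64
import urllib.request


def build_feed_request(url, username=None, password=None):
    """Build (but do not send) an HTTP request for a feed.

    With no username the request is anonymous. Otherwise an HTTP Basic
    Authorization header is attached (an assumed scheme, for illustration).
    """
    request = urllib.request.Request(url, headers={"Accept": "application/atom+xml"})
    if username is not None:
        raw = f"{username}:{password or ''}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


anonymous = build_feed_request("http://odata.netflix.com/Catalog/")
print(anonymous.has_header("Authorization"))  # False
```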
When the data feed is successfully parsed, we will see the feed along with its schema. Then we can click ‘Done’ to continue.
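To give a feel for what "parsing the feed" involves, here is a rough Python sketch that reads an Atom-formatted OData response and turns each entry into a dictionary of fields. The sample XML is hand-written for this post, with made-up titles and assumed field names (`Name`, `ReleaseYear`); it is not real output from the Netflix service, and this is not a description of how Data Explorer does it internally.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Atom and by OData's Atom payload format.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Hand-written sample with made-up entries, for illustration only.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <title type="text">Titles</title>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:Name>Example Title One</d:Name>
        <d:ReleaseYear m:type="Edm.Int32">2001</d:ReleaseYear>
      </m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:Name>Example Title Two</d:Name>
        <d:ReleaseYear m:type="Edm.Int32">2009</d:ReleaseYear>
      </m:properties>
    </content>
  </entry>
</feed>
""".strip()


def parse_entries(feed_xml):
    """Return one dict of field name -> text value per Atom entry."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.findall("atom:entry", NS):
        props = entry.find("atom:content/m:properties", NS)
        record = {}
        if props is not None:
            for prop in props:
                # Drop the "{namespace}" prefix ElementTree puts on tag names.
                record[prop.tag.rsplit("}", 1)[-1]] = prop.text
        records.append(record)
    return records


records = parse_entries(SAMPLE_FEED)
print(records[0])  # {'Name': 'Example Title One', 'ReleaseYear': '2001'}
```

Note that every value comes back as text; a fuller version would use the `m:type` attribute to convert numbers and dates.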
Removing fields we won’t use
Once the fields in the data feed have been parsed, we can right-click on the fields we won’t use, select “Remove fields”, and then click the ‘Done’ button.
Next we are going to select the fields we will use to gather our data. For this demonstration I am going to select just ‘Titles’.
Now we can click on “More tools”.
This will expose more menus.
Click or double-click the Select Fields icon on the toolbar and the screen should change.
Check the checkboxes for fields to include (I am selecting all of them) and click ‘Done’.
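The remove-fields and select-fields steps are both simple projections over the rows of the feed. As a rough sketch of the idea in Python, assuming each row is a plain dictionary (the helper names and sample rows below are made up for illustration):

```python
def remove_fields(records, drop):
    """Return copies of the records without the fields named in `drop`."""
    drop = set(drop)
    return [{k: v for k, v in rec.items() if k not in drop} for rec in records]


def select_fields(records, keep):
    """Return copies of the records with only the fields in `keep`, in that order."""
    return [{name: rec[name] for name in keep if name in rec} for rec in records]


# Made-up rows standing in for parsed feed entries.
rows = [
    {"Name": "Example Title One", "ReleaseYear": 2001, "Synopsis": "..."},
    {"Name": "Example Title Two", "ReleaseYear": 2009, "Synopsis": "..."},
]

print(remove_fields(rows, ["Synopsis"])[0])
# {'Name': 'Example Title One', 'ReleaseYear': 2001}
print(select_fields(rows, ["Name"]))
# [{'Name': 'Example Title One'}, {'Name': 'Example Title Two'}]
```

Both helpers return new lists and leave the original rows untouched, which mirrors how each step in a mashup produces a new intermediate result.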
To Be Continued…
In part two we will output to a table and look at some results, then finish up with an example that uses the Azure Data Marketplace with data.gov to do some statistical analysis.
For another look at this product, check out Lynn Langit’s blog post on using Data Explorer.