Sunday, April 13, 2014

Social Media Mining with R by Richard Heimann and Nathan Danneman, Packt Publishing Book Review

Social Media Mining with R

What an engrossing read!
I must admit I fret at the beginning, how I can embark on the most advanced and seemingly difficult topics as R, Social Media, together? But somewhere early in chapter 2 I relaxed, thanks to Richard and Nathan who delivered the not so familiar (to me) content gradually and with much aplomb. I liked the R primer in same chapter.
Getting Twitter data turned to be a breeze as you will see in chapter 3. Chapters 4 and 5 are not exactly technical, for example they expand on the nature of sentiments, social behaviour, and mentioned a few pitfalls, however to my surprise, I enjoyed them a lot. Most importantly, these two chapters serve as a base to the rest of the book in terms of a model on which the analysis is going to be conducted. Chapter 6 is where you work hard, but not too hard than it would make you put the book away and shut your computer down, rather it was fun full of algorithms, graphics and cool insight!

I finished this book in no time, but wish it was longer. Certainly, the authors are the ones I will be looking for to buy more books from.

Like I said, this is a somewhat a short book, but it covers what it promises very well, for those who wish to expand further the authors provide a list of related literature.

I think a contractor wishing to deliver a social analysis assignment fast should not look any further. And one can sure expand further than extracting tweets. I trust the principals and techniques remain almost the same.

In terms of my closing notes, the reader needs to be familiar with Git[Hub], some or no R and better running a 64 bit OS, preferably Linux or Mac (mainly because these OSes already come with tools as CURL). The book publisher site is http://www.packtpub.com/social-media-....

Oh, and the book rating by me is 5 out of 5.

Sunday, February 16, 2014

Persistence in PHP with Doctrine ORM by Kévin Dunglas, Packt Publishing Book Review

Persistence in PHP with Doctrine ORM

In short, Persistence in PHP with Doctrine ORM is a book about harnessing the power of Doctrine ORM, but also:

  • Will let you build a working project using Doctrine involving advanced techniques;
  • Will guard you from doing mistakes early in your getting known Doctrine, and
  • Will advocate on appropriate technologies to use in addition in case your next creation is going to see the world.

I would recommend this book to a developer who worked with PHP already, but getting ready to embark onto a more intensive data processing endeavor.

The book is not terribly long, yet comprehensive enough to allow a person to become proficient in Doctrine say overnight (yes, my Kindle app estimated my reading speed at ~ 2.5 Hrs, that is without me experimenting with code). Personally, I value concise books because they give me a push and allow a relatively comfortable solo sailing with an occasional exploration of a topic I did not encounter learning but stumbled upon doing real-life work.

As an aside, use of an ORM (not just Doctrine specifically) is typically being perceived as a negative phenomenon by the data people (disclaimer: I am the data person), unsurprisingly Kevin mentions performance implications under the so called Big Data scenario. This has it’s grounds, I agree, as for example tuning DML or data retrieval, so let’s not argue here, but at least one aspect on an ORM not possible to beat - is its ability to allow seamless transition from one database platform to another, relatively uncommon in the past, seems to being picking up nowadays. But do not be overly optimistic, no migration is ever smooth, it just alleviates some of the pains and minimizes the costs of engineering and maintaining your software.

On the not so bright side the author does not cover executing stored procedures/packages, and apparently Doctrine (as most OSS projects) has a long list of defects, yeah, I can hear you, this is a book review, nothing else, ditto.

So one last interesting discovery, the Doctrine community focuses more and more on a NoSQL, Mongo in particular, which is thrilling.

You will find information in the book on how to build your own SQL, implement association, inheritance and even a not so often used many-to-many relationships.

On the odd note, I saw a circular reference created in one of the book examples, while possible it is very dangerous! Also the book covers only one approach: building your app code-first: meaning the database schema is created after a class, which I (you know who am I Smile) don’t endorse, alas I am / was new to Doctrine.

I suggest Kevin adds to ver. 2.0 of this book the following:

  1. Building an application the schema-up way, too, and
  2. Provide an example where Doctrine is using a Mongo database.

I give this book a 5 out of 5 rating because it has achieve its objectives, however it seems that Packt could give it the “Instant” moniker due to its material coverage.

Thursday, February 13, 2014

Creating Reports with SQL Server 2012 Reporting Services Course by Packt Publishing Review

Creating Reports with SQL Server 2012 Reporting Services

I confess, it was my first time reviewing a video from a publisher. Yet produced by a Ph. D! I was thrilled yet overwhelmed by the feeling of an enormous responsibility.

Un-surprisingly, the end result is superb. The video covers enough ground for a newcomer to the Reporting Services world to quickly become productive and deliver a stunning dashboard.

Overall, I like the idea of delivering educational content visually. It is much faster to make yourself familiar with software in the shortest possible time this way than reading a book. Especially if it is about visualizations.

I liked Dr. Dallas Srinder’s voice, it is clear and monotone, not too fast and not too slow with no time wasted on secondary items. The length of each topic is sometimes too short, but it does allow experimentation on your own.

Big kudos to Dr. Dallas Srinder!

5 out of 5.

You can purchase the course here: http://bit.ly/1a0NpEJ

Wednesday, July 13, 2011

Cascading IF-ELSE-ELSEIF A.K.A. Nested IF-ELSE Statements in a SSIS Expression

Today it will be a concise post, but I hope quite helpful and informative.

As most SSIS developers know SSIS Expressions are used to drive several key tidbits of a package components as the Conditional Split, Package Variables value assignments, Derived Column Transformation, form a SQL statement for Execute SQL task, drive precedence constraints, set package properties as connection string for example and more. So it is hard to underestimate the importance of the SSIS Expressions for a SSIS developer.

However, the SSIS expressions look hard to use for some beginners yet, feel un-natural to most people. This fact does not undermine the power and flexibility of the SSIS Expression. The only hard to swallow portion in my opinion is the code bloat versa editing space ratio that limits the effectiveness of writing long expressions. Luckily, there is a free tool to help: SSIS Expression Editor & Tester! I do not see any dramatic improvements in regard to the SSIS Expression in the forthcoming SQL Server “Denali”, but it will add a number of features that made me excited, perhaps I shall dedicate a separate post on this subject, but I will mention one: Expression Task. This task will help to assign a result of an expression to a variable. So the SSIS Expressions are here to stay and important to know.

And now I will share the most commonly asked question on MSDN SSIS forum: “how do I handle or process multiple IF-ELSE conditions”?

The answer is it is simple!

As we know, the conditional statement in SSIS Expression has the following notation:

«boolean_test» ? «true_result» : «false_result»




The trick is the right part can accept yet another [nested] IF-ELSE, and then more:


@[User::MyStringVar] == "TEST1" ? "IT WAS TEST1": 
@[User::MyStringVar] == "TEST2" ? "IT WAS TEST2": 
@[User::MyStringVar] == "TEST3" ? "IT WAS TEST3":
"MATCH NOT FOUND"




The end result of running this expression is it will check the MyStringVar contents and try to pick the first match, if the match not found it will produce “Match not found”.


This expression is good for a variable assignment, for a Derived Column transformation one should use:


[MyStringVar] == "TEST1" ? "IT WAS TEST1": 
[MyStringVar] == "TEST2" ? "IT WAS TEST2": 
[MyStringVar] == "TEST3" ? "IT WAS TEST3":
"MATCH NOT FOUND"



It is easy to expand this expression further to more choices by just adding lines before the last assignment.

Tuesday, June 28, 2011

How to capture a Header or Trailer Count Value in a Flat File and Assign to a Variable

Recently I had several questions concerning how to process files that carry a header and trailer in them. Typically those files are a product of data extract from non Microsoft products e.g. Oracle database encompassing various tables data where every row starts with an identifier. For example such a file data record could look like:

HDR,INTF_01,OUT,TEST,3/9/2011 11:23

B1,121156789,DATA TEST DATA,2011-03-09 10:00:00,Y,TEST 18 10:00:44,2011-07-18 10:00:44,Y
B2,TEST DATA,2011-03-18 10:00:44,Y
B3,LEG 1 TEST DATA,TRAN TEST,N

B4,LEG 2 TEST DATA,TRAN TEST,Y

FTR,4,TEST END,3/9/2011 11:27

A developer is normally able to break the records using a Conditional Split Transformation component by employing an expression similar to

Output1 --  SUBSTRING(Output1,1,2) == "B1"

and so on, but often a verification is required after this step to check if the number of data records read corresponds to the number specified in the trailer record of the file.


This portion sometimes stumbles some people so I decided to share what I came up with.


As an aside, I want to mention that the approach I use is slightly more portable than some others I saw because I use a separate DFT that can be copied and pasted into a new SSIS package designer surface or re-used within the same package again and it can survive several trailer/footer records (!).


See how a ready DFT can look:


DFT_Composition


The first step is to create a Flat File Connection Manager and make sure you get the row split into columns like this:


FFS_Columns_Value FFSE_Look


After you are done with the Flat File connection, move onto adding an aggregate which is in use to simply assign a value to a variable (here the aggregate is used to handle the possibility of multiple footers/headers):




Aggr_Editor


The next step is adding a Script Transformation as destination that requires very little coding.


First, some variable setup:


Script_Setup


and finally the code:


Script_Code



As you can see it is important to place your code into the appropriate routine in the script, otherwise the end result may not be as expected.


As the last step you would use the regular Script Component to compare the variable value obtained from the DFT above to a package variable value obtained say via a Row Count component to determine if the file being processed has the right number of rows.

Thursday, November 25, 2010

Age this month calculation in Crystal Reports

I keep having lots of fun developing enterprise reports with the excellent Crystal Reports.

Today I needed to calculate the age of a person for this month. The typical use of this formula would be something like “determine if an employee is turning 65 years old this month).

I was able to come with the following Crystal Formula Syntax:

// Age this month calculation 
WhileReadingRecords;
dateTimeVar _1stOftheMonth := CurrentDate;
dateTimeVar LastOftheMonth := CurrentDate;

// Get the run date and make it the 1st day of the current month
_1stOftheMonth := Date(Year(_1stOftheMonth), Month(_1stOftheMonth), 01);

// Get the next month's 1 st day. Subtract 1 day to get the last day of the current month
LastOftheMonth := DateAdd( "d", -1, DateAdd("m", 1, _1stOftheMonth) );

// Proceed with the age calculation taking into consideration the leap year
if (Month(LastOftheMonth) * 100) + Day(LastOftheMonth) >=
(Month(CDate({Employee_Master.BirthDate})) * 100) + Day(CDate({Employee_Master.BirthDate}))
then
Year (LastOftheMonth) - Year(CDate({Employee_Master.BirthDate}))
else
Year (LastOftheMonth) - Year(CDate({Employee_Master.BirthDate})) - 1;



An interesting nuance about this formula: if one substitutes the CurrentDate (on line 3 and 4)  to an event date of interest (being a certain date or a database field) it is possible to calculate the age on this specific date!

Sunday, November 7, 2010

Free SQL IDEs Review – SQL Everywhere vs. Toad for SQL Server

image

Recently I have been given a short term assignment to develop a number of reports in Crystal Reports.

Upon arriving to the client location I quickly found out that the only tool I will be using to create the reports is basically the Crystal Reports itself.

My initial look at the database quickly revealed that the database structure is far from perfect and that it has some confusing table relationships, another hurdle was in the fact that several columns in various tables were user customizable for values to hold without any hint onto where the value is used and the relationship to other database objects.

I definitely had a challenge figuring out what and how the right SQL queries should be developed. To my dismay the matters were marred even further by my account being read-only so the possibility of creating database diagrams to visualize the data was not feasible.

After some short peering into what links Crystal Reports produces I realized I am lucking a tool that can close the gap.

Just as an aside, every morning I am normally reading a few blogs, one of them I literally read while having my first cup of coffee is an excellent one by Chris Alcock The Morning Brew in which it was just not long ago mentioned about the free tools for SQL Server by Aaron Bertrand. Bingo! I thought, I need to take them for a test drive! Not knowing anything about the tools I decided to download both.

A short wait for the downloads and Toad for SQL Server 5.0 gets installed first. Surprisingly, the free license is only for a few people in an organization (clever!). Well, it impressed me with a clean interface, but cluttered with panes and panels it quickly made me a little slow navigating here and there, my first 5 min and I think these are the most critical ones for the software spent and I see something – Object Explorer:

image

this must be the one I murmured to myself that has the ability to find an object based on its name. Nice, but it turned out to work only as a filter on the top level objects, so for example if a database is highlighted then applying a filter it will only show objects matching the criterion. Not particularly useful. At least for me, I needed to find where a specific column name is used. Hmm, let’s give SQL Everywhere a try! OK, it is installed in a snap! My first ehh, seconds, I find in View the Object Browser, good, search for my objects, whoa! I got a large set of results to go through! Nice, I got my columns, tables, indexes, functions and what not listed, gosh, even the T-SQL for this object as to be added is shown:

image

Cool, and besides, in addition a streamlined version is available through the Objects tab on the Explorer pane:

image

Double-clicking on the column name takes to it in a given table.

Good now we are ready to write some useful SQL queries.

Well, I did not want to give up yet on Toad for SQL Server and went on to writing a complex join. So I am typing in SELE… hmm, I notice the intellisense does not pop up with suggestions, well all right, after all I know how to type, now seems it does not support suggesting on keywords, or it may be my impression. Now when I arrived to the join part to another table the intellisense does show up and tries to guess my join, and to my amusement the choice at the top actually makes sense. Not bad! Now I try to do the same in SQL Everywhere and and start with typing SELE… wow wait what is going on? Ahm three choices appeared: scf, ssf and st100, I hit tab and get nice! Similarly ssf expands to and respectively.

Not a bad productivity improvement for free!

OK, now some ugly T-SQL Code is churned I would be ashamed to copy and paste to a Command in Crystal Reports, let’s beatify it first! OK, I still love Toad, ahm, where SQL Format or something like that is? It is there but disabled, well, turned out it is disabled in SQL Everywhere too, pity! No problem, I can that even without installing a single piece of software, going to http://www.sqlinform.com/online.phtml and whoa I have my long SQL looking good!

I thought I deserve to rest and I started to play around with the settings, surprisingly, both editors offered a lot of customization, Toad even made me laugh by offering a pumpkin skin and a Valentine’s skin:

image 

What a jolly amphibian Smile!

(I guess the creators of it thought a developer would change its appearance based on kind of a holiday approaching).

SQL Everywhere sports a more humble approach to its appearance and to me it looks more tasteful and more importantly, practical. These are simply color schemes.

As far as the productivity goes, both tools offer a good code snippets support, though SQL Everywhere has a more functional set of snippets, one thing, it is only very short.

Now let’s highlight some interesting differences. The first that comes to mind is how the tools handle the connection to the database. While SQL Everywhere offers an on-demand connection (default) thus unloading the server, Toad seems to not to offer any customization of this kind. Others are “As early as possible” and “On First Use”.

Another nifty feature is “Execute Here” (if you click on a database name). It is particularly useful if you want to run the same SQL statement against one database and the another, for this purpose I guess it offers data set comparison features but it was not available in the free editions, I guess it would be as easy to do by dumping the results to a file and compare using say ExamDiff.

For database administrators viewing active processes in SQL Everywhere (by pressing F2)

image

is just so simple it blows the mind compare to how you get this in SSMS, sure this feature is gonna be loved!

In closing, I would highlight some features I found lacking or not to may satisfactory:

First, I am bothered a lot by both IDEs not able to output the results of a query to text. However it is easily compensated by the “export to csv” functionality.

Second, is inability to edit data right in the grid in SQL Everywhere.

Third, while SQL Everywhere offered search within the grid with results, none of this appeared working in Toad (I am curious if this does work in the paid version of the product).

Speaking of the prices, Toad’s Pro version (the cheapest option) is $595 USD vs. $199 for SQL Everywhere, a big leap for not so much more I am tempted to say, also curious if this is the result of the so called “Bran Recognition” paradox marketing?

Well, all in all, both tools are a very good compliment to the SSMS and I hope will wet your appetite for exploring them more and contributing to your bottom line!

As it stands now, my first choice is SQL Everywhere for more flexible, practical and streamlined (e.g. macros support) database development experience.