tag:blogger.com,1999:blog-36589303.post4624770148145284411..comments2024-01-22T09:48:10.802+01:00Comments on Nihil Obstat: Baseline Sentiment Analysis with WEKAJose Maria Gomez Hidalgohttp://www.blogger.com/profile/17053588779560658723noreply@blogger.comBlogger13125tag:blogger.com,1999:blog-36589303.post-7329758103554757672016-11-23T18:22:37.365+01:002016-11-23T18:22:37.365+01:00Dear Rajat
I am afraid your question is not very ...Dear Rajat<br /><br />I am afraid your question is not very specific; it is hard to guess what is happening without knowing whether you are using the Explorer (I guess you are). My bet is that you have not selected the appropriate attribute as the class in the Classify tab.<br /><br />Please, can you be more specific?<br /><br />RegardsJose Maria Gomez Hidalgohttps://www.blogger.com/profile/17053588779560658723noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-57428075911487953382016-11-23T17:01:44.451+01:002016-11-23T17:01:44.451+01:00I am new to this field and using the data with 200...I am new to this field and using the data with 200 pos and 200 neg words. Now when I feed them to WEKA, some of the options are disabled. I don't know why only the ZeroR algorithm is working. Can you please tell me what the error is?Test V1https://www.blogger.com/profile/15362164798150294598noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-21029247778058418992015-08-22T18:56:00.231+02:002015-08-22T18:56:00.231+02:00Hello, thanks for the tutorial, it's been a great help...Hello, thanks for the tutorial, it's been a great help. I implemented the NB in a Java class and now I am trying to query it. 
My problem is that no matter what dataset I train it with, or how I query it, the first 4 inputs are always 0.0 and the rest are 1.0.<br /><br />So my question is: how do I use the NB for classification?<br /><br />I am using this code to query it:<br /><br /> Instances b = source.getDataSet();<br /> b.setClassIndex(b.numAttributes() - 1);<br /> NaiveBayes nb = (NaiveBayes) bsa;<br /> int i = 0;<br /> double pred = 0;<br /> for (; i < b.numInstances(); i++) {<br /> pred = nb.classifyInstance(b.instance(i));<br /> System.out.print("ID: " + b.instance(i).value(0));<br /> System.out.println(", predicted: " + b.classAttribute().value((int) pred));<br /> }<br /><br />"bsa" is where I keep the trained NB model.<br /><br />And this is the format of the instances I am asking it to classify (I have had it classify the training set and it again gives 4 yes, 396 no).<br /><br />@relation C__Users_NG_Dropbox_Tutors_Emotion<br /><br />@attribute text string<br />@attribute @@class@@ {yes,no}<br /><br />@data<br />'some text',yes<br />'some other text',no<br /><br />I have thought that maybe the instances for querying need to be in the already filtered format. If so, how do I do this? The word count won't match between new queries and the old training data set.<br /><br />Thanks for the help and for the post.NGhttps://www.blogger.com/profile/14338072312819859248noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-38551355364105444112015-01-09T09:55:59.747+01:002015-01-09T09:55:59.747+01:00Dear Jatin
It is the same regex, but in the first...Dear Jatin<br /><br />It is the same regex, but in the first case it is less escaped because it sits in a command line with fewer nested calls, while in the second one there are more nested calls, so you need to escape it more.<br /><br />In order to master escaping in these calls, I recommend configuring commands in the Explorer and then copying the configuration to scripts, as I explain in this other post: http://jmgomezhidalgo.blogspot.com.es/2014/05/weka-text-mining-trick-copying-options.html<br /><br />Regards<br /><br />JM Jose Maria Gomez Hidalgohttps://www.blogger.com/profile/17053588779560658723noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-83999206710893133132014-12-11T09:52:09.845+01:002014-12-11T09:52:09.845+01:00Hello,
I would like to thank you for this wonderf...Hello,<br /><br />I would like to thank you for this wonderful post.<br />I would like to know more about the regex that you have used. During data analysis, to get the n-gram ARFF files, you have used [ \"\\\\W\" ] as the regex. But later on you have used [ \\\\\\\"\\\\\\\W\\\\\\\" ] as the regex. Can you please explain it in a little more detail?<br /><br />Regards,<br />Jatin.Jatin Mistryhttps://www.blogger.com/profile/08170393887569735817noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-58492524959384288522014-10-13T11:54:23.369+02:002014-10-13T11:54:23.369+02:00Hi Apicio
Yes you can, as long as you use the sam...Hi Apicio<br /><br />Yes you can, as long as you use the same tags in your test set as the ones used in the Pang training set. For instance, you must use {POS, NEU, NEG} in both sets. Of course, you will only get good accuracy if the genre and language are the same in the training and test sets - for instance, it makes no sense to train on Pang's dataset to classify or test tweets in Spanish.<br /><br />Good luck and regardsJose Maria Gomez Hidalgohttps://www.blogger.com/profile/17053588779560658723noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-84713017669233276452014-10-12T17:56:49.885+02:002014-10-12T17:56:49.885+02:00Hi Jose, thanks for your answers and your sharing of...Hi Jose, thanks for your answers and your sharing of knowledge.<br />I have another (I think simple) question: may I use an already compiled training set to train, or do I have to use a training set of my own? I would use the PangLi's training set, and I want to apply the classifier trained with that training set to a test set of my own.<br /><br />Best regards.<br />Apiciohttps://www.blogger.com/profile/08941174992598898511noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-67313112695048938632014-10-12T17:56:01.171+02:002014-10-12T17:56:01.171+02:00This comment has been removed by the author.Apiciohttps://www.blogger.com/profile/08941174992598898511noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-31604333714932776562014-10-11T13:01:07.997+02:002014-10-11T13:01:07.997+02:00This comment has been removed by the author.Apiciohttps://www.blogger.com/profile/08941174992598898511noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-45906389994319090512014-10-09T12:02:14.298+02:002014-10-09T12:02:14.298+02:00Hi, Apicio
Thanks for reading. Yes, the steps you...Hi, Apicio<br /><br />Thanks for reading. Yes, the steps you outline should be OK. Just check the ARFF files I provide to ensure that the class tags you use are the same, or change them in the SentimentClassifier.java code to fit your needs.<br /><br />Please remember that the class targets classification, not evaluation. So you would need to evaluate within WEKA or try my FilteredClassifier examples.<br /><br />Good luck and regards,<br /><br />JMJose Maria Gomez Hidalgohttps://www.blogger.com/profile/17053588779560658723noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-2472188371941407862014-10-06T21:46:38.937+02:002014-10-06T21:46:38.937+02:00I'd like to thank you for your hard work and f...I'd like to thank you for your hard work and for your Git repo. I'm writing a thesis about sentiment analysis on TripAdvisor. I've manually tagged 10.000+ reviews and now I have to build an app using your tutorial. Could you confirm these steps: <br />1) I need an ARFF file with the text of the reviews.<br />2) After that I need to run sentiment analysis following your SentimentClassifier.java<br />3) I check the results.<br /><br />Thanks a lot!Apiciohttps://www.blogger.com/profile/08941174992598898511noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-72326670099487297452014-01-28T18:05:32.931+01:002014-01-28T18:05:32.931+01:00Dear Francisco
Thank you very much for reading an...Dear Francisco<br /><br />Thank you very much for reading and for your feedback. <br /><br />Unfortunately, I have not written about the possible algorithm configurations yet. I focus on simple tutorials for doing relatively simple things. When performing comprehensive research (like that for your thesis), you may need to test the algorithms in many other configurations, apart from the default ones. For instance, the parameter "C" in SMO (Support Vector Machines) has been demonstrated to strongly affect results, especially in the case of text classification.<br /><br />However, you can hardly test all potential configurations, and most times you have to use your knowledge about the algorithms to choose the parameters' values.<br /><br />And it depends on the focus of your research as well. For instance, if you are focusing on the representation of tweets, the machine learning algorithm is rather a black box; in this case, choosing 2 or 3 configurations over 5 or 6 algorithms may be enough. If some algorithm is particularly strong, you can then test several more configurations of this algorithm in order to check which is the best representation across all of them.<br /><br />In other words, you need your advisor to check with you which configurations are more appropriate and how many you need to check.<br /><br />Good luck with your experiments!<br /><br /> Jose MariaJose Maria Gomez Hidalgohttps://www.blogger.com/profile/17053588779560658723noreply@blogger.comtag:blogger.com,1999:blog-36589303.post-14707536250523040462014-01-24T19:13:21.266+01:002014-01-24T19:13:21.266+01:00Hello,
First of all, I congratulate you for all th...Hello,<br />First of all, I congratulate you for all these posts you have done... making DM with WEKA more accessible.<br />I'm writing my Computer Engineering thesis about sentiment analysis of tweets.<br />I'm comparing SMO, J48 and BayesNet. I would like to ask you if it is enough to run the classifiers with the default configuration, because that is how you usually do it in your posts. What other configurations can I try? Maybe there is some post you made about this.<br />Thanks<br />Francisco Boatomatienzohttps://www.blogger.com/profile/07530818784897637548noreply@blogger.com