Stack Exchange Question Classifier
Leveraging tools like Naive Bayes classification, as suggested, is a great starting point, especially for beginners in machine learning.
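As a rough illustration of that starting point, here is a minimal sketch of a Naive Bayes text classifier in scikit-learn. The toy questions, the topic labels, and the `build_nb_classifier` helper are all made up for this example; a real submission would train on `training.json` instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_nb_classifier():
    """Hypothetical helper: TF-IDF features feeding a multinomial Naive Bayes model."""
    return make_pipeline(TfidfVectorizer(stop_words='english'), MultinomialNB())


# Toy data invented for this sketch (not taken from the challenge's training set).
texts = [
    "How do I take a sharp photo with a slow shutter speed?",
    "Which camera lens is best for portrait photography?",
    "What is the boiling point of ethanol in chemistry?",
    "How does a chemical reaction reach equilibrium?",
]
topics = ["photo", "photo", "chemistry", "chemistry"]

model = build_nb_classifier()
model.fit(texts, topics)

# Classify an unseen question; predict returns one label per input text.
prediction = model.predict(["Which lens should I buy for my camera?"])[0]
print(prediction)
```

The pipeline keeps vectorizing and classifying in one object, so the same transformation is applied at training and prediction time.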
A random forest classifier probably works well on this data:
https://github.com/angelgldh/HackerRank/blob/main/Artificial_Intelligence/stack_exchange_question_classifier/text_classifier_quora_topics.ipynb
    import json
    import sys

    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.svm import LinearSVC

    # Python 2/3 compatibility: raw_input was renamed to input in Python 3.
    if sys.version_info[0] >= 3:
        raw_input = input

    transformer = HashingVectorizer(stop_words='english')

    # Training file: the first line is the record count, then one JSON object per line.
    _train = []
    train_label = []
    with open('training.json') as f:
        for i in range(int(f.readline())):
            h = json.loads(f.readline())
            _train.append(h['question'] + "\r\n" + h['excerpt'])
            train_label.append(h['topic'])

    # HashingVectorizer is stateless, so fit_transform only hashes the text.
    train = transformer.fit_transform(_train)
    svm = LinearSVC()
    svm.fit(train, train_label)

    # Test input arrives on stdin in the same format; print one predicted topic per line.
    _test = []
    for i in range(int(raw_input())):
        h = json.loads(raw_input())
        _test.append(h['question'] + "\r\n" + h['excerpt'])
    test = transformer.transform(_test)
    test_label = svm.predict(test)
    for e in test_label:
        print(e)
I'm using a bag-of-words model in sklearn, and my problem seems to be that converting the bag of words from a sparse matrix to a DataFrame takes too long. CountVectorizer can only produce sparse matrices, so I don't see a way around this, nor do I know of a simpler way to model text. Any suggestions?
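One possible way around this, sketched under assumptions: scikit-learn estimators such as `MultinomialNB` accept the sparse matrix from `CountVectorizer` directly, so the conversion to a DataFrame may not be needed at all. The toy documents and labels below are invented for illustration.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy documents and labels, invented for this sketch.
docs = [
    "install ubuntu on a new laptop",
    "ubuntu package manager fails to update",
    "prove the integral of a continuous function exists",
    "limit of a sequence and continuous function",
]
labels = ["unix", "unix", "math", "math"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # SciPy sparse matrix, never densified
still_sparse = issparse(X)

# The classifier is fitted on the sparse matrix as-is.
clf = MultinomialNB().fit(X, labels)

# New text goes through the same vectorizer and also stays sparse.
X_new = vectorizer.transform(["ubuntu update fails"])
predicted = clf.predict(X_new)[0]

# If a dense view is needed for inspection, convert only a small slice.
preview = X[:2].toarray()
print(predicted, preview.shape)
```

Keeping the data sparse avoids allocating a mostly-zero dense table with one column per vocabulary word, which is usually where the time and memory go.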