GraphLab Create User Guide
Text features
These feature transformations are useful when you have text data.
TF-IDF
Tokenizer
BM25
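
To give a feel for what a text feature transformation produces, here is a minimal plain-Python sketch of tokenization followed by TF-IDF weighting. It is not GraphLab Create's implementation and does not use its API: the helper names `tokenize` and `tf_idf` are hypothetical, the tokenizer is a naive whitespace split, and the weighting is one common variant, `tf * log(N / df)`, which may differ from the formula the toolkit actually uses.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {word: score} dict per document, scoring tf * log(N / df).

    Assumption: this is a common textbook variant, chosen for illustration.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each word.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)  # term frequency within this document
        scores.append(
            {word: count * math.log(n_docs / doc_freq[word])
             for word, count in counts.items()}
        )
    return scores


docs = ["the cat sat", "the dog sat", "the cat ran"]
scores = tf_idf(docs)
```

Under this variant, a word that appears in every document (here, "the") scores zero, while a word unique to one document (here, "dog") receives the highest weight.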