Anomalies or outliers come in three types.

  1. Point anomalies: an individual data instance is anomalous with respect to the rest of the data (e.g. a purchase with an unusually large transaction value).

  2. Contextual anomalies: a data instance is anomalous in a specific context but not otherwise, i.e. it is an anomaly only if it occurs at a certain time or in a certain region (e.g. a large spike in the middle of the night).

  3. Collective anomalies: a collection of related data instances is anomalous with respect to the entire dataset, even though the individual values are not. They come in two variations:

    1. Events in an unexpected order (ordered; e.g. a broken rhythm in an ECG)
    2. Unexpected value combinations (unordered; e.g. buying a large number of expensive items)
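To make the first two types concrete, here is a minimal sketch that models values with a Gaussian and flags those lying more than `threshold` standard deviations from the mean. A point anomaly is tested against the whole dataset; a contextual anomaly is tested only against values sharing its context (here a hypothetical `contexts` label such as "day"/"night"). The function names, the toy readings and the 3-sigma cutoff are illustrative assumptions, not taken from the original project.

```python
import numpy as np


def _gaussian_flags(x, threshold):
    """Flag entries whose z-score under a Gaussian fitted to x exceeds threshold."""
    sigma = x.std()
    if sigma == 0:
        # Constant data: nothing deviates, so nothing is flagged
        return np.zeros(len(x), dtype=bool)
    return np.abs(x - x.mean()) / sigma > threshold


def point_anomalies(values, threshold=3.0):
    """Point anomalies: each value is compared against the whole dataset."""
    return _gaussian_flags(np.asarray(values, dtype=float), threshold)


def contextual_anomalies(values, contexts, threshold=3.0):
    """Contextual anomalies: each value is compared only against its own context."""
    x = np.asarray(values, dtype=float)
    c = np.asarray(contexts)
    flags = np.zeros(len(x), dtype=bool)
    for ctx in np.unique(c):
        mask = c == ctx
        flags[mask] = _gaussian_flags(x[mask], threshold)
    return flags


# Toy data: 20 daytime readings near 100, 20 night readings of 5, then a night reading of 100
values = [99, 101] * 10 + [5] * 20 + [100]
contexts = ["day"] * 20 + ["night"] * 21
print(point_anomalies(values).any())                          # False: 100 is ordinary overall
print(np.flatnonzero(contextual_anomalies(values, contexts)))  # [40]: but anomalous at night
```

The same reading of 100 is unremarkable for the dataset as a whole but stands out once it is compared only with other night-time readings, which is exactly what separates a contextual anomaly from a point anomaly.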

Problem Statement:

The credit card fraud detection problem involves modeling past credit card transactions, using the knowledge of which ones turned out to be fraudulent. The model is then used to decide whether a new transaction is fraudulent or not. The aim here is to detect 100% of the fraudulent transactions while minimizing the number of valid transactions incorrectly classified as fraud.
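The goal above comes down to two numbers: recall on the fraud class (the share of fraudulent transactions that are caught, targeted at 100%) and the count of false positives (valid transactions flagged as fraud, to be minimized). A minimal sketch, assuming labels 1 = fraud and 0 = valid as in the observation code further down; `fraud_detection_metrics` is a hypothetical helper, not part of the original project.

```python
def fraud_detection_metrics(y_true, y_pred):
    """Return (fraud recall, false positives) for labels 1 = fraud, 0 = valid."""
    pairs = list(zip(y_true, y_pred))
    caught = sum(1 for t, p in pairs if t == 1 and p == 1)        # true positives
    missed = sum(1 for t, p in pairs if t == 1 and p == 0)        # false negatives
    false_alarms = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    total_fraud = caught + missed
    # Recall is undefined without any fraud cases; report 0.0 in that case
    recall = caught / total_fraud if total_fraud else 0.0
    return recall, false_alarms


# Every fraud is caught (recall 1.0), at the cost of one valid transaction flagged
print(fraud_detection_metrics([0, 0, 1, 1, 0], [0, 1, 1, 1, 0]))  # (1.0, 1)
```

In scikit-learn terms these correspond to `recall_score(Y, y_pred)` and the top-right cell of `confusion_matrix(Y, y_pred)`; the fraud-class recall also appears in `classification_report`.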

Dataset:

The dataset used for credit card fraud detection is derived from the following Kaggle page:

https://www.kaggle.com/mlg-ulb/creditcardfraud
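A minimal loading sketch, assuming the Kaggle file has been downloaded locally as `creditcard.csv` and that its label column is `Class`, with 1 for fraud and 0 for valid. The helper `split_features_and_labels` and the `outlier_fraction` ratio are illustrative assumptions; they produce the `X`, `Y` and `Fraud` names that the observation code relies on. The tiny frame at the end is synthetic, only there to show the shapes.

```python
import pandas as pd


def split_features_and_labels(data):
    """Hypothetical helper: split a creditcard.csv-style frame on its 'Class' column.

    Returns the feature matrix X, the label vector Y (1 = fraud, 0 = valid),
    the fraud and valid subsets, and the ratio of fraud rows to valid rows.
    """
    Fraud = data[data["Class"] == 1]
    Valid = data[data["Class"] == 0]
    X = data.drop(columns=["Class"])
    Y = data["Class"]
    outlier_fraction = len(Fraud) / float(len(Valid))
    return X, Y, Fraud, Valid, outlier_fraction


# With the real file (file name assumed):
# data = pd.read_csv("creditcard.csv")
# Synthetic stand-in with the same label convention:
demo = pd.DataFrame({"Amount": [5.0, 12.5, 3.2, 950.0], "Class": [0, 0, 0, 1]})
X, Y, Fraud, Valid, outlier_fraction = split_features_and_labels(demo)
print(X.shape, len(Fraud), outlier_fraction)
```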

Observations
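The comparison loop that follows iterates over a `classifiers` dictionary that is never defined in this section. A plausible setup, sketched here as an assumption rather than the author's exact configuration, pairs scikit-learn's `IsolationForest`, `LocalOutlierFactor` and `OneClassSVM`, and sets each model's expected outlier share from the fraud ratio. The helper `build_classifiers`, the "Isolation Forest" key and every hyperparameter value are hypothetical; only the "Local Outlier Factor" and "Support Vector Machine" keys are taken from the loop itself.

```python
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM


def build_classifiers(outlier_fraction, random_state=42):
    """Hypothetical setup: map the names the loop checks to unsupervised detectors.

    outlier_fraction is the expected share of anomalies; contamination must lie
    in (0, 0.5] and nu in (0, 1].
    """
    return {
        # Tree-based: anomalies are isolated by fewer random splits
        "Isolation Forest": IsolationForest(
            n_estimators=100,
            contamination=outlier_fraction,
            random_state=random_state,
        ),
        # Density-based: compares each point's local density with its neighbours';
        # novelty=False (the default) provides fit_predict and negative_outlier_factor_
        "Local Outlier Factor": LocalOutlierFactor(
            n_neighbors=20,
            contamination=outlier_fraction,
        ),
        # Boundary-based: nu is an upper bound on the fraction of training errors
        "Support Vector Machine": OneClassSVM(
            kernel="rbf",
            gamma="scale",
            nu=outlier_fraction,
        ),
    }
```

With the real data, `classifiers = build_classifiers(outlier_fraction)` would be created before the loop runs. Fitting a One-Class SVM on the full transaction set can be slow, so a subsample may be worth trying first.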

    from sklearn.metrics import accuracy_score, classification_report

    n_outliers = len(Fraud)
    for clf_name, clf in classifiers.items():
        # Fit each model and tag outliers; all of them predict +1 (inlier) or -1 (outlier)
        if clf_name == "Local Outlier Factor":
            # LOF in its default (novelty=False) mode has no predict(); fit_predict labels the training data
            y_pred = clf.fit_predict(X)
            scores_prediction = clf.negative_outlier_factor_
        elif clf_name == "Support Vector Machine":
            clf.fit(X)
            y_pred = clf.predict(X)
        else:
            clf.fit(X)
            scores_prediction = clf.decision_function(X)
            y_pred = clf.predict(X)
        # Reshape the prediction values to 0 for valid transactions, 1 for fraud transactions
        y_pred[y_pred == 1] = 0
        y_pred[y_pred == -1] = 1
        n_errors = (y_pred != Y).sum()
        # Run classification metrics
        print("{}: {}".format(clf_name, n_errors))
        print("Accuracy Score :")
        print(accuracy_score(Y, y_pred))
        print("Classification Report :")
        print(classification_report(Y, y_pred))
    

aayushkumarjvs/Gaussian_Distribution_Anomaly_Detection