From 01b6c2baac9e19ed9aa47332b700a51e5510c31e Mon Sep 17 00:00:00 2001
From: Vahagn Khachatryan
Date: Fri, 29 Oct 2021 08:14:23 +0100
Subject: [PATCH] Facebook

---
 Facebook/aad.md                      | 122 +++++++++++++++++++++++++++
 Facebook/aad_how_to.md               |  21 +++++
 Facebook/aad_notes_for_newjoiners.md |  48 +++++++++++
 Facebook/abbreviations.md            |  17 ++++
 Facebook/arch.md                     |  14 +++
 Facebook/bento_pvc.md                |  43 ++++++++++
 Facebook/command_line_cheat_sheet.md |  52 +++++++++---
 Facebook/deltoid.md                  |   6 ++
 Facebook/hack.md                     |   3 +
 Facebook/howto.md                    |  20 ++++-
 Facebook/metrics.md                  |  17 ++++
 Facebook/oncall.md                   |  10 +++
 Facebook/qe_exposure_debug.md        |  61 ++++++++++++++
 Facebook/shortcuts.md                |  20 +++++
 Facebook/sigma.md                    |  52 ++++++++++++
 Facebook/todo.md                     |   6 ++
 16 files changed, 496 insertions(+), 16 deletions(-)
 create mode 100644 Facebook/aad.md
 create mode 100644 Facebook/aad_how_to.md
 create mode 100644 Facebook/aad_notes_for_newjoiners.md
 create mode 100644 Facebook/abbreviations.md
 create mode 100644 Facebook/arch.md
 create mode 100644 Facebook/bento_pvc.md
 create mode 100644 Facebook/deltoid.md
 create mode 100644 Facebook/hack.md
 create mode 100644 Facebook/metrics.md
 create mode 100644 Facebook/oncall.md
 create mode 100644 Facebook/qe_exposure_debug.md
 create mode 100644 Facebook/shortcuts.md
 create mode 100644 Facebook/sigma.md
 create mode 100644 Facebook/todo.md

diff --git a/Facebook/aad.md b/Facebook/aad.md
new file mode 100644
index 0000000..7883f42
--- /dev/null
+++ b/Facebook/aad.md
@@ -0,0 +1,122 @@
+# Abusive Account Detection
+
+# Helpful bunnylol
+|||
+-|-
+aad | aad wiki pages
+go aadata | abusive accounts data
+fblearner | model training runs are listed here
+orb | in-memory DB
+
+
+# Data
+|||
+-|-
+ig_signup_sigma_features |
+ig_challenge_.... | accounts that were challenged
+
+
+# Proxy metrics
+|||
+-|-
+enrollments | tested users (UFAC)
+clears | the lower, the better (UFAC cleared)
+
+# Human labeling
+|||
+-|-
+holdout | signed-up users; 12 days after signup they are labeled by humans as empty, bad, or good (benign); ~5K accounts
+
+false negative
+
+
+
+MAU prevalence
+MIMA prevalence
+
+# Folders
+
+- ## Misc
+    - `fbcode/dataswarm-pipeline/tasks/si/fake_accounts`
+    - `fbcode/dper3/dper3_models/si/olf`
+    - `www/flib/intern/scripts/sigma/clssifiers/olf`
+
+    - `configerator/source/sigma/online_classifiers/runtimes`
+        - all our classifiers are here
+    - `configerator/source/si/fake_accounts`
+        - defines the active classifiers and their defaults
+
+    - `si_sigma/Lib/FakeAccounts`
+        - Sigma rules for fake accounts; this is where new_user_registration is processed
+
+- ## Models
+    - `fbcode/fblearner/flow/projects/fluent2/domains/si/aad_surfaces`
+        - FBLearner models
+
+- ## Sentry
+    - `configerator/source/si/sentry/prod//.cconf`
+        - configuration of sentries
+        - e.g. `namespace=facebook, category=new_user_registration` defines what is passed to Sigma rules. This is used for FB; IG perhaps has a different config.
+    - `configerator/source/si/sentry/si_namespaces.thrift`
+        - sentry namespaces
+    - `www/flib/si/sentry/category/SentryCategory.php`
+        - existing categories
+    - `www/flib/si/sentry/preparable/filters/sigma/SigmaFilter.php`
+        - the Sigma filter that appears in sentry configurations
+    - bunnylol orb
+        - can be used to query sentry logs
+
+- ## Sigma
+    - `si_sigma/Endpoint/Sentry/SentryFollowProfile.hs`
+    - `si_sigma/Contexts/Sentry/SentryFollowProfile.hs`
+    - scuba sigma_profiling
+        - for profiling
+
+- ## QE
+    - `configerator/qe2_diff/newExperiments/vahagnk_fast_tiger_clone.txt`
+
+- ## Reg Attack
+    - `configerator/source/si/reg_attacks/surface_definitions.cinc`
+        - surfaces are defined here
+    - `configerator/source/si/reg_attacks/attack_definitions/`
+        - attack definitions
+    - `source/si/reg_attacks/reg_attacks.thrift`
+        - FieldTypes are here
+
+- ## Pipelines
+    - [dataswarm pipelines](https://www.internalfb.com/code/fbsource/fbcode/dataswarm-pipelines/tasks/si/fake_accounts/)
+    - [online_reg](https://www.internalfb.com/code/fbsource/fbcode/dataswarm-pipelines/tasks/si/fake_accounts/online_reg/)
+
+- ## OLF
+    - phps OLFAdminV2 status --classifier reg_enthusiastic_impala
+    - [firefighting](https://www.internalfb.com/intern/wiki/OLF/Firefighting/)
+
+
+
+# Model Training workflows
+
+- Train the model
+```
+flow-cli canary si.olf.ig_signup.train@olf --run-as-secure-group=team_abusive_accounts_detection --entitlement si --parameters-file configs/ig_signup_andromeda_offline.json
+```
+To monitor progress, use [bunnylol fblearner](https://www.internalfb.com/intern/fblearner).
+
+- Publish the model
+```
+aimps publish-model --manifold --oncall aad_surfaces --is-dper-model -d service_sharded _
+```
+To monitor progress, use [bunnylol predictor](https://www.internalfb.com/intern/predictor).
+
+- Register the model
+```
+phps --www-root /var/www OLFAdminV2 training --action=register_model --classifier-name --problem IG_FA_ANDROMEDA --surface IG_SIGNUP --model-id --threshold 0.5
+```
+The model name is ig_signup_colorful_animal.
+
+- [Compare how the model fires.](https://fburl.com/scuba/ig_signup_sigma_features/svhenbuq)
+- Create an experiment to compare enrolled vs. cleared. [bunnylol qe2](https://www.internalfb.com/intern/experiments)
+
+# Team identifiers
+- fawg - the abusive account detection group for diffs
+
+# Tables
diff --git a/Facebook/aad_how_to.md b/Facebook/aad_how_to.md
new file mode 100644
index 0000000..52f7b0d
--- /dev/null
+++ b/Facebook/aad_how_to.md
@@ -0,0 +1,21 @@
+# How to:
+
+## Policy
+
+### Monitor Policy
+- Use the Sigma Policy Dashboard
+```
+bunnylol spd
+  e.g. bunnylol spd ipSpaceAnomalyRegConfContactpointMismatch
+```
+- If you have a GK, you can find its charts at the bottom of
+```
+bunnylol gk
+```
+
+### Monitor Context
+- Use the Sigma Context Dashboard
+```
+bunnylol scd
+  e.g. bunnylol scd sentry_confirm_email
+```
diff --git a/Facebook/aad_notes_for_newjoiners.md b/Facebook/aad_notes_for_newjoiners.md
new file mode 100644
index 0000000..28033b4
--- /dev/null
+++ b/Facebook/aad_notes_for_newjoiners.md
@@ -0,0 +1,48 @@
+# New User Registration
+Our goal is to allow user registration while preventing abusive users from registering.
+That is, given the limited amount of information we have about the user, we need to classify or predict their intent.
+
+Ideally we would not even allocate an ID for abusive users. Therefore we can separate our checks into two categories:
+- preregistration checks: the user doesn't have an ID allocated yet
+- registration checks: the user already has an ID
+
+
+## Preregistration
+
+## Registering
+We have multiple mechanisms that help us prevent registration of abusive users:
+- ML classifiers
+- Reg attacks
+- Anomaly detection
+
+
+As far as I understand, once we detect an abusive registration we send the user to UFAC. UFAC then presents a number of challenges and either clears the user or confirms that the registration is abusive.
+
+In that regard, the [ufac_core][ufac_core_table] table is a good source of data. Take some time to check what data it contains.
+
+Some interesting Scuba queries are:
+- [UFAC enrollments per reason][scuba_ufac_enrollment_reason]
+- [UFAC clears per reason][scuba_ufac_clears] - in other words, false positives (FP).
+- [UFAC enrollments per context][scuba_ufac_context]
+
+Some interesting columns/values in [ufac_core][ufac_core_table] are:
+- Event - enrolled, cleared
+- Violation Type - FAKE_ACCOUNT
+-
+
+# Must Read
+- [The Life of a Registration model][life_of_reg_model]
+    - Describes the registration model in more detail.
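
The enrolled/cleared events described above suggest a simple false-positive proxy. For each enrollment reason, take the share of enrolled users that UFAC later cleared. Below is a minimal sketch in plain Python. It assumes rows shaped as hypothetical `(user_id, reason, event)` tuples, where `event` is `"enrolled"` or `"cleared"`. The real ufac_core columns and the linked Scuba queries may be organised differently, and the reason names in the demo are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical row shape (user_id, reason, event); not the real ufac_core schema.
Row = Tuple[int, str, str]


def clear_rate_by_reason(rows: Iterable[Row]) -> Dict[str, float]:
    """Fraction of users enrolled under each reason that were later cleared (an FP proxy)."""
    enrolled = defaultdict(set)
    cleared = defaultdict(set)
    for user_id, reason, event in rows:
        if event == "enrolled":
            enrolled[reason].add(user_id)
        elif event == "cleared":
            cleared[reason].add(user_id)
    # Only count clears of users that were enrolled under the same reason.
    return {
        reason: len(cleared[reason] & users) / len(users)
        for reason, users in enrolled.items()
    }


if __name__ == "__main__":
    sample = [
        (1, "reg_classifier", "enrolled"),  # hypothetical reason names
        (2, "reg_classifier", "enrolled"),
        (3, "reg_attack", "enrolled"),
        (1, "reg_classifier", "cleared"),
    ]
    print(clear_rate_by_reason(sample))
```

The helper intersects cleared users with users enrolled under the same reason. A clear that has no matching enrollment is therefore ignored rather than inflating the rate.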
+# Code Ownership
+One can use [bunnylol ownership manager][bunnylol_ownership_manager] to get the list of assets we own.
+
+
+
+
+[ufac_core_table]: https://www.internalfb.com/intern/data/info/tables/scuba/uber/ufac_core/
+[scuba_ufac_enrollment_reason]: https://fburl.com/scuba/ufac_core/2eqmiib9
+[scuba_ufac_clears]: https://fburl.com/scuba/ufac_core/17ebiuzs
+[scuba_ufac_challenge]: https://fburl.com/scuba/ufac_core/np8svxf0
+[scuba_ufac_context]: https://fburl.com/scuba/ufac_core/e2513thj
+[bunnylol_ownership_manager]: https://fburl.com/catalog/uzazxbwb
+[life_of_reg_model]: https://fb.workplace.com/notes/1034802330668606
diff --git a/Facebook/abbreviations.md b/Facebook/abbreviations.md
new file mode 100644
index 0000000..3483e5e
--- /dev/null
+++ b/Facebook/abbreviations.md
@@ -0,0 +1,17 @@
+|||
+-|-
+MAU | Monthly Active Users (user == account)
+MAP | Monthly Active People (person = all of a user's accounts)
+WAU | Weekly Active Users
+WAP | Weekly Active People
+DAU | Daily Active Users
+DAP | Daily Active People
+XAU | X Active Users
+XAP | X Active People
+SI | Site Integrity
+NAWI | Non Abusive WAP Impact
+FP | False Positive
+FN | False Negative
+OLF | Online Learning Framework
+ORB |
+NFX | Negative Feedback eXperience
diff --git a/Facebook/arch.md b/Facebook/arch.md
new file mode 100644
index 0000000..84697af
--- /dev/null
+++ b/Facebook/arch.md
@@ -0,0 +1,14 @@
+
+# Surface (what is a surface?)
+
+# Sentry
+A surface calls Sentry, which then calls a handler to check the action.
+[bunnylol wiki sentry](https://www.internalfb.com/intern/wiki/Sentry/)
+All Sentry checks are logged to [bunnylol orb](https://www.internalfb.com/intern/si/orb).
+We sit on the [new_user_registration](https://www.internalfb.com/code/configerator/source/si/sentry/prod/facebook/new_user_registration.cconf) Sentry category.
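
Below is a toy model of the call flow described in arch.md. A surface calls Sentry. Sentry runs the handlers configured for a namespace/category pair such as `facebook`/`new_user_registration`. Every check is logged. Everything in the sketch is hypothetical and is not the real Sentry API: `ToySentry`, the `Restriction` levels, the score feature, and the rule that the most severe restriction wins. That last rule is only loosely inspired by names like `SentryFB::getFinalRestriction` in qe_exposure_debug.md.

```python
from enum import IntEnum
from typing import Callable, Dict, List, Tuple


class Restriction(IntEnum):
    """Hypothetical severity levels; a higher value is more restrictive."""
    NONE = 0
    CHALLENGE = 1  # e.g. send the user to UFAC
    BLOCK = 2


Features = Dict[str, object]
Handler = Callable[[Features], Restriction]


class ToySentry:
    """Toy model of the surface -> Sentry -> handler flow (not the real API)."""

    def __init__(self) -> None:
        # (namespace, category) -> handlers, standing in for the prod .cconf configs.
        self.handlers: Dict[Tuple[str, str], List[Handler]] = {}
        # Stand-in for the check log that ORB lets you query.
        self.log: List[Tuple[str, str, Restriction]] = []

    def register(self, namespace: str, category: str, handler: Handler) -> None:
        self.handlers.setdefault((namespace, category), []).append(handler)

    def check(self, namespace: str, category: str, features: Features) -> Restriction:
        results = [h(features) for h in self.handlers.get((namespace, category), [])]
        # Assumption: the final restriction is the most severe one any handler returned.
        final = max(results, default=Restriction.NONE)
        self.log.append((namespace, category, final))
        return final


def toy_classifier(features: Features) -> Restriction:
    # Hypothetical "score" feature, compared with a 0.5 threshold like the one used at model registration.
    score = float(features.get("score", 0.0))
    return Restriction.CHALLENGE if score >= 0.5 else Restriction.NONE


if __name__ == "__main__":
    sentry = ToySentry()
    sentry.register("facebook", "new_user_registration", toy_classifier)
    print(sentry.check("facebook", "new_user_registration", {"score": 0.9}).name)
    print(sentry.check("facebook", "new_user_registration", {"score": 0.1}).name)
```

Keeping handlers in a lookup keyed by namespace and category mirrors how these notes describe the per-category `.cconf` files. Adding a surface then means adding configuration, not new code.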
+
+TODO: take a look at [cbgs aad_surfaces](https://www.internalfb.com/code/search?q=filepath%3Asource%2Fsi%2Fsentry%2F%20repo%3Aconfigerator_all%20aad_surfaces)
+
+[Current production models](https://www.internalfb.com/code/configerator/source/si/fake_accounts/olf_registration_classifiers.cconf)
+
+Datr (dɑːtər) is a special browser cookie designed to identify browsers and apps.
diff --git a/Facebook/bento_pvc.md b/Facebook/bento_pvc.md
new file mode 100644
index 0000000..74dbb5a
--- /dev/null
+++ b/Facebook/bento_pvc.md
@@ -0,0 +1,43 @@
+# To Read
+[4 PVC tips that will make you more productive](https://fb.workplace.com/notes/4516377471737510/)
+[]()
+
+# Code
+### Imports
+
+`from bento import wait`
+
+```
+import pandas as pd
+import numpy as np
+```
+
+`from fblearner.flow.api import types`
+
+### DS
+
+```
+from datetime import date, timedelta
+ds = date.today() - timedelta(days=1)
+```
+
+or use
+```
+from fblearner.flow.projects.pvc.date import (
+    day, week,
+    today, yesterday,
+    date_range,
+    latest_ds,
+)
+# All partitions from two weeks ago to the latest having landed
+dataset = Dataset(
+    namespace="some_namespace",
+    table="some_table",
+    partition=dict(
+        ds=date_range(
+            today() - 2 * week,
+            latest_ds("some_namespace", "some_table"),
+        ),
+    ),
+)
+```
diff --git a/Facebook/command_line_cheat_sheet.md b/Facebook/command_line_cheat_sheet.md
index 40d087c..4c4cdf8 100644
--- a/Facebook/command_line_cheat_sheet.md
+++ b/Facebook/command_line_cheat_sheet.md
@@ -1,42 +1,58 @@
+# Working with Hack
+|||
+-|-
+hh | build
+t | test
+t | test
+
 # Mercurial
-|
+|||
 -|-
 hg book | temporary branch
 hg pull/hg rebase -d master | git pull --rebase
 hg revert | git co -f
 hg commit | git ci
+hg commit --stack
+hg fold --from | merge multiple commits
+hg hide | hide commits that are no longer needed
 
 # Jelly Fish
-|
+|||
 -|-
 jf apply --all --suggested [--dry-run] | apply suggested changes
 jf apply --accepted --suggested [--dry-run] | apply accepted changes
 jf submit -e | sumbit and edit a comment
 jf land | land the change
 
+# Arc
+|||
+-|-
+arc build | used in configerator
+arc canary |
+arc canary --cancel |
+
 # Buck
-|
+|||
 -|-
 buck test @mode/dev-nosan local/path/... | run all tests
 buck build @mode/opt local/path/... | build optimized version (by default @mode/dev)
 buck build -c python.helpers=true //path:target | interactive shell to play with the modules ...
-ipython.par | not sure what this is
 
-# On deman DB
-|
+# Ondemand
+|||
 -|-
+ondemand connect | connect to an ondemand server
+|
 ondemand devdb new | new dev DB
 ondemand devdb list | list dev DBs
 ondemand devdb connect --name | connect to dev DB
-
-# Arc
 |
--|-
-arc build | userd in configerator
+`tail -f /var/facebook/logs/users/svcscm/error_log_svcscm` VSC `slog: open` | slog on ondemand server
 
 # MySQL
-|
+|||
 -|-
 SHOW DATABASES; |
 SHOW DATABASES LIKE 'open%'; |
@@ -44,7 +60,7 @@ SHOW SCHEMAS; |
 
 # Systems
-|
+|||
 -|-
 **Data** |
 cdm | dataswarm jobs?
@@ -54,7 +70,7 @@ SHOW SCHEMAS; |
 uhaul | data permissining ??
 Warm Storage | HDFS
 presto | in memory
- spark |
+ spark |
 hive | data store / query lang (use spark)
 dataswarm | DAG
 daiquery | simple queries
@@ -65,9 +81,19 @@ SHOW SCHEMAS; |
 
 # Bunnylol
-|
+|||
 -|-
 **Network** |
 backbone | network topology
 **Service** |
 smc | service discovery service
+**Sigma** |
+scd | Sigma Context Dashboard
+spd | Sigma Policy Dashboard
+**Misc** |
+slog | some logs?
+@od | on demand sandbox something
+centra | review an ID to see if the user is abusive or benign
+
+# Ohno
+ohno
-
\ No newline at end of file
diff --git a/Facebook/deltoid.md b/Facebook/deltoid.md
new file mode 100644
index 0000000..58cbdae
--- /dev/null
+++ b/Facebook/deltoid.md
@@ -0,0 +1,6 @@
+
+community_integrity:ufac:challenge_completed
+community_integrity:ufac:disabled
+
+community_integrity:dec:fb:actioning
+community_integrity:dec:fb:harm
diff --git a/Facebook/hack.md b/Facebook/hack.md
new file mode 100644
index 0000000..28eba74
--- /dev/null
+++ b/Facebook/hack.md
@@ -0,0 +1,3 @@
+fbdbg --attach --type hhvm
+> = $context = new QuarkzObjectContext(tuple(674466959, null))
+> = AADAnyOfACDCSignal::genVal(omni(), $context, vec['UserControlls50Groups']) |> prep($$)
diff --git a/Facebook/howto.md b/Facebook/howto.md
index 2f48397..56bdaa8 100644
--- a/Facebook/howto.md
+++ b/Facebook/howto.md
@@ -1,5 +1,19 @@
+# fbpkg
+
+
+
+
 # Tupperware
+- create config.tw
 - tw job validate
-- tw sandbox interactive
-- tw sandbox start
-- tw sandbox resolve <- without config>
\ No newline at end of file
+- tw sandbox interactive \
+- tw sandbox start \
+- tw sandbox resolve # This will list running containers?
+- tw sandbox stop \
+- tw job start \ # Push to prod?
+- tw ssh \ # ssh to running container
+- tw log --tail --file stderr \
+
+[Hands-on Lab](https://www.internalfb.com/intern/wiki/Tupperware-bootcamp-handson-lab/)
+
+# Deployment to prod
diff --git a/Facebook/metrics.md b/Facebook/metrics.md
new file mode 100644
index 0000000..d544476
--- /dev/null
+++ b/Facebook/metrics.md
@@ -0,0 +1,17 @@
+|||
+-|-
+[WAP@14][wap14] | Weekly Active People registered 14 days ago. See also [here][wap14_2].
+[NAWI][nawi] | Non Abusive WAP Impact. See also [here][nawi2].
+[FP Reach][nawi2] | aka FPR. Deprecated.
+FN@Reg | ?
+FN@24 | ?
+
+
+
+
+
+
+[wap14]: https://fb.workplace.com/groups/613224465392839/posts/2504163902965543/
+[wap14_2]: https://www.internalfb.com/intern/qa/3679/what-is-wap14
+[nawi]: https://www.internalfb.com/intern/wiki/Fake-accounts-site-integrity/NAWI/
+[nawi2]: https://www.internalfb.com/intern/anp/view/?id=62458
diff --git a/Facebook/oncall.md b/Facebook/oncall.md
new file mode 100644
index 0000000..0b3d59f
--- /dev/null
+++ b/Facebook/oncall.md
@@ -0,0 +1,10 @@
+
+### How to find the current production model
+- bunnylol olf reg_enthusiastic_impala
+
+### Model validation fails
+- phps OLFAdminV2 status --classifier
+- Take a look at [OLF firefighting](https://www.internalfb.com/intern/wiki/OLF/Firefighting/).
+- Take a look at the [validator config](https://www.internalfb.com/code/configerator/source/sigma/online_classifiers/validators/reg_enthusiastic_impala.cconf).
+    - Check whether the validation dataswarm pipeline runs.
+    - e.g. bunnylol data > lineage > locate the pipeline.
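
Three pieces of these notes fit together simply: the FP/FN abbreviations, the human-labeled holdout (empty/bad/good), and the `--threshold 0.5` passed to `register_model`. At a given threshold, a benign ("good") account scored at or above it is a false positive. A "bad" account scored below it is a false negative. Below is a minimal sketch with made-up scores and labels. Skipping the "empty" label is an assumption, not how the holdout is actually scored.

```python
from typing import Dict, Iterable, Tuple

# (model_score, human_label) pairs; labels follow the holdout's "bad"/"good" buckets.
Labeled = Tuple[float, str]


def confusion_at(threshold: float, rows: Iterable[Labeled]) -> Dict[str, int]:
    """Count TP/FP/FN/TN when accounts with score >= threshold are actioned."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, label in rows:
        if label not in ("bad", "good"):
            continue  # assumption: other labels (e.g. "empty") are left out
        actioned = score >= threshold
        if label == "bad":
            counts["tp" if actioned else "fn"] += 1
        else:
            counts["fp" if actioned else "tn"] += 1
    return counts


if __name__ == "__main__":
    holdout = [(0.9, "bad"), (0.7, "good"), (0.3, "bad"), (0.1, "good"), (0.8, "empty")]
    for t in (0.5, 0.8):
        print(t, confusion_at(t, holdout))
```

With this made-up data, raising the threshold from 0.5 to 0.8 turns the single false positive into a true negative, and the false negative stays the same. On real data, fewer FPs usually come at the cost of more FNs.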
diff --git a/Facebook/qe_exposure_debug.md b/Facebook/qe_exposure_debug.md
new file mode 100644
index 0000000..f253243
--- /dev/null
+++ b/Facebook/qe_exposure_debug.md
@@ -0,0 +1,61 @@
+z56Y7UTqKZMusqM
+
+genAugmentFeatureMap()
+https://www.internalfb.com/code/www/flib/si/sentry/one_way/SentryOneWay.php?lines=82
+
+
+EntPersonAccountCreationCriticalObserver::genExecutePostActions
+https://www.internalfb.com/code/www/flib/entity/person/observers/EntPersonAccountCreationCriticalObserver.php?lines=180
+
+AccountCreationRoadblocker::genMaybeRoadblockUser(
+https://www.internalfb.com/code/www/flib/account/creation/AccountCreationRoadblocker.php?lines=74
+
+
+RegistrationSentryBuilder::genAssertAllowed
+AbstractSentryBuilder::genAssertAllowed
+SentryClient::genFBRun ***
+SentryFB::gen_DONT_CALL_IT_DIRECTLY
+SentryFB::genCheck * **
+
+*
+SentryFB::genCheck
+Sentry::genAugmentFeatureMap
+SentryWithExperiments::genAugmentFeatureMapExperiments
+SentryWithExperiments::genAugmentFeatureMapUserExperiments
+SentryWithExperiments::genQEExperiments
+SentryWithExperiments::genQEParams
+
+**
+SentryFB::postProcessRestrictions
+logQEExposures
+logSingleQEExposure
+
+***
+AbstractSentryBuilder::genAssertAllowed
+SentryFB::assertAllowed
+SentryFB::assertAllowedImpl
+SentryFB::getFinalRestriction
+SentryFB::getResponse
+
+--
+haskell
+
+registrationOnlineModel
+getParameterDefaultForSource
+getParameterDefaultRequest
+getParameterDefault
+getParameter
+getParameters - withParameters
+getExperiment
+https://www.internalfb.com/code/si_sigma/Lib/Experiments/QE/QEAPI.hs
+genQE
+getRequestedQEParameters
+getBoolMaybe
+logAutoExposure
+https://www.internalfb.com/code/www/flib/site/x/sigma/XSigmaWWWExperimentationServiceController.php
+
+
+
+
+
+https://www.internalfb.com/diff/D30615875
diff --git a/Facebook/shortcuts.md b/Facebook/shortcuts.md
new file mode 100644
index 0000000..ff0106a
--- /dev/null
+++ b/Facebook/shortcuts.md
@@ -0,0 +1,20 @@
+# DuckDuckGo
+|| | ||
+-|-|-|-
+/ | jump to search | |
+Open results:
+
+    Enter or l or o — go to the highlighted result, or use it right away to go to the first result
+    Ctrl/Cmd+Enter — open a result in the background
+    d — domain search (if a result is highlighted)
+    ' or v — open the highlighted result in a new window/tab. Since this uses JavaScript, you need to turn off pop-up blockers first.
+
+Move around:
+
+    ← and → — navigate Instant Answer tabs. When an Instant Answer is open, navigate within the Instant Answer.
+    ↓ or j — next search result
+    ↑ or k — previous search result
+    / or h — go to search box
+    s — go to misspelling link (if any)
+    t — go to top
+    m — go to main results
diff --git a/Facebook/sigma.md b/Facebook/sigma.md
new file mode 100644
index 0000000..29c9759
--- /dev/null
+++ b/Facebook/sigma.md
@@ -0,0 +1,52 @@
+### How to sample inputs for a context
+```
+inputs <- H.sampleInputsFrom "sentry_confirm_email" 500
+```
+
+### How to test inputs
+```
+map failOnError <$>
+  T.batchTestHaxl (
+    textShow . Map.lookup (QE.QEUniverse "registration_classifiers") <$>
+      QE.enabledExperiments
+  )
+  inputs
+```
+
+### How to find users with an enabled GK
+```
+gk <- map failOnError <$> T.batchTestHaxl (if passesGKPreCalculated "reg_conf_contactpoint_mismatch" then fmapId else return 0) inputs
+good_inputs = [ snd x | x <- zip gk inputs, fst x /= 0 ]
+```
+
+```
+gk <- map failOnError <$> T.batchTestHaxl (if passesGKPreCalculated "reg_conf_contactpoint_mismatch" then fmapId else return "") inputs
+```
+
+
+
+1. The lazy way, using exceptions, to test a policy only on inputs that pass a GK.
+```
+inputs <- H.sampleInputsFrom "some_context" 1000
+responses <- rights <$> T.batchTestPolicy (if passesGKExperiment "some_gk" then somePolicy else throw (LogicError "An error")) inputs
+```
+
+2. The fancier way, with explicit conditions, which gives you more flexibility.
+```
+inputs <- H.sampleInputsFrom "some_context" 1000
+passes <- map failOnError <$> T.batchTestHaxl (GK.passesGKExperiment "some_gk") inputs
+```
+
+```
+inputs_gk = map (inputs !!) $ elemIndices True passes
+responses <- T.batchTestPolicy somePolicy inputs_gk
+```
+
+```
+gk <- map failOnError <$> T.batchTestHaxl (passesGKPreCalculated "reg_conf_contactpoint_mismatch") inputs
+```
+
+### How to dump the input map content
+```
+T.pasteString "Some input map" . textShow =<< pp <$> inputMap
+```
diff --git a/Facebook/todo.md b/Facebook/todo.md
new file mode 100644
index 0000000..a033899
--- /dev/null
+++ b/Facebook/todo.md
@@ -0,0 +1,6 @@
+|||
+-|-
+OLF Wiki
+DEC Wiki
+Sigma ?
+Sentry ?