This commit is contained in:
2021-10-29 08:14:23 +01:00
parent c0879efa7f
commit 01b6c2baac
16 changed files with 496 additions and 16 deletions

122
Facebook/aad.md Normal file

@@ -0,0 +1,122 @@
# Abusive Account Detection
# Helpful bunnylol
|||
-|-
aad | aad wiki pages
go aadata | abusive accounts data
fblearner | model trainings are listed here.
orb | in-memory DB
# Data
|||
-|-
ig_signup_sigma_features |
ig_challenge_.... | accounts that were challenged
# proxy metrics
|||
-|-
enrolments | tested users (UFAC)
clear | the lower the better (UFAC cleared)
# human labeling
|||
-|-
holdout | signed-up users; after 12 days humans label them -> empty, bad, good (benign); ~5K accounts
false negative |
MAU prevalence |
MIMA prevalence |
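
The holdout labels above can be turned into numbers roughly as follows. This is only a sketch under stated assumptions: "prevalence" is taken to mean the share of labeled holdout accounts judged `bad`, and a "false negative" is a `bad` account the classifier did not flag. The function name `holdout_metrics` and the input layout are hypothetical, not an existing internal API.

```python
from collections import Counter


def holdout_metrics(labels, flagged):
    """Summarize a human-labeled holdout sample.

    labels:  list of "empty" | "bad" | "good" (human labels, assumed vocabulary)
    flagged: list of bool, True if the classifier fired on that account
    """
    if len(labels) != len(flagged):
        raise ValueError("labels and flagged must have the same length")

    counts = Counter(labels)
    bad = counts["bad"]
    # Bad accounts the classifier let through (assumed definition of FN).
    missed = sum(1 for label, hit in zip(labels, flagged)
                 if label == "bad" and not hit)

    return {
        "prevalence": bad / len(labels) if labels else 0.0,
        "false_negative_rate": missed / bad if bad else 0.0,
    }
```

For example, with four labeled accounts of which two are `bad` and only one of those was flagged, both numbers come out as 0.5.
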
# Folders
- ## Misc
- `fbcode/dataswarm-pipeline/tasks/si/fake_accounts`
- `fbcode/dper3/dper3_models/si/olf`
- `www/flib/intern/scripts/sigma/clssifiers/olf`
- `configerator/source/sigma/online_classifiers/runtimes`
- all our classifiers are here
- `configerator/source/si/fake_accounts`
- defines active classifiers and the defaults
- `si_sigma/Lib/FakeAccounts`
- sigma rules for the fake accounts, namely new_user_registration is processed here
- ## Models
- `fbcode/fblearner/flow/projects/fluent2/domains/si/aad_surfaces`
- fblearner models
- ## Sentry
- `configerator/source/si/sentry/prod/<namespace>/<category>.cconf`
- configuration of sentries.
- e.g. `namespace=facebook, category=new_user_registration` defines what is passed to sigma rules. This is used for FB. IG perhaps has a different config.
- `configerator/source/si/sentry/si_namespaces.thrift`
- sentry namespaces
- `www/flib/si/sentry/category/SentryCategory.php`
- existing categories
- `www/flib/si/sentry/preparable/filters/sigma/SigmaFilter.php`
- Sigma filter that can be found in sentry configurations.
- bunnylol orb
- this can be used to query sentry logs
- ## Sigma
- `si_sigma/Endpoint/Sentry/SentryFollowProfile.hs`
- `si_sigma/Contexts/Sentry/SentryFollowProfile.hs`
- scuba sigma_profiling
- to profile
- ## QE
- `configerator/qe2_diff/newExperiments/vahagnk_fast_tiger_clone.txt`
- ## Reg Attack
- `configerator/source/si/reg_attacks/surface_definitions.cinc`
- Surfaces are defined here.
- `configerator/source/si/reg_attacks/attack_definitions/`
- Attack definitions.
- `source/si/reg_attacks/reg_attacks.thrift`
- FieldTypes are here.
- ## Pipelines
- [dataswarm pipelines](https://www.internalfb.com/code/fbsource/fbcode/dataswarm-pipelines/tasks/si/fake_accounts/)
- [online_reg](https://www.internalfb.com/code/fbsource/fbcode/dataswarm-pipelines/tasks/si/fake_accounts/online_reg/)
- ## OLF
- phps OLFAdminV2 status --classifier reg_enthusiastic_impala
- [firefighting](https://www.internalfb.com/intern/wiki/OLF/Firefighting/)
# Model Training workflows
- Train the model
```
flow-cli canary si.olf.ig_signup.train@olf --run-as-secure-group=team_abusive_accounts_detection --entitlement si --parameters-file configs/ig_signup_andromeda_offline.json
```
to monitor progress use [bunnylol fblearner](https://www.internalfb.com/intern/fblearner).
- Publish model
```
aimps publish-model --manifold --oncall aad_surfaces --is-dper-model -d service_sharded <model_id>_<snapshot_id>
```
to monitor progress use [bunnylol predictor](https://www.internalfb.com/intern/predictor).
- Register model
```
phps --www-root /var/www OLFAdminV2 training --action=register_model --classifier-name <model_name> --problem IG_FA_ANDROMEDA --surface IG_SIGNUP --model-id <model_id> --threshold 0.5
```
model name is ig_signup_colorful_animal
- [Compare how model fires.](https://fburl.com/scuba/ig_signup_sigma_features/svhenbuq)
- Create experiment to compare enrolled vs cleared. [bunnylol qe2](https://www.internalfb.com/intern/experiments)
# Team identifiers
- fawg - the abusive account detection group for diffs
# Tables

21
Facebook/aad_how_to.md Normal file

@@ -0,0 +1,21 @@
# How to:
## Policy
### Monitor Policy
- Use sigma policy dashboard
```
bunnylol spd <policy>
e.g. bunnylol spd ipSpaceAnomalyRegConfContactpointMismatch
```
- If you have a GK, you can find charts at the bottom of
```
bunnylol gk <gk_name>
```
### Monitor Context
- Use Sigma Context Dashboard
```
bunnylol scd <context>
e.g. bunnylol scd sentry_confirm_email
```


@@ -0,0 +1,48 @@
# New User Registration
Our goal is to allow user registration while preventing abusive users from being registered.
That is, given the limited amount of information we have about the user, we need to classify or predict their intent.
Ideally we would not even allocate an id for abusive users. Therefore we could separate our checks into two categories:
- preregistration checks: where user doesn't even have id allocated
- registration checks: user has an id
## Preregistration
## Registering
We have multiple mechanisms which help us to prevent registration of abusive users. Those are:
- ML Classifiers
- Regattacks
- Anomaly detection
<!-- TODO: add more -->
As far as I understand, once we detect an abusive user registration we send the user to UFAC. UFAC then presents a number of challenges and either clears the user or confirms the abusive character of the registration.
In that regard the [ufac_core][ufac_core_table] table is a good source of data. Take some time to check what data it contains.
Some interesting Scuba queries are:
- [UFAC enrollments per reason][scuba_ufac_enrollment_reason]
- [UFAC clears per reason][scuba_ufac_clears] - in other words false positives (FP).
- [UFAC enrollments per context][scuba_ufac_context]
Some interesting columns/values in [ufac_core][ufac_core_table] are:
- Event - enrolled, cleared
- Violation Type - FAKE_ACCOUNT
-
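
The "clears per reason" view of false positives can be sketched offline like this. It is an illustration only: the `(event, reason)` tuple layout is a hypothetical export format, not the real `ufac_core` schema, and the helper name `clear_rate_per_reason` is made up for this note. The event values `enrolled` and `cleared` are the ones listed above.

```python
from collections import defaultdict


def clear_rate_per_reason(rows):
    """Return cleared/enrolled per enrollment reason.

    rows: iterable of (event, reason) tuples, where event is
          "enrolled" or "cleared" (other events are ignored).
    """
    enrolled = defaultdict(int)
    cleared = defaultdict(int)
    for event, reason in rows:
        if event == "enrolled":
            enrolled[reason] += 1
        elif event == "cleared":
            cleared[reason] += 1

    # A high ratio suggests the reason produces many false positives.
    return {reason: cleared[reason] / count
            for reason, count in enrolled.items() if count > 0}
```

Reasons that only ever show up as `cleared` (no enrollment in the sample) are left out, since a ratio would be undefined for them.
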
# Must Read
- [The Life of a Registration model][life_of_reg_model]
- Describes registration model a bit more.
# Code Ownership
One could use [bunnylol ownership manager][bunnylol_ownership_manager] to get the list of assets we own.
<!-- ## References -->
[ufac_core_table]: https://www.internalfb.com/intern/data/info/tables/scuba/uber/ufac_core/
[scuba_ufac_enrollment_reason]: https://fburl.com/scuba/ufac_core/2eqmiib9
[scuba_ufac_clears]: https://fburl.com/scuba/ufac_core/17ebiuzs
[scuba_ufac_challenge]: https://fburl.com/scuba/ufac_core/np8svxf0
[scuba_ufac_context]: https://fburl.com/scuba/ufac_core/e2513thj
[bunnylol_ownership_manager]: https://fburl.com/catalog/uzazxbwb
[life_of_reg_model]: https://fb.workplace.com/notes/1034802330668606

17
Facebook/abbreviations.md Normal file
View File

@@ -0,0 +1,17 @@
|||
-|-
MAU | Monthly Active Users (user==account)
MAP | Monthly Active People (person=all user accounts)
WAU | Weekly Active Users
WAP | Weekly Active People
DAU | Daily Active Users
DAP | Daily Active People
XAU | X Active Users
XAP | X Active People
SI | Site Integrity
NAWI | Non Abusive WAP Impact
FP | False Positive
FN | False Negative
OLF | Online Learning Framework
ORB |
NFX | Negative Feedback eXperience
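
The Users vs People distinction above (user == account, person == all of that person's accounts) can be illustrated with a small sketch. The account-to-person mapping and the function name `active_users_and_people` are hypothetical; treating an unmapped account as its own person is an assumption made here, not a documented rule.

```python
def active_users_and_people(active_accounts, account_to_person):
    """Count active users (accounts) and active people for one period.

    active_accounts:   iterable of account ids seen active in the period
    account_to_person: dict mapping account id -> person id
    """
    accounts = set(active_accounts)
    # Assumption: an account with no known person counts as its own person.
    people = {account_to_person.get(acc, f"unmapped:{acc}") for acc in accounts}
    return len(accounts), len(people)
```

With accounts `a1` and `a2` belonging to one person and `a3` to another, three active accounts give 3 active users but only 2 active people.
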

14
Facebook/arch.md Normal file

@@ -0,0 +1,14 @@
# Surface (what is a surface)?
# Sentry
A surface calls Sentry, which then calls a handler to check the action.
[bunnylol wiki sentry](https://www.internalfb.com/intern/wiki/Sentry/)
All sentry checks are logged to [bunnylol orb](https://www.internalfb.com/intern/si/orb).
We are sitting on the [new_user_registration](https://www.internalfb.com/code/configerator/source/si/sentry/prod/facebook/new_user_registration.cconf) sentry category.
TODO: take a look at [cbgs aad_surfaces](https://www.internalfb.com/code/search?q=filepath%3Asource%2Fsi%2Fsentry%2F%20repo%3Aconfigerator_all%20aad_surfaces)
[Current production models](https://www.internalfb.com/code/configerator/source/si/fake_accounts/olf_registration_classifiers.cconf)
Datr (dɑːtər) is a special browser cookie designed to identify browsers and apps.

43
Facebook/bento_pvc.md Normal file

@@ -0,0 +1,43 @@
# To Read
[4 PVC tips that will make you more productive](https://fb.workplace.com/notes/4516377471737510/)
# Code
### Imports
`from bento import wait`
```
import pandas as pd
import numpy as np
```
`from fblearner.flow.api import types`
### DS
```
from datetime import date, timedelta
ds = date.today() - timedelta(days=1)
```
or use
```
from fblearner.flow.projects.pvc.date import (
    day, week,
    today, yesterday,
    date_range,
    latest_ds,
)
# `Dataset` also needs to be imported; its module path is not recorded here.
# All partitions from two weeks ago to the latest having landed
dataset = Dataset(
    namespace="some_namespace",
    table="some_table",
    partition=dict(
        ds=date_range(
            today() - 2 * week,
            latest_ds("some_namespace", "some_table"),
        ),
    ),
)
```


@@ -1,42 +1,58 @@
# Working with hack
|||
-|-
hh | build
t | test
t <name of the test class> | test
# Mercurial
|||
-|-
hg book <bookmark name> | temporary branch
hg pull/hg rebase -d master | git pull --rebase
hg revert <file> | git co -f <file>
hg commit | git ci
hg commit --stack
hg fold --from | merge multiple commits
hg hide | hide not needed commits
# Jelly Fish
|||
-|-
jf apply --all --suggested [--dry-run] | apply suggested changes
jf apply --accepted --suggested [--dry-run] | apply accepted changes
jf submit -e | submit and edit a comment
jf land | land the change
# Arc
|||
-|-
arc build | used in configerator
arc canary |
arc canary --cancel |
# Buck
|||
-|-
buck test @mode/dev-nosan local/path/... | run all tests
buck build @mode/opt local/path/... | build optimized version (by default @mode/dev)
buck build -c python.helpers=true //path:target | interactive shell to play with the modules
... -ipython.par | not sure what this is
# Ondemand
|||
-|-
ondemand connect | connect to an ondemand server
|
ondemand devdb new | new dev DB
ondemand devdb list | list dev DBs
ondemand devdb connect --name <name of the ephemeral xdb shard> | connect to dev DB
`tail -f /var/facebook/logs/users/svcscm/error_log_svcscm`<br>VSC `slog: open` | slog on ondemand server
# MySQL
|||
-|-
SHOW DATABASES; |
SHOW DATABASES LIKE 'open%'; |
@@ -44,7 +60,7 @@ SHOW SCHEMAS; |
# Systems
|||
-|-
**Data** |
cdm | dataswarm jobs?
@@ -65,9 +81,19 @@ SHOW SCHEMAS; |
# Bunnylol
|||
-|-
**Network** |
backbone | network topology
**Service** |
smc | service discovery service
**Sigma** |
scd | Sigma Context Dashboard
spd | Sigma Policy Dashboard
**Misc** |
slog | some logs?
@od | on demand sandbox something
centra <id> | review id to see if user is abusive or benign
# Ohno
ohno -

6
Facebook/deltoid.md Normal file

@@ -0,0 +1,6 @@
community_integrity:ufac:challenge_completed
community_integrity:ufac:disabled
community_integrity:dec:fb:actioning
community_integrity:dec:fb:harm

3
Facebook/hack.md Normal file

@@ -0,0 +1,3 @@
fbdbg --attach --type hhvm
> = $context = new QuarkzObjectContext(tuple(674466959, null))
> = AADAnyOfACDCSignal::genVal(omni(), $context, vec['UserControlls50Groups']) |> prep($$)


@@ -1,5 +1,19 @@
# fbpkg
# Tupperware
- create config.tw
- tw job validate
- tw sandbox interactive \<config.tw>
- tw sandbox start \<config.tw>
- tw sandbox resolve # This will list running containers?
- tw sandbox stop \<task handle>
- tw job start \<config.tw> # Push to prod?
- tw ssh \<task handle> # ssh to running container
- tw log --tail --file stderr \<task handle>
[Hands-on Lab](https://www.internalfb.com/intern/wiki/Tupperware-bootcamp-handson-lab/)
# Deployment to prod

17
Facebook/metrics.md Normal file

@@ -0,0 +1,17 @@
|||
-|-
[WAP@14][wap14] | Weekly Active People registered 14 days ago. See also [here][wap14_2].
[NAWI][nawi] | Non Abusive WAP Impact. See also [here][nawi2].
[FP Reach][nawi2] | aka FPR. Deprecated.
FN@Reg | ?
FN@24 | ?
[wap14]: https://fb.workplace.com/groups/613224465392839/posts/2504163902965543/
[wap14_2]: https://www.internalfb.com/intern/qa/3679/what-is-wap14
[nawi]: https://www.internalfb.com/intern/wiki/Fake-accounts-site-integrity/NAWI/
[nawi2]: https://www.internalfb.com/intern/anp/view/?id=62458

10
Facebook/oncall.md Normal file

@@ -0,0 +1,10 @@
### How to find current production model
- bunnylol olf reg_enthusiastic_impala
### Model validation fails
- phps OLFAdminV2 status --classifier <string>
- Take a look at [OLF firefighting](https://www.internalfb.com/intern/wiki/OLF/Firefighting/).
- Take a look at [validator config](https://www.internalfb.com/code/configerator/source/sigma/online_classifiers/validators/reg_enthusiastic_impala.cconf).
- Check validation dataswarm pipeline (if it runs).
- e.g. bunnylol data <table> - lineage - locate pipeline.


@@ -0,0 +1,61 @@
z56Y7UTqKZMusqM
genAugmentFeatureMap()
https://www.internalfb.com/code/www/flib/si/sentry/one_way/SentryOneWay.php?lines=82
EntPersonAccountCreationCriticalObserver::genExecutePostActions
https://www.internalfb.com/code/www/flib/entity/person/observers/EntPersonAccountCreationCriticalObserver.php?lines=180
AccountCreationRoadblocker::genMaybeRoadblockUser(
https://www.internalfb.com/code/www/flib/account/creation/AccountCreationRoadblocker.php?lines=74
RegistrationSentryBuilder::genAssertAllowed
AbstractSentryBuilder::genAssertAllowed
SentryClient::genFBRun ***
SentryFB::gen_DONT_CALL_IT_DIRECTLY
SentryFB::genCheck * **
*
SentryFB::genCheck
Sentry::genAugmentFeatureMap
SentryWithExperiments::genAugmentFeatureMapExperiments
SentryWithExperiments::genAugmentFeatureMapUserExperiments
SentryWithExperiments::genQEExperiments
SentryWithExperiments::genQEParams
**
SentryFB::postProcessRestrictions
logQEExposures
logSingleQEExposure
***
AbstractSentryBuilder::genAssertAllowed
SentryFB::assertAllowed
SentryFB::assertAllowedImpl
SentryFB::getFinalRestriction
SentryFB::getResponse
--
haskell
registrationOnlineModel
getParameterDefaultForSource
getParameterDefaultRequest
getParameterDefault
getParameter
getParameters - withParameters
getExperiment
https://www.internalfb.com/code/si_sigma/Lib/Experiments/QE/QEAPI.hs
genQE
getRequestedQEParameters
getBoolMaybe
logAutoExposure
https://www.internalfb.com/code/www/flib/site/x/sigma/XSigmaWWWExperimentationServiceController.php
https://www.internalfb.com/diff/D30615875

20
Facebook/shortcuts.md Normal file

@@ -0,0 +1,20 @@
# DuckDuckGo
|| | ||
-|-|-|-
/ | jump to search | |
Open results:
Enter or l or o — go to the highlighted result, or use it right away to go to the first result
Ctrl/Cmd+Enter — open a result in the background
d — domain search (if a result is highlighted)
' or v — open the highlighted result in a new window/tab. Since this uses JavaScript, you need to turn off pop-up blockers first.
Move around:
← and → — navigate Instant Answer tabs. When an Instant Answer is open, navigate within the Instant Answer.
↓ or j — next search result
↑ or k — previous search result
/ or h — go to search box
s — go to misspelling link (if any)
t — go to top
m — go to main results

52
Facebook/sigma.md Normal file

@@ -0,0 +1,52 @@
### How to sample inputs for a context
```
inputs <- H.sampleInputsFrom "sentry_confirm_email" 500
```
### How to test inputs
```
map failOnError <$>
T.batchTestHaxl (
textShow . Map.lookup (QE.QEUniverse "registration_classifiers") <$>
QE.enabledExperiments
)
inputs
```
### How to find users with enabled GK
```
gk <- map failOnError <$> T.batchTestHaxl (if passesGKPreCalculated "reg_conf_contactpoint_mismatch" then fmapId else return 0) inputs
good_inputs = [ i | (g, i) <- zip gk inputs, g /= 0 ]
```
```
gk <- map failOnError <$> T.batchTestHaxl (if passesGKPreCalculated "reg_conf_contactpoint_mismatch" then fmapId else return "") inputs
```
1. The lazy one using exceptions.
```
inputs <- H.sampleInputsFrom "some_context" 1000
responses <- rights <$> T.batchTestPolicy (if passesGKExperiment "some_gk" then somePolicy else throw (LogicError "An error")) inputs
```
2. The fancier one with conditions, that gives you more flexibility.
```
inputs <- H.sampleInputsFrom "some_context" 1000
passes <- map failOnError <$> T.batchTestHaxl (GK.passesGKExperiment "some_gk") inputs
```
```
inputs_gk = [ i | (True, i) <- zip passes inputs ]
responses <- T.batchTestPolicy somePolicy inputs_gk
```
```
gk <- map failOnError <$> T.batchTestHaxl (passesGKPreCalculated "reg_conf_contactpoint_mismatch") inputs
```
### How to dump Input Map content
```
T.pasteString "Some input map" . textShow =<< pp <$> inputMap
```

6
Facebook/todo.md Normal file

@@ -0,0 +1,6 @@
|||
-|-
OLF Wiki
DEC Wiki
Sigma ?
Sentry ?