FoolFuuka/Asagi
How Asagi does stuff
When Asagi performs a thread update:
In `YotsubaJSON.java`, line 88:
`public Topic getThread(int threadNum, String lastMod) throws ContentGetException, ContentParseException, CfBicClearParseException {`
Loads thread JSON
Decodes JSON
For each post in the decoded thread JSON:
If the post's resto value is zero (the post is the OP), create a new thread from that post and set its last-modified time to the time the JSON was fetched: `t = this.makeThreadFromJson(pj);`
Then add the post, OP or reply, to the current thread:
`t.addPost(this.makePostFromJson(pj));` (What if two posts had resto == 0? We'd break: the thread would be recreated and the earlier posts lost!)
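The update loop above can be sketched as follows. This is an illustrative Python rendering of the Java logic in `YotsubaJSON.java`, not the actual Asagi code; the class and function names are invented for the sketch.

```python
# Illustrative sketch of Asagi's thread-update loop (the real logic is
# Java, in YotsubaJSON.getThread). All names here are invented.

class Thread:
    def __init__(self, op_post, last_modified):
        self.op = op_post
        self.last_modified = last_modified
        self.posts = []

    def add_post(self, post):
        self.posts.append(post)


def make_thread_from_json(post_json, last_modified):
    return Thread(post_json, last_modified)


def build_thread(thread_json, last_modified):
    thread = None
    for post_json in thread_json["posts"]:
        if post_json["resto"] == 0:
            # OP post: (re)create the thread. If two posts had resto == 0,
            # the earlier thread object would be silently discarded here.
            thread = make_thread_from_json(post_json, last_modified)
        # Every post, OP included, is appended to the current thread.
        thread.add_post(post_json)
    return thread
```

Note how the sketch makes the failure mode explicit: a second resto == 0 post would replace the thread object and drop the posts collected so far.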
Fuuka Imageboard Archival Standard (Fuuka)
Note: Not to be confused with the FoolFuuka frontend, which uses the Asagi scraper and therefore its standard instead.
Asagi Imageboard Archival Standard (Asagi)
The Asagi Imageboard Archival Standard was developed by eksopl of Easymodo and the Foolz team under the direction of woxxy. It was developed to run the Foolz archiver, and has been the engine for the majority of archivers since the collapse of Archive.moe.
Three versions can be identified:
- Mark I (2009) - Produced for Foolz.us. Possibly still in use by Nyafuu; formerly used by Loveisover.
- Mark II (2015) - Produced for Archive.moe. Used by Fireden and arch.b4k.co.
- Mark III (2019) - The final reference standard codified by the Bibliotheca Anonoma, in preparation for the development of new drop-in replacements.
Reference Implementation
- Operating System: Ubuntu 14.04/16.04 LTS
- Database: MySQL Compatible utf8mb4
- Although the Asagi scraper supports PostgreSQL, FoolFuuka does not.
- Scraper: Asagi (Java) - https://github.com/bibanon/asagi
- Frontend: FoolFuuka (PHP) - https://github.com/pleebe/FoolFuuka/tree/experimental
- PHP Engine: Historically HHVM (PHP 5.x compatible); Desuarchive and 4plebs now use PHP 7.
- Search: Sphinxsearch
New Implementation
Proposed; not yet constructed.
- Database: MariaDB with Percona's TokuDB storage engine - a special variant with many unique optimizations.
- Scraper: baystdev/Hayden (.NET C#)
- Frontend: https://github.com/pleebe/FoolFuuka/tree/experimental
- New Frontend: a Python-based middleware compatible with the 4chan API.
Specifications
Files
The first 4chan timestamp filename seen for an image is recorded in the SQL database, with the image's MD5 sum as a unique key.
All future images with that MD5 sum are then linked to that original timestamp filename.
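The deduplication step described above can be sketched as follows. A plain dict stands in for the SQL table whose unique key is the MD5 sum; the function name is invented for the sketch.

```python
# Sketch of MD5-based image dedup. A dict stands in for the SQL table
# keyed uniquely by MD5 sum; in the real system this is a database row.

media_by_md5 = {}  # md5 hex digest -> first timestamp filename seen


def register_image(md5_hex, timestamp_filename):
    """Return the canonical stored filename for this image.

    The first filename seen for a given MD5 wins; later uploads of
    the same image are linked back to that original filename.
    """
    return media_by_md5.setdefault(md5_hex, timestamp_filename)
```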
The directory layout uses nested subfolders derived from the first few digits of the filename, to cut down the number of files in any single directory (too many files in one directory overloads the filesystem):
1234/56/123456789000.jpg
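A minimal sketch of the path-splitting rule, assuming the layout shown in the example (first four digits, then the next two):

```python
def media_path(timestamp_filename):
    """Split a 4chan timestamp filename into nested subfolders.

    Assumes the split widths from the example in the text:
    '123456789000.jpg' -> '1234/56/123456789000.jpg'
    """
    return "/".join(
        [timestamp_filename[:4], timestamp_filename[4:6], timestamp_filename]
    )
```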
Time
US Eastern time is used, a legacy of Asagi's HTML scraping.
MySQL Schema
See the Asagi source code.
FoolFuuka also adds tables of its own, so don't forget them when building a compatible implementation.
API Schema
Ayase Imageboard Archival Standard (Ayase)
The Ayase Imageboard Archival Standard was produced by the Bibliotheca Anonoma to handle the ever growing operations of Desuarchive and RebeccaBlackTech.
Reference Implementation
- Operating System: CentOS/RHEL 8
- Database: PostgreSQL
- Scraper: Ena or H (.NET C#)
- Middleware: Ayase (Python PyPy)
- Frontends: 4chan X, Clover, iPhone app
Specifications
Files
- All files are to be named by SHA-256 sum and file extension. This was chosen for the broad availability of hardware extensions for the purpose and for its use by 8chan/vichan.
- They are to be stored in double nested folders.
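The naming scheme above can be sketched as follows. The exact folder widths of the "double nested folders" are not specified in this document; two levels of two hex characters each are assumed here purely for illustration.

```python
import hashlib


def ayase_path(data, extension):
    """Name a file by its SHA-256 sum, stored in double-nested folders.

    Folder widths are an assumption (two hex chars per level); the
    document only specifies double nesting, not the widths.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest[:2]}/{digest[2:4]}/{digest}.{extension}"
```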
Time
- Ayase requires time to be stored as PostgreSQL datetimes, which can also carry time zone information.
- Only UTC should be used as the timezone for newly scraped data. The timezone support is not an excuse to store data in other timezones.
- The timezone support is meant only for compatibility with prior Asagi data, which stores time as US time (apparently Eastern) due to Asagi's past HTML scraping. Future scrapers are strongly advised not to replicate this behavior; conversion to local time should be left to the frontend.
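The compatibility conversion for legacy Asagi timestamps can be sketched as below, assuming US Eastern as the stored zone (the text itself only says "maybe Eastern"); it uses Python's stdlib zoneinfo.

```python
from datetime import datetime, timezone

from zoneinfo import ZoneInfo  # stdlib since Python 3.9


def asagi_to_utc(naive_dt):
    """Interpret a naive legacy Asagi timestamp as US Eastern, return UTC.

    US Eastern is an assumption; the source says Asagi stored US time
    ("maybe Eastern"). New scrapes should store UTC directly instead.
    """
    return naive_dt.replace(tzinfo=ZoneInfo("America/New_York")).astimezone(
        timezone.utc
    )
```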
PostgreSQL Schema
If we GET JSON from the 4chan API and always serve that same JSON back to the user, why deconstruct and reconstruct it into post-focused SQL records on every update?
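The store-the-JSON-as-is alternative implied by that question can be sketched like this. sqlite3 stands in for PostgreSQL (where the column would be jsonb), and the table name and schema are illustrative, not part of any specification here.

```python
import json
import sqlite3

# Persist the thread JSON from the 4chan API as-is and serve it back
# verbatim, instead of exploding it into per-post rows. sqlite3 is a
# stand-in for PostgreSQL; the schema is illustrative only.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE threads (thread_num INTEGER PRIMARY KEY, doc TEXT)")


def store_thread(thread_num, thread_json):
    db.execute(
        "INSERT OR REPLACE INTO threads VALUES (?, ?)",
        (thread_num, json.dumps(thread_json)),
    )


def serve_thread(thread_num):
    row = db.execute(
        "SELECT doc FROM threads WHERE thread_num = ?", (thread_num,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

The round trip is lossless: the JSON served is byte-for-byte reconstructible from what was fetched, with no per-post reassembly step.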
Elasticsearch Engine
A separate Elasticsearch engine, kept in sync with but independent from the SQL server, will replace Sphinxsearch (which queries the MySQL database directly).