FoolFuuka/Asagi
Revision author: Antonizoon
Revision as of 07:22, 19 August 2019
How asagi does stuff
When asagi does a thread update:
In YotsubaJSON.java, line 88:
`public Topic getThread(int threadNum, String lastMod) throws ContentGetException, ContentParseException, CfBicClearParseException {`
- Loads the thread JSON.
- Decodes the JSON.
- For each post in the decoded thread JSON:
  - If the resto value is zero, create a new thread from that post and set its last-modified time to the time from fetching the JSON: `t = this.makeThreadFromJson(pj);`
  - If resto is not zero, add the post to the current thread: `t.addPost(this.makePostFromJson(pj));`
(What if two posts were resto==0? We’d break!)
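The loop described above can be sketched roughly as follows. This is a minimal illustration, not Asagi's actual code: `PostJson`, `Topic`, and `assemble` are hypothetical stand-ins, and the two guard checks are an assumption about how the "two posts with resto==0" case could be made to fail loudly instead of silently replacing the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadAssemblySketch {
    /** Hypothetical stand-in for one decoded post; only the fields the loop needs. */
    static final class PostJson {
        final int no;     // post number
        final int resto;  // 0 for the opening post, otherwise the thread number
        PostJson(int no, int resto) { this.no = no; this.resto = resto; }
    }

    /** Hypothetical stand-in for a thread: one opening post plus its replies. */
    static final class Topic {
        final PostJson op;
        final List<PostJson> replies = new ArrayList<>();
        String lastMod;
        Topic(PostJson op) { this.op = op; }
    }

    /** Builds a thread from decoded posts, treating resto == 0 as the opening post. */
    static Topic assemble(List<PostJson> posts, String lastMod) {
        Topic t = null;
        for (PostJson pj : posts) {
            if (pj.resto == 0) {
                // Assumed guard: a second opening post would otherwise replace the thread.
                if (t != null) {
                    throw new IllegalStateException("two opening posts in one thread");
                }
                t = new Topic(pj);
                t.lastMod = lastMod; // time taken from fetching the JSON
            } else {
                // Assumed guard: a reply cannot be attached before the opening post is seen.
                if (t == null) {
                    throw new IllegalStateException("reply seen before opening post");
                }
                t.replies.add(pj);
            }
        }
        return t;
    }
}
```

With the guards in place, a thread JSON containing two resto==0 posts raises an exception rather than dropping the replies collected so far.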
Fuuka Imageboard Archival Standard (Fuuka)
Note: Not to be confused with the FoolFuuka frontend, which uses the Asagi scraper and therefore its standard instead.
Asagi Imageboard Archival Standard (Asagi)
The Asagi Imageboard Archival Standard was developed by eksopl of Easymodo and the Foolz team under the direction of woxxy. It was developed to run the Foolz archiver, and has been the engine for the majority of archivers since the collapse of Archive.moe.
Three versions can be identified:
- Mark I (2009) - Produced for Foolz.us. Possibly still in use by Nyafuu; formerly in use by Loveisover.
- Mark II (2015) - Produced for Archive.moe. Used by Fireden and arch.b4k.co.
- Mark III (2019) - The final reference standard codified by the Bibliotheca Anonoma, in preparation for the development of new drop-in replacements.
Reference Implementation
- Operating System: Ubuntu 14.04/16.04 LTS
- Database: MySQL Compatible utf8mb4
  - Although the Asagi scraper supports PostgreSQL, FoolFuuka does not.
- Scraper: Asagi (Java)
- Frontend: FoolFuuka (PHP)
- PHP Engine: Historically HHVM, PHP 5.x compatible. Desuarchive and 4plebs now use PHP 7.
- Search: Sphinxsearch
New Implementation
- Database: Percona MariaDB TokuDB - Special variant of MariaDB with many unique optimizations.
- Scraper: H (.NET C#)
- Frontend: bibanon/foolfuuka
- New Frontend: (ASP.NET C#)
Specifications
Files
The first 4chan timestamp filename seen for an image is recorded in the SQL database, with the md5sum as the unique key (unconfirmed).
All future images with that md5sum are then linked to that timestamp filename.
Files are stored in subfolders named after the first few digits of the filename, which cuts down on the number of files in a single directory (too many overloads the filesystem):
1234/56/123456789000.jpg
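Going by the example path above, the layout looks like "first four digits, then the next two digits, then the full filename". A minimal sketch under that assumption follows; the class and method names are hypothetical and are not taken from the Asagi source.

```java
public class AsagiPathSketch {
    /**
     * Derives a nested storage path from a timestamp filename,
     * assuming a 4-digit folder followed by a 2-digit folder.
     */
    static String mediaPath(String filename) {
        int dot = filename.indexOf('.');
        String stem = dot < 0 ? filename : filename.substring(0, dot);
        if (stem.length() < 6) {
            throw new IllegalArgumentException("filename too short: " + filename);
        }
        // e.g. 123456789000.jpg -> 1234/56/123456789000.jpg
        return stem.substring(0, 4) + "/" + stem.substring(4, 6) + "/" + filename;
    }
}
```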
Time
US Eastern time is used, as a consequence of how the data is scraped.
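When reading such data back, the Eastern-time values need converting before they can be compared with UTC timestamps. The sketch below assumes the stored value is a Unix-style second count whose wall-clock reading is US Eastern time; that storage format is an assumption made for illustration, so check it against the Asagi source before relying on it. `EasternTimeSketch` and `storedToUtc` are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class EasternTimeSketch {
    private static final ZoneId EASTERN = ZoneId.of("America/New_York");

    /** Reinterprets a stored Eastern wall-clock second count as a real UTC instant. */
    static Instant storedToUtc(long storedSeconds) {
        // Read the number as a plain wall-clock date and time, with no zone attached yet.
        LocalDateTime wallClock = LocalDateTime.ofEpochSecond(storedSeconds, 0, ZoneOffset.UTC);
        // Attach the Eastern zone so daylight saving rules pick the right offset.
        return wallClock.atZone(EASTERN).toInstant();
    }
}
```

Wall-clock times that fall in the repeated hour at the end of daylight saving time are ambiguous; `atZone` resolves them to the earlier offset, so a one-hour error is possible for posts made in that window.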
MySQL Schema
See the Asagi source code.
FoolFuuka also adds its own tables, so don't forget them when building the database.
API Schema
Ayase Imageboard Archival Standard (Ayase)
The Ayase Imageboard Archival Standard was produced by the Bibliotheca Anonoma to handle the ever-growing operations of Desuarchive and RebeccaBlackTech.
Reference Implementation
- Operating System: CentOS/RHEL 8
- Database: PostgreSQL
- Scraper: Ena or H (.NET C#)
- Middleware: Ayase (Python PyPy)
- Frontends: 4chan X, Clover, iPhone app
Specifications
Files
- All files are to be named by sha256sum and file extension. This was chosen for the broad availability of hardware extensions for the purpose and its use by 8chan/vichan.
- They are to be stored in double nested folders.
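A minimal sketch of the naming rule follows. The standard as quoted here does not say how the two folder levels are chosen, so splitting on the first two and next two hex characters of the hash is purely an assumption, and `AyasePathSketch` and its methods are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class AyasePathSketch {
    /** Returns the lowercase hex SHA-256 digest of the file contents. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Builds "ab/cd/<hash>.<ext>"; the 2+2 character split is an assumed layout. */
    static String mediaPath(byte[] data, String ext) {
        String hash = sha256Hex(data);
        return hash.substring(0, 2) + "/" + hash.substring(2, 4) + "/" + hash + "." + ext;
    }
}
```

Naming by content hash means identical files always map to the same path, so duplicates are stored once without needing a separate lookup by original filename.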
Time
- Ayase requires time to be stored as PostgreSQL `timestamp with time zone` values, which are time zone aware.
- Only UTC should be used as the timezone for newly scraped data. The timezone support is not an excuse to store in other timezones.
- The timezone support is only meant for compatibility purposes with prior Asagi data, given that they store time as US time (m