Posts

API from Scraping Problem: Too Slow! Let's try to cache it.

Hi, me again. And yes, it's about that MangaBat unofficial API. After using it for a while, I noticed that scraping is too slow for a synchronous request. And imagine if I sent the API request 100 times in a minute: I would definitely get banned from the source website. So I "cache" the scraped data with an expiration time of 3 hours. If a request hits the RetrieveMangaDetail API or the RetrieveChapterPages API after those 3 hours have passed, I re-fetch the data and cache it again. Of course, there's also a requirement for an option that lets the user send a command to re-fetch manually, without having to wait for 3 hours. So I modified my API to support "caching" by saving the data into the database via Django models. First, create a model to "cache" the manga detail:

    class MangaModel(models.Model):
        url = models.URLField(primary_key=True)
        result = models.JSONField(null=True, default=None)
        lastFetchedAt = models.DateTimeField()
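Below is a minimal sketch of how the cache check could sit on top of that model, assuming a Django Rest Framework view. The 3-hour TTL constant, the refresh query parameter, and the scrape_manga_detail helper are illustrative names of my own, not necessarily what the real project uses:

    from datetime import timedelta

    from django.utils import timezone
    from rest_framework.response import Response
    from rest_framework.views import APIView

    CACHE_TTL = timedelta(hours=3)

    class RetrieveMangaDetailAPI(APIView):
        def get(self, request, source):
            url = request.GET.get('url')
            # Hypothetical flag letting the user bypass the cache manually
            force_refresh = request.GET.get('refresh') == 'true'

            # MangaModel is the cache model defined above
            cached = MangaModel.objects.filter(url=url).first()
            is_fresh = (
                cached is not None
                and cached.result is not None
                and timezone.now() - cached.lastFetchedAt < CACHE_TTL
            )
            if is_fresh and not force_refresh:
                # Serve the cached scrape result without touching the source site
                return Response(cached.result)

            # Cache miss, expired entry, or manual refresh: scrape again
            result = scrape_manga_detail(url)  # hypothetical scraper helper
            MangaModel.objects.update_or_create(
                url=url,
                defaults={'result': result, 'lastFetchedAt': timezone.now()},
            )
            return Response(result)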

Creating MangaBat Unofficial API

Hi again. In the first post of this blog, I wrote about how I made a scraper for MangaBat. Today I will create an API based on that MangaBat scraper, so I can actually use it on my front end. For the programming language, it's still Python. Now I also use Django with the Django Rest Framework library, purely out of personal preference. I will create 5 APIs: List, Search, Get Manga, Get Chapter, and ProxyGet. While creating this API, I want to keep in mind that MangaBat is only one of the sources I will extend CrashMe Manga with, so I will try to keep the API as generic as possible. The setup is easy: just create an API, copy the scraper's functionality, and it should be done. 1. API for getting the list of recently updated manga:

    class ListLatestMangaAPI(APIView):
        def get(self, request, source):
            if source.lower() == 'mangabat':
                page = request.GET.get('page', 1)
                base_url = 'https://h.mangabat.com/manga-list-all/'
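Since the view's get method takes the source as an argument, one way to keep the API generic is to make the source name a path parameter that every endpoint dispatches on. A minimal sketch of the URL configuration under that assumption; the route paths and the comment about the other endpoints are illustrative, only ListLatestMangaAPI comes from the post:

    # urls.py -- each endpoint takes the manga source as a path parameter,
    # so adding a new source later only means adding a new branch in the views
    from django.urls import path

    from .views import ListLatestMangaAPI

    urlpatterns = [
        path('api/<str:source>/list/', ListLatestMangaAPI.as_view()),
        # the Search, Get Manga, Get Chapter, and ProxyGet endpoints
        # would follow the same '<str:source>/' pattern
    ]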

Godseye, a Side Project That I Finally Started Just Today

So if you have talked with me long enough, and the topic of machine learning, AI, or computer graphics has entered our conversation, you will already have heard about my idea of creating a human dictionary, with faces as identification. So I finally started it. Just today. After consulting several AI professionals (my boss at work, the CTO, and the VP of Infrastructure. What a privilege!), I started playing with AI models... at least pre-trained models, for now. The idea is this:
1. Capture an image from the webcam
2. Detect the faces
3. Look up whether a similar face is already saved in the DB
3a. If already saved, return the label
3b. If not, create a new entry in the DB with the label Unknown N, where N is an incremental number
For the extension, I want it to capture images from a recording device in real time, then be able to put the label right on the camera frame. But I'll do that next time. So the technical steps I learned after a direct chat with ChatGPT are (see the sketch below):
1. Load the image using cv2
2. Grayscale the image
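A minimal sketch of those first steps with OpenCV, assuming a Haar cascade for the face detection; the post doesn't say which detector Godseye actually uses, so the cascade choice and parameters here are just the stock OpenCV ones:

    import cv2

    # Step 1: capture a single frame from the default webcam
    capture = cv2.VideoCapture(0)
    ok, frame = capture.read()
    capture.release()
    if not ok:
        raise RuntimeError('Could not read a frame from the webcam')

    # Step 2: grayscale the frame (Haar cascades operate on grayscale images)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect faces with OpenCV's bundled frontal-face Haar cascade
    cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
    detector = cv2.CascadeClassifier(cascade_path)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Each detection is an (x, y, w, h) box; crop them out for the DB lookup
    face_crops = [frame[y:y + h, x:x + w] for (x, y, w, h) in faces]
    print(f'Found {len(faces)} face(s)')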

Scraping Mangabat

Background: I bought an iPad... yes, after about 10 years of using Android, I have now started to move to an Apple device. Here's the problem. I read manga (in a quite not-legal way) and it has become a habit. Back when I used Android, I used an app called 'Tachiyomi', a manga-reader app for Android that has a lot of extensions. The app itself does not supply the manga; each extension extends the app's capability by adding a manga source, such as Mangabat. The problem is, 'Tachiyomi' only works on Android. It will never have an iPadOS version, because, well... it's illegal. The alternative is to just read manga at its sources' web versions, but then there is no cross-website tracking capability. To put it simply, I want 'Tachiyomi' as a web version.