About three or so years ago, around October 2017, I was contacted on Reddit about doing a small job. The job happened to be very NSFW: the client wanted me to build a service that would log into their manyvids.com account, pull information from it, and display it on their own site.
My attempt to log into manyvids.com wasn't successful because of the way the site handles logins. It submits the login form via AJAX, which by itself wouldn't be a problem. However, it also submits a randomly generated captcha value that I assume is created on the server's end. Without knowing what seed value produces this string (it's obviously some sort of hash), I couldn't duplicate the session value and log in as a real user with their credentials (credentials the client gave me, of course).
Since that was a wash, the client decided they wanted to scrape the website for the top 50 MV girls at the time the script ran. This was at least doable. It's worth noting that the client was a programmer himself, so he had already tried this on his own and hadn't had any luck grabbing any values.
He was using simple_html_dom, a small PHP library for parsing HTML, to parse the HTML retrieved by requesting the site with file_get_html(). This is a function within the simple_html_dom library that takes the content fetched from a URL and converts it into a traversable HTML DOM. Before I go into any more detail, let me just show you the code so you have a better idea of what I'm talking about.
Alright, so that's a big boy there. It's roughly 177 lines; it could be smaller, I'll admit, but it does the job. Basically, I hit up the MVGirls page of the site and look for the elements matching "#result-list div" (that has probably changed by now). Within those divs I search for every h4 with the class ".profile-pic-name" to get each MV girl's name, and then look up every a tag with the class ".square-size-8". This lets me grab the profile link and the thumbnail associated with each girl. I keep going until I hit 50 girls and then break.
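The selection logic described above can be sketched roughly as follows. This is a minimal sketch, not the original script: it swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath so it runs without the library, it takes an HTML string instead of fetching the page, and the helper names parse_profiles() and xpath_class() are hypothetical. The id and class names are the ones from the post and have probably changed since. It also assumes the names and links appear in the same order on the page, so pairing them by index lines them up.

```php
<?php
// Minimal sketch of the selection logic described above, using PHP's built-in
// DOM extension in place of simple_html_dom. Function names are hypothetical,
// and the id/class names are the ones from the post, which have likely changed.

// Build an XPath predicate that matches $class as a whole word in @class,
// roughly what a CSS ".classname" selector does.
function xpath_class(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')";
}

// Return up to $limit entries with a name, profile link and thumbnail URL.
function parse_profiles(string $html, int $limit = 50): array
{
    $dom = new DOMDocument();
    // Scraped markup is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Equivalent of "#result-list div h4.profile-pic-name" and
    // "#result-list div a.square-size-8".
    $names = $xpath->query("//*[@id='result-list']//div//h4[" . xpath_class('profile-pic-name') . "]");
    $links = $xpath->query("//*[@id='result-list']//div//a[" . xpath_class('square-size-8') . "]");

    // Assumes names and links appear in the same order, so pair them by index
    // and stop once $limit entries have been collected.
    $count = min($names->length, $links->length, $limit);
    $profiles = [];
    for ($i = 0; $i < $count; $i++) {
        $link = $links->item($i);
        $img = $xpath->query('.//img', $link)->item(0);
        $profiles[] = [
            'name'    => trim($names->item($i)->textContent),
            'profile' => $link->getAttribute('href'),
            'thumb'   => $img ? $img->getAttribute('src') : null,
        ];
    }
    // No explicit clear() call: $dom is released when the function returns.
    return $profiles;
}
```

Called as parse_profiles($html) on the page's markup, it returns at most 50 entries, each an array with name, profile and thumb keys. Pairing by index is the fragile part: if one card is missing its h4 or its link, every entry after it shifts, so walking each card div and reading both elements from it would be the safer variant.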
I clear out the DOM afterwards to save on memory. There's another portion of the script at the bottom that attempts to grab information from each MV girl's store, but the client also wanted a GIF or video screen grab of whatever video was on the page at the time, and that wasn't possible with PHP alone. I tried using the CasperJS framework to drive a headless browser that could actually see the pages like a user would, but because I didn't have ffmpeg installed in the environment, I couldn't pull it off. Honestly, for just 100 dollars this was becoming more of a chore than it was worth.
The last time I heard from the client, they wanted to export the data to Excel and perform some calculations on it. At the time I was rewriting the back-office area of AgentStudio.com from the ground up, so I didn't have time to take this on. But yeah, I did some scraping on a camgirl site once. Meh.