Colin Hayhurst is the new CEO of the privacy-orientated search engine Mojeek. Colin has been the co-founding CTO, CCO, and CEO of startups in high-performance computing, machine learning, and web infrastructure respectively. Two of those companies were successfully bootstrapped, whilst the third was part of the Y Combinator summer 2012 cohort.
Q: Mojeek is a search engine and not a search aggregator. What does that mean in practice and what exactly is involved in building a search engine from the ground up?
It means we are a company that is more product-centric than marketing-centric. That changes a lot of things and makes us very different from metasearch engine companies such as DuckDuckGo, Ecosia, Startpage, and Qwant, to name a few.
We have always crawled, and continue to crawl, the web, updating our own link graph, database of web pages, and ranking algorithm. All the code has been written from the ground up in C, and until fairly recently almost all of it was written by Marc Smith, our founder. It has taken him 16 years to develop things thus far.
The only open-source software we use is cURL, and more recently we have been testing LMDB. Everything else, including our search databases, has been developed from scratch, so you can imagine the incredible job he has done.
What this means is that we can serve up search results that are independent of Google, Microsoft, and Yandex.
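Mojeek has not said how its ranking algorithm uses the link graph it maintains. PageRank is the best-known published way of turning a link graph into a ranking signal, so the Python sketch below uses it purely to illustrate the idea, not as Mojeek's method. The `pagerank` function, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Hypothetical illustration: iterative PageRank over a link graph.

    `links` maps each page to the list of pages it links to. Returns a dict
    of scores that sum to 1. This is textbook PageRank, not Mojeek's ranking.
    """
    # Collect every page that appears as a source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outbound links): spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Usage: "c" is linked to by both "a" and "b", so it ends up ranked highest.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # prints "c"
```

A production engine would combine a link signal like this with many others (text relevance, freshness, and so on); the sketch only shows how a link graph alone can order pages.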
Q: What’s the technology you use?
We host and maintain our own bare metal servers in the UK’s greenest data center, Custodian. As mentioned we have our own bot, called MojeekBot, that crawls the web. The crawled pages we collect are then organized as a searchable database or index. With billions of pages indexed, our database technology has had to be written to cope with unusual and demanding cases.
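Mojeek has not published how its index is structured, and its real implementation is written in C. As a rough illustration only, the Python sketch below shows the textbook idea behind organizing crawled pages as a searchable index: an inverted index that maps each word to the pages containing it. The tokenizer, the function names `build_index` and `search`, and the AND-only query logic are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Hypothetical sketch: map each term to the set of page URLs containing it.

    `pages` maps a URL to the page's extracted text.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict.
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


# Usage with a tiny, made-up "crawl".
pages = {
    "https://example.org/a": "Independent search engine with its own crawler",
    "https://example.org/b": "A crawler fetches pages for the index",
}
index = build_index(pages)
print(search(index, "crawler index"))  # prints ['https://example.org/b']
```

Real engines store these term-to-page lists on disk in compressed form and rank the matches rather than just sorting them, which is where the "unusual and demanding cases" of indexing billions of pages come in.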
Our web services use PHP. We don’t do any tracking, and although we do use JavaScript on the front end, our site works with it disabled. That’s important for users who like to disable JavaScript for security reasons.
The operational tools we use are mostly self-hosted and include Nextcloud, FastMail, Zulip and GitLab.
Q: What is the current size of your index? What is your goal?
Currently, our index contains 3.26 billion pages. Our goal is to reach 5.7 billion by June 2021.
Q: What configuration options do you offer? How do you plan to expand them?
We are best known for our general search engine at mojeek.com. However, we also offer Site Search. Improvements made to that, for a recent project with a publisher, are now available to any organisation. We also offer an API, enabling developers to create their own search solutions.
With our recent funding and larger team, we are planning to expand what we offer, adding Maps and Business Listings, for instance. Better understanding what users need and want is a big short-term priority for us. My contribution will free Marc, in particular, to spend more time on building, and we will get better at optimizing our roadmap.
Q: As a privacy-focused search engine, you do not track your users. Since you cannot use data collection to determine a user’s likely needs, how do you contend with the difficulty of giving accurate results to a query with ambiguous search terms?
Of course, when people use a search engine, they are looking for information in various forms, so the search terms they use nearly always carry some latent intent.
Predicting your needs and wants from data collected about you is, in our view, invasive and manipulative. Ad Tech and Big Tech companies use it not so much to improve the service for you as to optimize their advertising revenues.
In some cases it can help to improve relevancy, but it can also come with consequences, the most notable being filter bubbles. We are steered away from facts and down the dangerous path of confirmation bias. Being directed to a variety of relevant content with different points of view is just as important as, if not more important than, an algorithm making assumptions about what you are searching for.
This also touches on why we believe having an independent search index and algorithm is important. When almost every search engine retrieves its results from Google and Bing, Google and Bing shape much of the information everybody gets to see.
Q: How can Mojeek users independently verify that their privacy is indeed respected?
This is a great question and the honest answer is they can’t. Ultimately it comes down to a matter of trust.
Even with code that is open source, or when independent audits are conducted, such verification is no guarantee. A company might be running a different version of its source code in practice, and features could be switched on or off during an audit. Having said that, we’re open to exploring ways to give our users more reassurance in this matter.
Mojeek is an official UK company with a clear and short privacy policy. In it we define exactly what very limited information we do log; for instance, we do not log IP addresses.
Q: Where do you see Mojeek in five years’ time?
Our aim is that Mojeek-powered services will provide a credible alternative to Google, Bing, and Yandex. These three are the only English-language search engines with indexes larger than ours. Our independent technical foundations mean that we, and future partners, can offer a real alternative for search, beyond those coming from Chinese, Russian and US Big Tech.
By that time, we will also have built a sustainable business. That’s what I was brought in for last month, and it feels like the timing was right.
* * *
Have you used Mojeek? What has been your experience? Let us know in the comments!
Photo by Nick Wessaert on Unsplash.