Recently, I listened to Lex Fridman's podcast with Pieter Levels. Pieter discusses his success at rapidly deploying and iterating on start-ups built to solve specific problems. His success has motivated me to develop websites designed to help solve problems of my own. I agree with Pieter's philosophy of building the minimum viable product necessary to solve a problem, deploying it, and then refining it in response to community feedback. This cycle efficiently captures the functionality the product actually needs. Moreover, it is robust to changing consumer demands, since the product is never overloaded with features.
In this spirit, I have, with the assistance of ChatGPT, created and deployed a couple of websites to tackle problems I have faced.
FindYourPhD
Throughout my journey of finding a PhD supervisor, I have scrolled endlessly through faculty pages to discover principal investigators whose research interests align with my own. Because my goals for a PhD are quite specific, this was a long process. Naturally, I started thinking of ways to make it more efficient. A relatively simple idea was to create a centralised location where supervisors could advertise the PhD positions they are offering, and where students could quickly discover opportunities relevant to their interests. FindYourPhD is a platform where supervisors can upload their positions and students can filter a table of positions by their interests. The website then offers individual supervisor pages where students can learn more about each advertised opportunity and understand how to apply.
The challenge I am having with this website is the lack of data it presents. I am facing a chicken-and-egg problem: students will only be drawn to the platform if positions are advertised; however, supervisors will only be inclined to post their positions if there is an audience to receive them.
I am not prepared to scrape web pages to populate this table, for several reasons:

1. I wouldn't be able to obtain permission from supervisors to display their positions. It does not feel right to advertise positions without the supervisor's consent.

2. The positions displayed on web pages are often out of date. The table of positions would therefore be out of date as well, which is precisely the problem I am trying to solve.

3. My areas of interest are narrow and do not reflect those of the broader student population. I wouldn't want to skew the scope of the platform.
MOAT
In any field, it can be challenging to stay up to date with the latest research. Even with a complete list of recently published papers, it is difficult to decide which ones to read in the limited time available. Drawing on ideas of relativism, I developed MOAT search, which analyses recently published articles to identify pairs of articles with unique connections. The intention is that by exploring the papers in pairs, one can draw connections between their ideas and extract meaningful information that may not be present when reading the papers individually.
MOAT is an acronym for Measure Of All Things, referencing the famous statement,

Man is the measure of all things
made by the ancient Greek philosopher Protagoras. Protagoras was a Sophist who pioneered the idea of relativism.
I am fascinated by the idea of exploring the connections between topics. In the context of machine learning, I am exploring various ways of connecting neural network robustness and generalisation. More generally, there have been numerous instances where connections between ideas have been incredibly fruitful, such as the Langlands program in mathematics. Likewise, one of the greatest mysteries in modern-day physics is understanding the connection between quantum mechanics and general relativity.
It would be interesting to expand the scope of MOAT and create similar pairings for news articles. There are always two sides to every story, so digesting articles that take different perspectives on the same topic can help mitigate biases.
To identify the pairs to display, I am using text embedding models. In particular, I am consulting the Massive Text Embedding Benchmark to identify appropriate models for the task. My current bottleneck is the size of these models and of their embedding spaces. Because my approach to discovering pairs requires several different models, this quickly leads to memory issues on web hosting services. I have therefore had to restrict how the embedding models are applied, meaning the results are not as good as they could be. Even with this bottleneck, however, my qualitative experience of the results is positive.
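The core pairing step can be sketched roughly as follows. This is a minimal illustration, not the actual MOAT pipeline: the `top_pairs` helper, the toy 3-dimensional vectors, and the use of a single similarity score are all simplifications standing in for real embedding-model outputs and the multi-model approach described above.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_pairs(embeddings, k=1):
    """Rank all article pairs by embedding similarity.

    embeddings: dict mapping article id -> embedding vector.
    Returns the k most similar pairs, highest similarity first.
    """
    scored = [
        ((a, b), cosine(embeddings[a], embeddings[b]))
        for a, b in combinations(embeddings, 2)
    ]
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for real embedding-model outputs.
emb = {
    "paper_A": [0.9, 0.1, 0.0],
    "paper_B": [0.8, 0.2, 0.1],
    "paper_C": [0.0, 0.1, 0.9],
}
top_pairs(emb, k=1)  # pairs paper_A with paper_B
```

In practice the scoring would come from one or more large embedding models rather than raw cosine similarity on toy vectors, and the all-pairs comparison would need approximate nearest-neighbour search once the number of articles grows.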
Nice! I wonder if asking an LLM to find connections between the papers would work (with a good enough prompt to avoid generic filler content).
Re: Pairing news articles -- https://ground.news does something similar, though it's not exactly what you're describing.